List: VLM (6)
Attention please
This post reviews VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models (https://arxiv.org/abs/2412.01095). From the abstract: "The rapid advancement of vision-language models (VLMs) has established a new paradigm in video anomaly detection (VAD): leveraging VLMs to simultaneously detect anomal.."
This post reviews TIPS: Text-Image Pretraining with Spatial Awareness (https://arxiv.org/abs/2410.16512). From the abstract: "While image-text representation learning has become very popular in recent years, existing models tend to lack spatial awareness and have limited direct applicability for dense understanding tasks. For this reason, self-supervised image-only pre.."
This post reviews SILC: Improving Vision Language Pretraining with Self-Distillation (https://arxiv.org/abs/2310.13355). From the abstract: "Image-Text pretraining on web-scale image caption datasets has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also.."
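The SILC abstract points to the CLIP-style contrastive recipe that web-scale image-text pretraining builds on. As a rough reference (not SILC's actual training code), a minimal PyTorch sketch of that symmetric image-text contrastive (InfoNCE) objective could look like this; the embedding dimension, batch size, and temperature are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) outputs of the two encoders;
    row i of each tensor is assumed to be a matching image-text pair.
    """
    # Normalize so the dot product is cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = img_emb @ txt_emb.t() / temperature

    # Each image should match its own caption, and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random tensors standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt))
```

Per its title, SILC's contribution is adding a self-distillation objective on top of this contrastive baseline.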
This post reviews LocCa: Visual Pretraining with Location-aware Captioners (https://arxiv.org/abs/2403.19596). From the abstract: "Image captioning has been shown as an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper.."
This post reviews Learning to Prompt for Vision-Language Models (https://arxiv.org/abs/2109.01134). From the abstract: "Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized.."
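This paper (CoOp) replaces a hand-written prompt such as "a photo of a {class}" with learnable context vectors that are optimized while the pretrained encoders stay frozen. A minimal sketch of that parameterization; the dimensions and the frozen class-name embeddings here are stand-ins, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class LearnablePromptContext(nn.Module):
    """Sketch of CoOp-style prompt learning: shared learnable context
    vectors are prepended to each class-name token embedding before the
    sequence is fed to a frozen text encoder (not shown here).
    """

    def __init__(self, n_classes, n_ctx=16, dim=512):
        super().__init__()
        # Learnable "soft words" replacing a hand-crafted prompt template.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Stand-in for frozen class-name token embeddings (one token each).
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, dim))

    def forward(self):
        # Broadcast the shared context across classes:
        # result is (n_classes, n_ctx + 1, dim).
        ctx = self.ctx.unsqueeze(0).expand(self.cls_emb.size(0), -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)

prompts = LearnablePromptContext(n_classes=10)()
print(prompts.shape)  # torch.Size([10, 17, 512])
```

Only `self.ctx` receives gradients, which is what makes the method cheap to adapt per downstream task.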
This post reviews CoCa: Contrastive Captioners are Image-Text Foundation Models (https://arxiv.org/abs/2205.01917). From the abstract: "Exploring large-scale pretrained foundation models is of significant interest in computer vision because these models can be quickly transferred to many downstream tasks. This paper presents Contrastive Captioner (CoCa), a mi.."
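CoCa's name, Contrastive Captioner, reflects training one model with both a contrastive image-text objective and an autoregressive captioning objective. A hedged sketch of how the two losses might be combined; the loss weight, shapes, and the decoder that would produce `caption_logits` are placeholders, not the paper's actual values:

```python
import torch
import torch.nn.functional as F

def coca_style_loss(img_emb, txt_emb, caption_logits, caption_targets,
                    temperature=0.07, caption_weight=2.0):
    """Sum of a CLIP-style contrastive loss and a captioning loss.

    caption_logits: (batch, seq_len, vocab) decoder outputs.
    caption_targets: (batch, seq_len) next-token ids.
    """
    # Contrastive term: same symmetric InfoNCE as in the CLIP sketch above.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets)
                   + F.cross_entropy(logits.t(), targets)) / 2

    # Captioning term: token-level cross-entropy over the decoder outputs.
    captioning = F.cross_entropy(
        caption_logits.flatten(0, 1), caption_targets.flatten()
    )
    return contrastive + caption_weight * captioning

# Toy usage with random stand-ins for model outputs.
B, L, V, D = 4, 12, 1000, 512
loss = coca_style_loss(
    torch.randn(B, D), torch.randn(B, D),
    torch.randn(B, L, V), torch.randint(0, V, (B, L)),
)
print(loss)
```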