List: VLM (6)
Attention please
This post reviews VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models (https://arxiv.org/abs/2412.01095). From the abstract: "The rapid advancement of vision-language models (VLMs) has established a new paradigm in video anomaly detection (VAD): leveraging VLMs to simultaneously detect anomal.."
This post reviews TIPS: Text-Image Pretraining with Spatial Awareness (https://arxiv.org/abs/2410.16512). From the abstract: "While image-text representation learning has become very popular in recent years, existing models tend to lack spatial awareness and have limited direct applicability for dense understanding tasks. For this reason, self-supervised image-only pre.."
This post reviews SILC: Improving Vision Language Pretraining with Self-Distillation (https://arxiv.org/abs/2310.13355). From the abstract: "Image-Text pretraining on web-scale image caption datasets has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also.."
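The SILC abstract points to the CLIP-style contrastive recipe that web-scale image-text pretraining builds on. As a rough reference (not SILC's actual training code), a minimal PyTorch sketch of that symmetric image-text contrastive (InfoNCE) objective could look like this; the embedding dimension, batch size, and temperature are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) outputs of the two encoders;
    row i of each tensor is assumed to be a matching image-text pair.
    """
    # Normalize so the dot product is cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = img_emb @ txt_emb.t() / temperature

    # Each image should match its own caption, and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random tensors standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt))
```

Per its title, SILC's contribution is adding a self-distillation objective on top of this contrastive baseline.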
This post reviews LocCa: Visual Pretraining with Location-aware Captioners (https://arxiv.org/abs/2403.19596). From the abstract: "Image captioning has been shown as an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper.."
This post reviews Learning to Prompt for Vision-Language Models (https://arxiv.org/abs/2109.01134). From the abstract: "Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized.."
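This paper (CoOp) replaces a hand-written prompt such as "a photo of a {class}" with learnable context vectors that are optimized while the pretrained encoders stay frozen. A minimal sketch of that parameterization; the dimensions and the frozen class-name embeddings here are stand-ins, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class LearnablePromptContext(nn.Module):
    """Sketch of CoOp-style prompt learning: shared learnable context
    vectors are prepended to each class-name token embedding before the
    sequence is fed to a frozen text encoder (not shown here).
    """

    def __init__(self, n_classes, n_ctx=16, dim=512):
        super().__init__()
        # Learnable "soft words" replacing a hand-crafted prompt template.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Stand-in for frozen class-name token embeddings (one token each).
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, dim))

    def forward(self):
        # Broadcast the shared context across classes:
        # result is (n_classes, n_ctx + 1, dim).
        ctx = self.ctx.unsqueeze(0).expand(self.cls_emb.size(0), -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)

prompts = LearnablePromptContext(n_classes=10)()
print(prompts.shape)  # torch.Size([10, 17, 512])
```

Only `self.ctx` receives gradients, which is what makes the method cheap to adapt per downstream task.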
This post reviews CoCa: Contrastive Captioners are Image-Text Foundation Models (https://arxiv.org/abs/2205.01917). From the abstract: "Exploring large-scale pretrained foundation models is of significant interest in computer vision because these models can be quickly transferred to many downstream tasks. This paper presents Contrastive Captioner (CoCa), a mi.."
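CoCa's name, Contrastive Captioner, reflects training one model with both a contrastive image-text objective and an autoregressive captioning objective. A hedged sketch of how the two losses might be combined; the loss weight, shapes, and the decoder that would produce `caption_logits` are placeholders, not the paper's actual values:

```python
import torch
import torch.nn.functional as F

def coca_style_loss(img_emb, txt_emb, caption_logits, caption_targets,
                    temperature=0.07, caption_weight=2.0):
    """Sum of a CLIP-style contrastive loss and a captioning loss.

    caption_logits: (batch, seq_len, vocab) decoder outputs.
    caption_targets: (batch, seq_len) next-token ids.
    """
    # Contrastive term: same symmetric InfoNCE as in the CLIP sketch above.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets)
                   + F.cross_entropy(logits.t(), targets)) / 2

    # Captioning term: token-level cross-entropy over the decoder outputs.
    captioning = F.cross_entropy(
        caption_logits.flatten(0, 1), caption_targets.flatten()
    )
    return contrastive + caption_weight * captioning

# Toy usage with random stand-ins for model outputs.
B, L, V, D = 4, 12, 1000, 512
loss = coca_style_loss(
    torch.randn(B, D), torch.randn(B, D),
    torch.randn(B, L, V), torch.randint(0, V, (B, L)),
)
print(loss)
```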