2025/07 archive (9 posts)

The paper I'm reviewing this time is Learning to Prompt for Vision-Language Models.
https://arxiv.org/abs/2109.01134
From the abstract: "Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discreti.."
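As context for the review: CoOp, the method proposed in this paper, replaces hand-written prompt templates such as "a photo of a [CLASS]" with learnable context vectors that are prepended to each class-name embedding while CLIP itself stays frozen. The sketch below is a rough illustration of that idea with stand-in tensors in place of CLIP's tokenizer and text encoder; it is a toy sketch, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnablePrompt(nn.Module):
    """CoOp-style prompt learner sketch: shared learnable context vectors are
    prepended to each (frozen) class-name embedding before the text encoder."""
    def __init__(self, num_classes: int, n_ctx: int = 16, dim: int = 512):
        super().__init__()
        # learnable context vectors, shared across classes
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # stand-in for embedded class-name tokens (kept fixed, hence a buffer)
        self.register_buffer("cls_emb", torch.randn(num_classes, 1, dim))

    def forward(self) -> torch.Tensor:
        n_cls = self.cls_emb.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)      # (n_cls, n_ctx, dim)
        return torch.cat([ctx, self.cls_emb], dim=1)           # (n_cls, n_ctx+1, dim)

# Toy usage: in real CoOp the prompts go through CLIP's frozen text encoder;
# here we simply mean-pool them to get one feature per class.
prompt_learner = LearnablePrompt(num_classes=10)
text_feat = F.normalize(prompt_learner().mean(dim=1), dim=-1)  # (10, 512)
img_feat = F.normalize(torch.randn(4, 512), dim=-1)            # pretend CLIP image features
logits = 100.0 * img_feat @ text_feat.t()                      # (4, 10) class logits
```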

The paper I'm reviewing this time is CoCa: Contrastive Captioners are Image-Text Foundation Models.
https://arxiv.org/abs/2205.01917
From the abstract: "Exploring large-scale pretrained foundation models is of significant interest in computer vision because these models can be quickly transferred to many downstream tasks. This paper presents Contrastive Captioner (CoCa), a mi.."
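CoCa's central recipe is training one model with two objectives at once: a CLIP-style contrastive loss between image and text embeddings and an autoregressive captioning loss. The sketch below is a toy illustration of how the two losses might be combined; the loss weights and tensor shapes are hypothetical placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def coca_style_loss(img_emb, txt_emb, caption_logits, caption_targets,
                    temperature=0.07, lambda_con=1.0, lambda_cap=2.0):
    """Toy combination of a CLIP-style contrastive loss and a captioning
    (next-token cross-entropy) loss; weights are illustrative only."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) similarity matrix
    labels = torch.arange(img.size(0))                   # matched pairs on the diagonal
    contrastive = (F.cross_entropy(logits, labels) +
                   F.cross_entropy(logits.t(), labels)) / 2
    captioning = F.cross_entropy(caption_logits.flatten(0, 1),
                                 caption_targets.flatten())
    return lambda_con * contrastive + lambda_cap * captioning

# toy tensors: batch of 4, embedding dim 512, captions of length 8 over a 1000-token vocab
loss = coca_style_loss(torch.randn(4, 512), torch.randn(4, 512),
                       torch.randn(4, 8, 1000), torch.randint(0, 1000, (4, 8)))
```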

The paper I'm reviewing this time is Sigmoid Loss for Language Image Pre-Training.
https://arxiv.org/abs/2303.15343
From the abstract: "We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise sim.."
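The key idea is already visible in the abstract: each image-text pair is treated as an independent binary classification with a sigmoid, so no softmax normalization over the whole batch is needed. A minimal sketch of such a pairwise sigmoid loss is below; the temperature and bias are fixed placeholder values here (they are learnable scalars in the paper), and this is an illustration rather than the official implementation.

```python
import torch
import torch.nn.functional as F

def siglip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                      t: float = 10.0, b: float = -10.0) -> torch.Tensor:
    """Pairwise sigmoid loss sketch: every image-text pair in the batch gets an
    independent binary label (+1 on the diagonal, -1 off-diagonal), so there is
    no softmax over the batch. t and b are learnable in the paper, fixed here."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() * t + b                     # (B, B) pairwise similarities
    labels = 2 * torch.eye(logits.size(0)) - 1         # +1 for matched pairs, -1 otherwise
    # sum over all pairs, normalized by batch size
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)

loss = siglip_style_loss(torch.randn(8, 512), torch.randn(8, 512))
```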

The paper I'm reviewing this time is DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.
https://arxiv.org/abs/2501.12948
From the abstract: "We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised f.."

RL (Reinforcement Learning) has proven effective at improving the mathematical reasoning ability of LLMs after SFT (Supervised Fine-Tuning). This time I will introduce the GRPO algorithm, one of the various reinforcement learning algorithms used to improve LLM reasoning. GRPO was proposed in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
https://arxiv.org/abs/2402.03300
From the abstract: "Mathematica.."
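For intuition before the full review: GRPO replaces the learned critic with a group-relative baseline. It samples a group of responses per prompt, normalizes their rewards within the group to obtain advantages, and plugs those into a PPO-style clipped surrogate. The sketch below is a minimal illustration under those assumptions (made-up reward values, KL penalty against a reference model omitted), not the paper's implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages: rewards has shape (num_prompts, group_size);
    each response is scored relative to the other samples for the same prompt,
    so no learned value function (critic) is needed."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate using the group-relative advantages;
    the KL penalty against a frozen reference model is omitted in this sketch."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# toy example: 2 prompts, 4 sampled responses each, hypothetical scalar rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)
loss = grpo_surrogate(torch.randn(2, 4), torch.randn(2, 4), adv)
```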

The A2C algorithm covered earlier has the drawback that samples cannot be reused after they are collected.
2025.07.14 - [Deep Learning/Reinforcement Learning] - [Reinforcement Learning] A2C Algorithm (smcho1201.tistory.com)
The gradient used in the Actor-Critic algorithm we looked at there was
$$ \nabla_\theta J_\theta \simeq \sum_{t=0}^{\infty} \int_{s_t, a_t} \nabla_\theta \ln p_\theta(a_t \mid s_t) \cdot Q(s_t, a_t) \cdot p_\theta(s_t, a_t) \, ds_t \, da_t $$
The algorithm proposed to overcome this limitation is ..
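A quick note on why reuse fails: the gradient above is an expectation under the current policy's state-action distribution p_θ(s_t, a_t), so a batch sampled before a parameter update no longer follows the right distribution afterwards. Below is a minimal, hypothetical A2C-style update in PyTorch (toy linear actor and critic, made-up state and action sizes) that makes this concrete; it is a sketch for intuition, not the linked post's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy actor-critic heads; state dim 4, 2 discrete actions (hypothetical sizes)
actor = nn.Linear(4, 2)
critic = nn.Linear(4, 1)
optim = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def a2c_step(states, actions, returns):
    """One A2C update. The log-probability is taken under the *current* policy,
    so `states`/`actions` must have been sampled from it; after this update the
    policy changes and the same batch cannot be reused without an
    importance-sampling correction."""
    values = critic(states).squeeze(-1)
    advantages = returns - values.detach()              # A(s,a) ~ Q(s,a) - V(s)
    log_probs = F.log_softmax(actor(states), dim=-1)
    logp_a = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(logp_a * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    optim.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    optim.step()

# toy batch assumed to be sampled from the current policy
a2c_step(torch.randn(16, 4), torch.randint(0, 2, (16,)), torch.randn(16))
```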