Posts from 2025/07/16

List: 2025/07/16 (1)

Attention please

[Reinforcement Learning] Group Relative Policy Optimization (GRPO)

Reinforcement learning (RL) has been shown to be effective at improving the mathematical reasoning ability of LLMs after supervised fine-tuning (SFT). This post introduces GRPO, one of several reinforcement learning algorithms used to improve LLM reasoning. GRPO was proposed in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. https://arxiv.org/abs/2402.03300 ..

Deep Learning/Reinforcement Learning 2025. 7. 16. 17:07
