Attention please

Notice

Equations may not render correctly on mobile devices⋯

All Posts (130)

[Paper Review] Dueling DQN: Dueling Network Architectures for Deep Reinforcement Learning (2016)

The paper reviewed in this post is Dueling Network Architectures for Deep Reinforcement Learning. https://arxiv.org/abs/1511.06581
Dueling Network Architectures for Deep Reinforcement Learning: In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders..

Paper Review/Reinforcement Learning 2025. 4. 22. 04:11

[Paper Review] DDQN: Deep Reinforcement Learning with Double Q-learning (2016)

The paper reviewed in this post is Deep Reinforcement Learning with Double Q-learning. https://arxiv.org/abs/1509.06461
Deep Reinforcement Learning with Double Q-learning: The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be pr..

Paper Review/Reinforcement Learning 2025. 4. 22. 01:44
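The excerpt above notes that Q-learning tends to overestimate action values. Below is a minimal sketch of why, under assumptions chosen purely for illustration (10 actions whose true values are all 0, and estimates corrupted by independent Gaussian noise with standard deviation 1): taking the max over noisy estimates is biased upward, while selecting the action with one estimator and evaluating it with a second, independent estimator (the idea behind Double Q-learning) is not. The function names and numbers are hypothetical and are not the paper's experimental setup.

```python
import random

def single_estimator_max(num_actions, noise_std, trials, seed=0):
    """Average of max_a Q_hat(a) when every true Q(a) is 0 (the standard Q-learning max)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each estimate = true value 0.0 + zero-mean Gaussian noise (assumed noise model)
        q_hat = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        total += max(q_hat)
    return total / trials

def double_estimator(num_actions, noise_std, trials, seed=0):
    """Select the argmax with estimator A, then evaluate that action with independent estimator B."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        best = max(range(num_actions), key=lambda i: q_a[i])  # selection uses A only
        total += q_b[best]                                     # evaluation uses B only
    return total / trials

single = single_estimator_max(num_actions=10, noise_std=1.0, trials=20_000)
decoupled = double_estimator(num_actions=10, noise_std=1.0, trials=20_000)
print(f"single max: {single:.3f}  decoupled: {decoupled:.3f}  (true value of every action: 0.0)")
```

With these settings the single-estimator average lands well above the true value of 0 (the expected maximum of 10 standard normal samples is roughly 1.54), while the decoupled estimate stays close to 0.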

[Paper Review] DQN: Playing Atari with Deep Reinforcement Learning (2013, 2015)

The paper reviewed in this post is Playing Atari with Deep Reinforcement Learning. https://arxiv.org/abs/1312.5602
Playing Atari with Deep Reinforcement Learning: We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is rawa..

Paper Review/Reinforcement Learning 2025. 4. 20. 02:56

[Reinforcement Learning] Mastering Q-learning

Q-learning
In the earlier post on TD (Temporal Difference) learning, we called a method on-policy when its target policy and behavior policy are the same, and off-policy when they differ. Q-learning, the subject of this post, is an off-policy algorithm.
(Related post: 2025.04.15 - [Deep Learning/Reinforcement Learning] - [Reinforcement Learning] On-policy vs Off-policy)

Deep Learning/Reinforcement Learning 2025. 4. 19. 20:08
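The off-policy idea in the excerpt can be sketched as tabular Q-learning: an ε-greedy behavior policy chooses the actions, while the update bootstraps from the greedy action $\max_a Q(s_{t+1}, a)$, so the target policy is greedy. This is a minimal sketch under assumptions: the corridor task, `run_corridor`, and all hyperparameters are hypothetical choices for illustration, not taken from the post.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(state, a)] == best])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma, terminal=False):
    """Off-policy update: bootstrap from max_a Q(s_next, a) (a greedy target policy),
    no matter which action the behavior policy will actually take in s_next."""
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

def run_corridor(episodes=500, n_states=5, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Hypothetical toy task: states 0..n_states-1 on a line, actions -1 (left) / +1 (right).
    Reaching the rightmost state ends the episode with reward 1; every other step gives 0."""
    rng = random.Random(seed)
    actions = [-1, +1]
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = epsilon_greedy(Q, s, actions, epsilon, rng)
            s_next = min(max(s + a, 0), n_states - 1)  # walls at both ends
            terminal = s_next == n_states - 1
            r = 1.0 if terminal else 0.0
            q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma, terminal)
            s = s_next
    return Q

Q = run_corridor()
greedy = [max([-1, +1], key=lambda a: Q[(s, a)]) for s in range(4)]
print("greedy actions:", greedy)
print("Q(3, +1) =", round(Q[(3, +1)], 3))
```

Exploration comes from the ε-greedy behavior policy, yet the learned values approach those of the greedy policy (in this toy task $Q(s, +1)$ tends toward $\gamma^{3-s}$). Learning about one policy while acting with another is exactly what makes Q-learning off-policy.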

[Reinforcement Learning] On-policy vs Off-policy

Temporal Difference
Before getting into on-policy and off-policy methods, let's revisit TD (Temporal Difference) learning once more.
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( R_t + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right)$$
The equation above is the TD update, where $\alpha$ is the learning rate and $\gamma$ is the discount factor. At the core of TD is $Q(s_{t+1}, a_{t+1})$, the predicted value of the next state-action pair; combined with the reward, it forms the TD target $R_t + \gamma Q(s_{t+1}, a_{t+1})$.
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( \underbrace..

Deep Learning/Reinforcement Learning 2025. 4. 15. 02:23
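In the TD update above, $a_{t+1}$ is the action the behavior policy actually takes in $s_{t+1}$, which makes it an on-policy (SARSA-style) target. A minimal sketch contrasting it with the off-policy Q-learning target on a single hypothetical transition; the state names, Q values, reward, and hyperparameters are made up for illustration.

```python
from collections import defaultdict

def sarsa_target(Q, r, s_next, a_next, gamma):
    """On-policy TD target: R_t + gamma * Q(s_{t+1}, a_{t+1}) using the action actually taken."""
    return r + gamma * Q[(s_next, a_next)]

def q_learning_target(Q, r, s_next, actions, gamma):
    """Off-policy TD target: R_t + gamma * max_a Q(s_{t+1}, a), whatever action is taken next."""
    return r + gamma * max(Q[(s_next, b)] for b in actions)

def td_update(Q, s, a, target, alpha):
    """Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (target - Q(s_t, a_t)); returns the TD error."""
    td_error = target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

# Hypothetical transition: in s0 we took "right", received R_t = 0.5, landed in s1,
# and the exploring behavior policy then picked the worse action "left" in s1.
Q = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 3.0})
gamma = 0.5
on_policy = sarsa_target(Q, 0.5, "s1", "left", gamma)                   # 0.5 + 0.5 * 1.0
off_policy = q_learning_target(Q, 0.5, "s1", ["left", "right"], gamma)  # 0.5 + 0.5 * 3.0
print(on_policy, off_policy)  # 1.0 2.0
```

Because the exploratory action in $s_{t+1}$ feeds into the SARSA target, SARSA learns the value of the behavior policy itself; the Q-learning target ignores that action and learns the value of the greedy policy instead.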

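[Reinforcement Learning] Monte Carlo (MC) & Temporal Difference (TD)

Goal: Estimating the Value Function
The goal of reinforcement learning is "to find a policy that maximizes cumulative reward while interacting with the environment." To achieve this, the agent must decide at every moment: "In the current state, which action is most beneficial in the long run?"
Of course, it should not decide based only on the immediate reward in front of it; it must act with what will happen in the future in mind. The value function was introduced precisely to support this kind of optimal decision-making. There are two main kinds of value functions: the "State Value Function" and the "Action Value Function". 1. State Value Function: when in state $s$ and following policy $\pi$, ..

Deep Learning/Reinforcement Learning 2025. 4. 14. 00:52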
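The state value function in the excerpt is the expected discounted return from state $s$ under policy $\pi$. A minimal sketch of how MC and TD(0) each estimate it from one hypothetical episode A → B → C → terminal with rewards 0, 0, 1; $\gamma = 0.5$ and $\alpha = 1.0$ are assumed only so the arithmetic is easy to check, and the helper names are illustrative rather than from the post.

```python
from collections import defaultdict

def discounted_returns(rewards, gamma):
    """G_t = R_t + gamma * G_{t+1}, computed backward from the end of the episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

def mc_update(V, states, rewards, gamma, alpha):
    """Every-visit MC: after the episode ends, move V(s_t) toward the actual return G_t."""
    for s, g in zip(states, discounted_returns(rewards, gamma)):
        V[s] += alpha * (g - V[s])

def td0_update(V, s, r, s_next, gamma, alpha, terminal=False):
    """TD(0): move V(s_t) toward R_t + gamma * V(s_{t+1}) right after a single step."""
    target = r + (0.0 if terminal else gamma * V[s_next])
    V[s] += alpha * (target - V[s])

states, rewards = ["A", "B", "C"], [0.0, 0.0, 1.0]
gamma, alpha = 0.5, 1.0

V_mc = defaultdict(float)
mc_update(V_mc, states, rewards, gamma, alpha)

V_td = defaultdict(float)
for t, s in enumerate(states):
    terminal = t == len(states) - 1
    s_next = None if terminal else states[t + 1]
    td0_update(V_td, s, rewards[t], s_next, gamma, alpha, terminal)

print({s: V_mc[s] for s in "ABC"})  # {'A': 0.25, 'B': 0.5, 'C': 1.0}
print({s: V_td[s] for s in "ABC"})  # {'A': 0.0, 'B': 0.0, 'C': 1.0}
```

After one episode, MC has already propagated the final reward back to A because it waits for the full return. TD(0) has only updated C, so it needs further episodes to bootstrap the value backward, but in exchange it can update after every step without waiting for the episode to end.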
