DRL_code.zip
Size: 17.37MB
Price: 36 points
Downloads: 0
Rating: 5.0
Uploader: sinat_39620217
Updated: 2025-09-22

Reinforcement learning algorithm collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, and more)

Resource file list (approximate)

File name
Size
examples/
-
examples/Baselines/
-
examples/Baselines/GridDispatch_competition/
-
examples/Baselines/GridDispatch_competition/README.md
334B
examples/Baselines/Halite_competition/
-
examples/Baselines/Halite_competition/torch/
-
examples/Baselines/Halite_competition/torch/rl_trainer/
-
examples/Baselines/Halite_competition/torch/rl_trainer/controller.py
20.6KB
examples/DDPG/
-
examples/DDPG/train.py
5.27KB
examples/AlphaZero/
-
examples/AlphaZero/Coach.py
8.8KB
examples/A2C/
-
examples/A2C/actor.py
4.52KB
examples/A2C/atari_model.py
3.17KB
examples/DQN/
-
examples/DQN/README.md
849B
examples/AlphaZero/README.md
1.91KB
examples/A2C/atari_agent.py
4KB
examples/Baselines/GridDispatch_competition/torch/
-
examples/Baselines/GridDispatch_competition/torch/grid_model.py
2.54KB
examples/Baselines/GridDispatch_competition/torch/README.md
1.6KB
examples/AlphaZero/.pic/
-
examples/AlphaZero/.pic/perfect_moves_rate.png
64.44KB
examples/DDPG/mujoco_model.py
2.1KB
examples/DQN_variant/
-
examples/DQN_variant/train.py
6.56KB
examples/CARLA_SAC/
-
examples/CARLA_SAC/carla_agent.py
1.71KB
examples/Baselines/Halite_competition/torch/train.py
8.93KB
examples/CARLA_SAC/train.py
5.4KB
examples/DQN/requirements.txt
43B
examples/CARLA_SAC/evaluate.py
2.62KB
examples/CARLA_SAC/carla_model.py
3.29KB
examples/Baselines/Halite_competition/torch/rl_trainer/obs_parser.py
3.27KB
examples/Baselines/Halite_competition/torch/rl_trainer/agent.py
4.21KB
examples/Baselines/Halite_competition/paddle/
-
examples/Baselines/Halite_competition/paddle/rl_trainer/
-
examples/Baselines/Halite_competition/paddle/rl_trainer/obs_parser.py
3.27KB
examples/ES/
-
examples/ES/train.py
7.53KB
examples/ES/obs_filter.py
6.09KB
examples/IMPALA/
-
examples/IMPALA/atari_model.py
2.85KB
examples/ES/noise.py
955B
examples/MADDPG/
-
examples/MADDPG/README.md
3.16KB
examples/IMPALA/actor.py
3.9KB
examples/IMPALA/README.md
1.84KB
examples/ES/optimizers.py
1.82KB
examples/DDPG/mujoco_agent.py
1.98KB
examples/MADDPG/requirements.txt
56B
examples/AlphaZero/connect4_aiplayer.py
4.72KB
examples/AlphaZero/utils.py
1.8KB
examples/AlphaZero/main.py
2.78KB
examples/Baselines/GridDispatch_competition/paddle/
-
examples/Baselines/GridDispatch_competition/paddle/grid_agent.py
1.85KB
examples/DQN/train.py
4.31KB
examples/Baselines/Halite_competition/paddle/README.md
3.39KB
examples/Baselines/GridDispatch_competition/paddle/grid_model.py
2.55KB
examples/Baselines/Halite_competition/paddle/rl_trainer/utils.py
7.59KB
examples/CQL/
-
examples/CQL/mujoco_agent.py
1.83KB
examples/Baselines/Halite_competition/paddle/rl_trainer/replay_memory.py
3.66KB
examples/Baselines/Halite_competition/paddle/rl_trainer/algorithm.py
5.32KB
examples/Baselines/Halite_competition/torch/encode_model.py
972B
examples/AlphaZero/alphazero_agent.py
3.64KB
examples/CARLA_SAC/env_utils.py
3.87KB
examples/CARLA_SAC/env_config.py
2.72KB
examples/Baselines/Halite_competition/paddle/rl_trainer/model.py
2.25KB
examples/AlphaZero/connect4_game.py
7.87KB
examples/Baselines/Halite_competition/paddle/rl_trainer/controller.py
20.55KB
examples/AlphaZero/connect4_model.py
3.13KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/
-
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_model.py
6.4KB
examples/IMPALA/train.py
9.43KB
examples/ES/es.py
1.22KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_agent.py
8.61KB
examples/TD3/
-
examples/TD3/mujoco_agent.py
1.88KB
examples/Baselines/GridDispatch_competition/paddle/train.py
7.05KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/
-
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/submit_model.py
5.18KB
examples/Baselines/Halite_competition/torch/rl_trainer/policy.py
2.54KB
examples/NeurIPS2019-Learn-to-Move-Challenge/
-
examples/NeurIPS2019-Learn-to-Move-Challenge/env_wrapper.py
16.85KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/pelvisBasedObs_scaler.npz
4.22KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/pelvisBasedObs_scaler.npz
4.22KB
examples/Baselines/Halite_competition/torch/rl_trainer/algorithm.py
5.36KB
examples/Baselines/Halite_competition/paddle/rl_trainer/policy.py
2.46KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/test.py
3.33KB
examples/NeurIPS2019-Learn-to-Move-Challenge/actor.py
1.86KB
examples/Baselines/Halite_competition/torch/test.ipynb
1.56KB
examples/ES/actor.py
4.37KB
examples/Baselines/Halite_competition/paddle/test.py
1.39KB
examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate.py
11.41KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/
-
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/env_wrapper.py
9.75KB
examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate_args.py
2.46KB
examples/ES/README.md
1.47KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/submit_model.py
5.54KB
examples/DQN_variant/replay_memory.py
4.09KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/official_obs_scaler.npz
2.2KB
examples/NeurIPS2019-Learn-to-Move-Challenge/official_obs_scaler.npz
2.2KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/test.py
2.51KB
examples/Baselines/Halite_competition/torch/README.md
3.39KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/
-
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty2.sh
256B
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3_first_target.sh
338B
examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_agent.py
3.51KB
examples/ES/utils.py
2.06KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty1.sh
255B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/
-
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/
-
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es_agent.py
2.85KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/
-
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/evaluate.py
2.79KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3.sh
292B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/powernet_model.py
2.6KB
examples/PPO/
-
examples/PPO/atari_config.py
2.19KB
examples/NeurIPS2019-Learn-to-Move-Challenge/replay_memory.py
60B
examples/PPO/agent.py
4.43KB
examples/ES/requirements.txt
58B
examples/AlphaZero/actor.py
6.72KB
examples/PPO/mujoco_config.py
2.18KB
examples/Baselines/GridDispatch_competition/paddle/env_wrapper.py
4.52KB
examples/Baselines/Halite_competition/paddle/encode_model.py
974B
examples/Baselines/GridDispatch_competition/torch/env_wrapper.py
4.52KB
examples/tutorials/
-
examples/tutorials/homework/
-
examples/tutorials/homework/lesson4/
-
examples/tutorials/homework/lesson4/policy_gradient_pong/
-
examples/tutorials/homework/lesson4/policy_gradient_pong/model.py
1.08KB
examples/Baselines/Halite_competition/paddle/train.py
8.82KB
examples/tutorials/homework/lesson3/
-
examples/tutorials/homework/lesson3/dqn_mountaincar/
-
examples/tutorials/homework/lesson3/dqn_mountaincar/replay_memory.py
1.64KB
examples/tutorials/parl2_dygraph/
-
examples/tutorials/parl2_dygraph/lesson3/
-
examples/tutorials/parl2_dygraph/lesson3/dqn/
-
examples/tutorials/parl2_dygraph/lesson3/dqn/train.py
4.7KB
examples/tutorials/lesson5/
-
examples/tutorials/lesson5/ddpg/
-
examples/tutorials/lesson5/ddpg/replay_memory.py
1.64KB
examples/tutorials/homework/lesson4/policy_gradient_pong/agent.py
2.87KB
examples/tutorials/lesson1/
-
examples/tutorials/lesson1/gridworld.py
6.62KB
examples/tutorials/homework/lesson5/
-
examples/tutorials/homework/lesson5/ddpg_quadrotor/
-
examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_model.py
1.92KB
examples/Baselines/GridDispatch_competition/paddle/README.md
1.61KB
examples/tutorials/lesson4/
-
examples/tutorials/lesson4/policy_gradient/
-
examples/tutorials/lesson4/policy_gradient/agent.py
2.87KB
examples/CQL/train.py
4.36KB
examples/tutorials/homework/lesson4/policy_gradient_pong/train.py
4.23KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/multi_head_ddpg.py
4.82KB
examples/AlphaZero/requirements.txt
37B
examples/DQN/cartpole_agent.py
3.17KB
examples/A2C/.result/
-
examples/A2C/.result/result_a2c_paddle0.png
193.24KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/replay_memory.py
3.6KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_server.py
11.88KB
examples/others/
-
examples/others/deepes.py
3.13KB
examples/SAC/
-
examples/SAC/mujoco_model.py
2.55KB
examples/tutorials/homework/lesson2/
-
examples/tutorials/homework/lesson2/q_learning_frozenlake/
-
examples/tutorials/homework/lesson2/q_learning_frozenlake/agent.py
2.73KB
examples/tutorials/lesson2/
-
examples/tutorials/lesson2/q_learning/
-
examples/tutorials/lesson2/q_learning/agent.py
2.73KB
examples/CQL/README.md
1.51KB
examples/Baselines/GridDispatch_competition/torch/train.py
7.04KB
examples/Baselines/Halite_competition/torch/requirements.txt
25B
examples/Baselines/Halite_competition/paddle/rl_trainer/agent.py
4.03KB
examples/Baselines/Halite_competition/torch/rl_trainer/model.py
2.24KB
examples/DDPG/README.md
1.11KB
examples/DQN/cartpole_model.py
1.3KB
examples/Baselines/Halite_competition/paddle/submission.py
99.84KB
examples/A2C/requirements.txt
67B
examples/DDPG/requirements.txt
58B
examples/Baselines/Halite_competition/paddle/test.ipynb
1.46KB
examples/MADDPG/train.py
6.93KB
examples/TD3/requirements.txt
58B
examples/SAC/requirements.txt
58B
examples/CQL/requirements.txt
121B
examples/A2C/README.md
1.4KB
examples/A2C/train.py
7.1KB
examples/Baselines/Halite_competition/torch/config.py
1.35KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/test.py
3.22KB
examples/MADDPG/simple_model.py
3.59KB
examples/QuickStart/
-
examples/QuickStart/cartpole_model.py
1.23KB
examples/IMPALA/atari_agent.py
2.91KB
examples/Baselines/Halite_competition/torch/submission.py
100.1KB
examples/TD3/README.md
1.24KB
examples/QuickStart/cartpole_agent.py
2.27KB
examples/SAC/train.py
5.09KB
examples/MADDPG/simple_agent.py
4.43KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/env_wrapper.py
17.21KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/env_wrapper.py
28.33KB
examples/DQN_variant/atari_model.py
3.3KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/mlp_model.py
6.49KB
examples/OAC/
-
examples/OAC/requirements.txt
58B
examples/NeurIPS2019-Learn-to-Move-Challenge/README.md
3.2KB
examples/TD3/train.py
5.12KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es.py
1.57KB
examples/PPO/requirements_mujoco.txt
58B
examples/PPO/env_utils.py
6.95KB
examples/NeurIPS2019-Learn-to-Move-Challenge/train.py
11.9KB
examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/mlp_model.py
6.46KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty1.sh
341B
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty2.sh
320B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es.py
1.23KB
examples/PPO/requirements_atari.txt
74B
examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_agent.py
2.65KB
examples/QMIX/
-
examples/QMIX/replay_buffer.py
3.33KB
examples/PPO/mujoco_model.py
1.96KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/README.md
659B
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/
-
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/velocity_distribution.png
27.9KB
examples/tutorials/homework/lesson5/ddpg_quadrotor/train.py
6.11KB
examples/QuickStart/README.md
435B
examples/QuickStart/requirements.txt
43B
examples/tutorials/parl2_dygraph/lesson5/
-
examples/tutorials/parl2_dygraph/lesson5/ddpg/
-
examples/tutorials/parl2_dygraph/lesson5/ddpg/replay_memory.py
1.64KB
examples/tutorials/parl2_dygraph/lesson3/dqn/replay_memory.py
1.64KB
examples/tutorials/parl2_dygraph/lesson3/homework/
-
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/
-
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/replay_memory.py
1.64KB
examples/tutorials/parl2_dygraph/lesson5/homework/
-
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/
-
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_model.py
2.13KB
examples/QMIX/qmix_config.py
2.69KB
examples/tutorials/parl2_dygraph/lesson3/dqn/agent.py
2.79KB
examples/tutorials/homework/lesson3/dqn_mountaincar/model.py
1.11KB
examples/tutorials/lesson3/
-
examples/tutorials/lesson3/dqn/
-
examples/tutorials/lesson3/dqn/model.py
1.11KB
examples/tutorials/homework/lesson2/q_learning_frozenlake/train.py
2.56KB
examples/tutorials/parl2_dygraph/lesson3/dqn/model.py
1.3KB
examples/QMIX/rnn_model.py
1.45KB
examples/A2C/a2c_config.py
1.29KB
examples/DQN/cartpole.jpg
110.07KB
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/model.py
1.3KB
examples/tutorials/lesson5/ddpg/env.py
6.33KB
examples/AlphaZero/.pic/good_moves_rate.png
60.06KB
examples/Baselines/Halite_competition/torch/rl_trainer/replay_memory.py
3.6KB
examples/CARLA_SAC/README.md
2.78KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/utils.py
3.25KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md
6.94KB
examples/tutorials/lesson5/ddpg/train.py
4.25KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/evaluate.py
2.79KB
examples/DQN_variant/atari_agent.py
4.11KB
examples/IMPALA/impala_config.py
1.5KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/competition.png
184.81KB
examples/PPO/storage.py
3.09KB
examples/OAC/mujoco_agent.py
1.85KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3_first_target.sh
416B
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es_agent.py
1.62KB
examples/tutorials/lesson3/dqn/replay_memory.py
1.64KB
examples/Baselines/Halite_competition/paddle/config.py
1.35KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/README.md
718B
examples/QMIX/utils.py
1.66KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/powernet_model.py
2.67KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/curriculum-learning.png
158.38KB
examples/Baselines/GridDispatch_competition/torch/grid_agent.py
1.97KB
examples/CARLA_SAC/.benchmark/
-
examples/CARLA_SAC/.benchmark/Lane_bend.gif
3.19MB
examples/tutorials/parl2_dygraph/README.md
1.38KB
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_agent.py
2.01KB
examples/tutorials/parl2_dygraph/lesson3/dqn/algorithm.py
2.86KB
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/agent.py
2.79KB
examples/tutorials/lesson4/policy_gradient/algorithm.py
1.7KB
examples/tutorials/lesson4/policy_gradient/model.py
1.04KB
examples/tutorials/homework/lesson3/dqn_mountaincar/train.py
4.72KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2.py
7.22KB
examples/tutorials/lesson3/dqn/agent.py
3.89KB
examples/Baselines/Halite_competition/torch/rl_trainer/utils.py
7.64KB
examples/tutorials/homework/lesson3/dqn_mountaincar/agent.py
3.89KB
examples/tutorials/homework/lesson2/sarsa_frozenlake/
-
examples/tutorials/homework/lesson2/sarsa_frozenlake/gridworld.py
6.53KB
examples/tutorials/homework/lesson2/q_learning_frozenlake/gridworld.py
6.53KB
examples/DQN_variant/.benchmark/
-
examples/DQN_variant/.benchmark/Dueling DQN.png
218.21KB
examples/tutorials/lesson2/sarsa/
-
examples/tutorials/lesson2/sarsa/gridworld.py
6.53KB
examples/tutorials/requirements.txt
126B
examples/tutorials/lesson2/sarsa/train.py
2.95KB
examples/tutorials/lesson2/q_learning/gridworld.py
6.53KB
examples/tutorials/lesson5/ddpg/model.py
1.73KB
examples/SAC/mujoco_agent.py
1.83KB
examples/tutorials/lesson3/dqn/train.py
4.82KB
examples/IMPALA/requirements.txt
74B
examples/DQN_variant/requirements.txt
79B
examples/TD3/mujoco_model.py
2.54KB
examples/Baselines/Halite_competition/torch/test.py
1.44KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/args.py
3.51KB
examples/tutorials/parl2_dygraph/lesson4/
-
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/
-
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/agent.py
1.8KB
examples/tutorials/parl2_dygraph/lesson4/homework/
-
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/
-
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/agent.py
1.8KB
examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/train.py
4.67KB
examples/OAC/README.md
1.04KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/utils.py
13.97KB
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/train.py
4.29KB
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/algorithm.py
1.94KB
examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/model.py
1.35KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/utils.py
2.59KB
examples/QMIX/qmix_agent.py
5.35KB
examples/OAC/mujoco_model.py
2.55KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/README.md
700B
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/demo.gif
4.58MB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/fastest.png
270.84KB
examples/PPO/atari_model.py
2.03KB
examples/PPO/README.md
2.48KB
examples/Baselines/Halite_competition/paddle/requirements.txt
32B
examples/tutorials/lesson5/ddpg/algorithm.py
3.46KB
examples/tutorials/lesson5/ddpg/agent.py
2.67KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2_grpc.py
1.93KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_client.py
4.25KB
examples/tutorials/homework/lesson2/sarsa_frozenlake/train.py
2.67KB
examples/QMIX/qmixer_model.py
3.06KB
examples/QMIX/train.py
6.51KB
examples/tutorials/lesson4/policy_gradient/train.py
3.66KB
examples/CQL/mujoco_model.py
2.78KB
examples/tutorials/parl2_dygraph/requirements.txt
130B
examples/SAC/README.md
1.24KB
examples/NeurIPS2019-Learn-to-Move-Challenge/train_args.py
2.73KB
examples/DQN_variant/README.md
2.65KB
examples/QMIX/README.md
1.31KB
examples/QMIX/requirements.txt
37B
examples/QMIX/env_wrapper.py
3.11KB
examples/QuickStart/train.py
3.83KB
examples/AlphaZero/MCTS.py
5.83KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/train.py
4.21KB
examples/tutorials/lesson3/dqn/algorithm.py
3.02KB
examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/train.py
6.06KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/agent.py
2.31KB
examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_model.py
5.81KB
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/model.py
1.26KB
examples/tutorials/parl2_dygraph/lesson4/policy_gradient/train.py
3.65KB
examples/tutorials/lesson2/q_learning/train.py
2.85KB
examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3.sh
416B
examples/tutorials/README.md
1.74KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/agent.py
19.08KB
examples/tutorials/homework/lesson2/sarsa_frozenlake/agent.py
2.77KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/agent.py
12.96KB
examples/tutorials/lesson2/sarsa/agent.py
2.77KB
examples/PPO/train.py
5.99KB
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/
-
examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/l2rpn.jpeg
69.44KB
examples/OAC/train.py
5.45KB
examples/Baselines/Halite_competition/paddle/model/
-
examples/Baselines/Halite_competition/paddle/model/latest_ship_model.pth
325.14KB
examples/AlphaZero/Arena.py
3.24KB
examples/QuickStart/performance.gif
237.51KB
examples/NeurIPS2019-Learn-to-Move-Challenge/image/
-
examples/NeurIPS2019-Learn-to-Move-Challenge/image/performance.gif
782.27KB
examples/CARLA_SAC/.benchmark/carla_sac.png
141.86KB
examples/A2C/.result/result_a2c_paddle1.png
203.23KB
examples/Baselines/Halite_competition/torch/model/
-
examples/Baselines/Halite_competition/torch/model/latest_ship_model.pth
338.03KB
examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/last course.png
360.06KB
examples/QMIX/images/
-
examples/QMIX/images/paddle2.0_qmix_result.png
97.1KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/algorithm.py
3.69KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/env.py
6.33KB
examples/CARLA_SAC/model.ckpt
4.63MB
examples/ES/mujoco_agent.py
2.77KB
examples/tutorials/parl2_dygraph/lesson5/ddpg/model.py
1.94KB
examples/ES/es_config.py
1.2KB
examples/ES/mujoco_model.py
1.93KB

Resource description

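Reinforcement learning algorithm collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, and more), containing code for 20+ classic reinforcement learning algorithms. For the corresponding tutorials, see these blog posts:
Multi-agent (cutting-edge algorithms + principles): https://blog.csdn.net/sinat_39620217/article/details/115299073?spm=1001.2014.3001.5502
Reinforcement learning basics (single-agent algorithms): https://blog.csdn.net/sinat_39620217/category_10940146.html
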
# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge

<p align="center">
<img src="image/competition.png" alt="PARL" width="800"/>
</p>

This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part is used for curriculum learning, to learn a natural and efficient gait at low-speed walking. The last part learns the final agent in the random velocity environment for round 2 evaluation.

For more technical details about our solution, we provide:

1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A PowerPoint presentation briefly introducing our solution at the NeurIPS 2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 competition workshop.
4. (coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.

**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, there are still some factors that prevent us from achieving the same performance. One problem is deciding when to pick a converged model during curriculum learning. Choosing a sensible and natural gait visually is crucial for subsequent training, but the definition of a good gait varies from person to person.

<p align="center">
<img src="image/demo.gif" alt="PARL" width="500"/>
</p>

## Dependencies

- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (To use tensorboard)

## Part1: Final submitted model

### Result

For the final submission, we test our model on 500 CPUs, running 10 episodes per CPU with different random seeds.

| Avg reward of all episodes | Avg reward of complete episodes | Falldown % | Evaluate episodes |
|----------------------------|---------------------------------|------------|-------------------|
| 9968.5404                  | 9980.3952                       | 0.0026     | 5000              |

### Test

- How to Run
  1. Enter the sub-folder `final_submit`
  2. Download the model file from an online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
  3. Unpack the file by using: `tar zxvf saved_model.tar.gz`
  4. Launch the test script: `python test.py`

## Part2: Curriculum learning

<p align="center">
<img src="image/curriculum-learning.png" alt="PARL" width="500"/>
</p>

#### 1. Target: run as fast as possible

<p align="center">
<img src="image/fastest.png" alt="PARL" width="800"/>
</p>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```

#### 2. Target: run at 3.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [RunFastest model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
    --act_penalty_lowerbound 1.5
```

#### 3. Target: walk at 2.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
    --act_penalty_lowerbound 0.75
```

#### 4. Target: walk slowly at 1.25 m/s

<p align="center">
<img src="image/last course.png" alt="PARL" width="800"/>
</p>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
    --act_penalty_lowerbound 0.6
```

## Part3: Training in random velocity environment for round2 evaluation

As mentioned before, the choice of the model used for fine-tuning influences later training. For those who cannot obtain the expected performance from the former steps, a pre-trained model that walks naturally at 1.25 m/s is provided ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7)).

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (Suggest: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
    --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```

### Test trained model

```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```

### Other implementation details

<p align="center">
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>

Following the above steps correctly, you can get an agent that scores around 9960, slightly lower than our final submitted model. The score gap results from the lack of the multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, since it is conventionally considered hard to fit one model to different data distributions. Thus we actually trained 4 models, each aiming to perform well under a different velocity distribution. These four models are trained successively; that is, we train a model that specializes in the start stage (first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code. Feel free to post an issue if you have any problems :)

## Acknowledgments

We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
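
The multi-stage scheme described under "Other implementation details" is not included in this folder. As a rough illustration only, the sketch below shows one way frames could be routed to stage-specific policies at inference time, assuming a 1000-frame episode split at frame 60 as in that description. `StagedPolicy`, `start_policy`, and `rest_policy` are hypothetical names, and the two stand-in policies return dummy actions rather than real model outputs; this is not the team's implementation.

```python
import bisect
from typing import Callable, List, Sequence, Tuple

Action = Sequence[float]
Policy = Callable[[Sequence[float]], Action]


class StagedPolicy:
    """Route each frame of an episode to the policy trained for that stage."""

    def __init__(self, stages: List[Tuple[int, Policy]]):
        # `stages` holds (first_frame, policy) pairs; the first stage must start at frame 0.
        stages = sorted(stages, key=lambda s: s[0])
        if not stages or stages[0][0] != 0:
            raise ValueError("the first stage must start at frame 0")
        self._starts = [start for start, _ in stages]
        self._policies = [policy for _, policy in stages]
        self._frame = 0

    def reset(self) -> None:
        """Call at the start of every episode."""
        self._frame = 0

    def act(self, obs: Sequence[float]) -> Action:
        # bisect_right locates the last stage whose first frame is <= the current frame.
        idx = bisect.bisect_right(self._starts, self._frame) - 1
        self._frame += 1
        return self._policies[idx](obs)


# Hypothetical stand-ins for two trained models (dummy one-dimensional actions).
def start_policy(obs: Sequence[float]) -> Action:
    return [0.0]


def rest_policy(obs: Sequence[float]) -> Action:
    return [1.0]


# Assumed split: frames 0-59 use the start model, frames 60-999 use the second model.
agent = StagedPolicy([(0, start_policy), (60, rest_policy)])
agent.reset()
actions = [agent.act([0.0])[0] for _ in range(1000)]
```

Keeping the stage boundaries in a sorted list means further stages can be added by appending another `(first_frame, policy)` pair; the source does not state the boundaries of the remaining stages, so none are assumed here.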
