Reinforcement Learning Algorithm Collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, etc.)

Resource file list (approximate)

File name

Size

examples/

examples/Baselines/

examples/Baselines/GridDispatch_competition/

examples/Baselines/GridDispatch_competition/README.md

334B

examples/Baselines/Halite_competition/

examples/Baselines/Halite_competition/torch/

examples/Baselines/Halite_competition/torch/rl_trainer/

examples/Baselines/Halite_competition/torch/rl_trainer/controller.py

20.6KB

examples/DDPG/

examples/DDPG/train.py

5.27KB

examples/AlphaZero/

examples/AlphaZero/Coach.py

8.8KB

examples/A2C/

examples/A2C/actor.py

4.52KB

examples/A2C/atari_model.py

3.17KB

examples/DQN/

examples/DQN/README.md

849B

examples/AlphaZero/README.md

1.91KB

examples/A2C/atari_agent.py

4KB

examples/Baselines/GridDispatch_competition/torch/

examples/Baselines/GridDispatch_competition/torch/grid_model.py

2.54KB

examples/Baselines/GridDispatch_competition/torch/README.md

1.6KB

examples/AlphaZero/.pic/

examples/AlphaZero/.pic/perfect_moves_rate.png

64.44KB

examples/DDPG/mujoco_model.py

2.1KB

examples/DQN_variant/

examples/DQN_variant/train.py

6.56KB

examples/CARLA_SAC/

examples/CARLA_SAC/carla_agent.py

1.71KB

examples/Baselines/Halite_competition/torch/train.py

8.93KB

examples/CARLA_SAC/train.py

5.4KB

examples/DQN/requirements.txt

43B

examples/CARLA_SAC/evaluate.py

2.62KB

examples/CARLA_SAC/carla_model.py

3.29KB

examples/Baselines/Halite_competition/torch/rl_trainer/obs_parser.py

3.27KB

examples/Baselines/Halite_competition/torch/rl_trainer/agent.py

4.21KB

examples/Baselines/Halite_competition/paddle/

examples/Baselines/Halite_competition/paddle/rl_trainer/

examples/Baselines/Halite_competition/paddle/rl_trainer/obs_parser.py

3.27KB

examples/ES/

examples/ES/train.py

7.53KB

examples/ES/obs_filter.py

6.09KB

examples/IMPALA/

examples/IMPALA/atari_model.py

2.85KB

examples/ES/noise.py

955B

examples/MADDPG/

examples/MADDPG/README.md

3.16KB

examples/IMPALA/actor.py

3.9KB

examples/IMPALA/README.md

1.84KB

examples/ES/optimizers.py

1.82KB

examples/DDPG/mujoco_agent.py

1.98KB

examples/MADDPG/requirements.txt

56B

examples/AlphaZero/connect4_aiplayer.py

4.72KB

examples/AlphaZero/utils.py

1.8KB

examples/AlphaZero/main.py

2.78KB

examples/Baselines/GridDispatch_competition/paddle/

examples/Baselines/GridDispatch_competition/paddle/grid_agent.py

1.85KB

examples/DQN/train.py

4.31KB

examples/Baselines/Halite_competition/paddle/README.md

3.39KB

examples/Baselines/GridDispatch_competition/paddle/grid_model.py

2.55KB

examples/Baselines/Halite_competition/paddle/rl_trainer/utils.py

7.59KB

examples/CQL/

examples/CQL/mujoco_agent.py

1.83KB

examples/Baselines/Halite_competition/paddle/rl_trainer/replay_memory.py

3.66KB

examples/Baselines/Halite_competition/paddle/rl_trainer/algorithm.py

5.32KB

examples/Baselines/Halite_competition/torch/encode_model.py

972B

examples/AlphaZero/alphazero_agent.py

3.64KB

examples/CARLA_SAC/env_utils.py

3.87KB

examples/CARLA_SAC/env_config.py

2.72KB

examples/Baselines/Halite_competition/paddle/rl_trainer/model.py

2.25KB

examples/AlphaZero/connect4_game.py

7.87KB

examples/Baselines/Halite_competition/paddle/rl_trainer/controller.py

20.55KB

examples/AlphaZero/connect4_model.py

3.13KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_model.py

6.4KB

examples/IMPALA/train.py

9.43KB

examples/ES/es.py

1.22KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim_agent.py

8.61KB

examples/TD3/

examples/TD3/mujoco_agent.py

1.88KB

examples/Baselines/GridDispatch_competition/paddle/train.py

7.05KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/submit_model.py

5.18KB

examples/Baselines/Halite_competition/torch/rl_trainer/policy.py

2.54KB

examples/NeurIPS2019-Learn-to-Move-Challenge/

examples/NeurIPS2019-Learn-to-Move-Challenge/env_wrapper.py

16.85KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/pelvisBasedObs_scaler.npz

4.22KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/pelvisBasedObs_scaler.npz

4.22KB

examples/Baselines/Halite_competition/torch/rl_trainer/algorithm.py

5.36KB

examples/Baselines/Halite_competition/paddle/rl_trainer/policy.py

2.46KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/test.py

3.33KB

examples/NeurIPS2019-Learn-to-Move-Challenge/actor.py

1.86KB

examples/Baselines/Halite_competition/torch/test.ipynb

1.56KB

examples/ES/actor.py

4.37KB

examples/Baselines/Halite_competition/paddle/test.py

1.39KB

examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate.py

11.41KB

examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/

examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/env_wrapper.py

9.75KB

examples/NeurIPS2019-Learn-to-Move-Challenge/evaluate_args.py

2.46KB

examples/ES/README.md

1.47KB

examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/submit_model.py

5.54KB

examples/DQN_variant/replay_memory.py

4.09KB

examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/official_obs_scaler.npz

2.2KB

examples/NeurIPS2019-Learn-to-Move-Challenge/official_obs_scaler.npz

2.2KB

examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/test.py

2.51KB

examples/Baselines/Halite_competition/torch/README.md

3.39KB

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty2.sh

256B

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3_first_target.sh

338B

examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_agent.py

3.51KB

examples/ES/utils.py

2.06KB

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty1.sh

255B

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es_agent.py

2.85KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/evaluate.py

2.79KB

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/eval_difficulty3.sh

292B

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/powernet_model.py

2.6KB

examples/PPO/

examples/PPO/atari_config.py

2.19KB

examples/NeurIPS2019-Learn-to-Move-Challenge/replay_memory.py

60B

examples/PPO/agent.py

4.43KB

examples/ES/requirements.txt

58B

examples/AlphaZero/actor.py

6.72KB

examples/PPO/mujoco_config.py

2.18KB

examples/Baselines/GridDispatch_competition/paddle/env_wrapper.py

4.52KB

examples/Baselines/Halite_competition/paddle/encode_model.py

974B

examples/Baselines/GridDispatch_competition/torch/env_wrapper.py

4.52KB

examples/tutorials/

examples/tutorials/homework/

examples/tutorials/homework/lesson4/

examples/tutorials/homework/lesson4/policy_gradient_pong/

examples/tutorials/homework/lesson4/policy_gradient_pong/model.py

1.08KB

examples/Baselines/Halite_competition/paddle/train.py

8.82KB

examples/tutorials/homework/lesson3/

examples/tutorials/homework/lesson3/dqn_mountaincar/

examples/tutorials/homework/lesson3/dqn_mountaincar/replay_memory.py

1.64KB

examples/tutorials/parl2_dygraph/

examples/tutorials/parl2_dygraph/lesson3/

examples/tutorials/parl2_dygraph/lesson3/dqn/

examples/tutorials/parl2_dygraph/lesson3/dqn/train.py

4.7KB

examples/tutorials/lesson5/

examples/tutorials/lesson5/ddpg/

examples/tutorials/lesson5/ddpg/replay_memory.py

1.64KB

examples/tutorials/homework/lesson4/policy_gradient_pong/agent.py

2.87KB

examples/tutorials/lesson1/

examples/tutorials/lesson1/gridworld.py

6.62KB

examples/tutorials/homework/lesson5/

examples/tutorials/homework/lesson5/ddpg_quadrotor/

examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_model.py

1.92KB

examples/Baselines/GridDispatch_competition/paddle/README.md

1.61KB

examples/tutorials/lesson4/

examples/tutorials/lesson4/policy_gradient/

examples/tutorials/lesson4/policy_gradient/agent.py

2.87KB

examples/CQL/train.py

4.36KB

examples/tutorials/homework/lesson4/policy_gradient_pong/train.py

4.23KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/multi_head_ddpg.py

4.82KB

examples/AlphaZero/requirements.txt

37B

examples/DQN/cartpole_agent.py

3.17KB

examples/A2C/.result/

examples/A2C/.result/result_a2c_paddle0.png

193.24KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/replay_memory.py

3.6KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_server.py

11.88KB

examples/others/

examples/others/deepes.py

3.13KB

examples/SAC/

examples/SAC/mujoco_model.py

2.55KB

examples/tutorials/homework/lesson2/

examples/tutorials/homework/lesson2/q_learning_frozenlake/

examples/tutorials/homework/lesson2/q_learning_frozenlake/agent.py

2.73KB

examples/tutorials/lesson2/

examples/tutorials/lesson2/q_learning/

examples/tutorials/lesson2/q_learning/agent.py

2.73KB

examples/CQL/README.md

1.51KB

examples/Baselines/GridDispatch_competition/torch/train.py

7.04KB

examples/Baselines/Halite_competition/torch/requirements.txt

25B

examples/Baselines/Halite_competition/paddle/rl_trainer/agent.py

4.03KB

examples/Baselines/Halite_competition/torch/rl_trainer/model.py

2.24KB

examples/DDPG/README.md

1.11KB

examples/DQN/cartpole_model.py

1.3KB

examples/Baselines/Halite_competition/paddle/submission.py

99.84KB

examples/A2C/requirements.txt

67B

examples/DDPG/requirements.txt

58B

examples/Baselines/Halite_competition/paddle/test.ipynb

1.46KB

examples/MADDPG/train.py

6.93KB

examples/TD3/requirements.txt

58B

examples/SAC/requirements.txt

58B

examples/CQL/requirements.txt

121B

examples/A2C/README.md

1.4KB

examples/A2C/train.py

7.1KB

examples/Baselines/Halite_competition/torch/config.py

1.35KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/test.py

3.22KB

examples/MADDPG/simple_model.py

3.59KB

examples/QuickStart/

examples/QuickStart/cartpole_model.py

1.23KB

examples/IMPALA/atari_agent.py

2.91KB

examples/Baselines/Halite_competition/torch/submission.py

100.1KB

examples/TD3/README.md

1.24KB

examples/QuickStart/cartpole_agent.py

2.27KB

examples/SAC/train.py

5.09KB

examples/MADDPG/simple_agent.py

4.43KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/env_wrapper.py

17.21KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/env_wrapper.py

28.33KB

examples/DQN_variant/atari_model.py

3.3KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/final_submit/mlp_model.py

6.49KB

examples/OAC/

examples/OAC/requirements.txt

58B

examples/NeurIPS2019-Learn-to-Move-Challenge/README.md

3.2KB

examples/TD3/train.py

5.12KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/es.py

1.57KB

examples/PPO/requirements_mujoco.txt

58B

examples/PPO/env_utils.py

6.95KB

examples/NeurIPS2019-Learn-to-Move-Challenge/train.py

11.9KB

examples/NeurIPS2019-Learn-to-Move-Challenge/final_submit/mlp_model.py

6.46KB

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty1.sh

341B

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty2.sh

320B

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es.py

1.23KB

examples/PPO/requirements_atari.txt

74B

examples/tutorials/homework/lesson5/ddpg_quadrotor/quadrotor_agent.py

2.65KB

examples/QMIX/

examples/QMIX/replay_buffer.py

3.33KB

examples/PPO/mujoco_model.py

1.96KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/README.md

659B

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/velocity_distribution.png

27.9KB

examples/tutorials/homework/lesson5/ddpg_quadrotor/train.py

6.11KB

examples/QuickStart/README.md

435B

examples/QuickStart/requirements.txt

43B

examples/tutorials/parl2_dygraph/lesson5/

examples/tutorials/parl2_dygraph/lesson5/ddpg/

examples/tutorials/parl2_dygraph/lesson5/ddpg/replay_memory.py

1.64KB

examples/tutorials/parl2_dygraph/lesson3/dqn/replay_memory.py

1.64KB

examples/tutorials/parl2_dygraph/lesson3/homework/

examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/

examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/replay_memory.py

1.64KB

examples/tutorials/parl2_dygraph/lesson5/homework/

examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/

examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_model.py

2.13KB

examples/QMIX/qmix_config.py

2.69KB

examples/tutorials/parl2_dygraph/lesson3/dqn/agent.py

2.79KB

examples/tutorials/homework/lesson3/dqn_mountaincar/model.py

1.11KB

examples/tutorials/lesson3/

examples/tutorials/lesson3/dqn/

examples/tutorials/lesson3/dqn/model.py

1.11KB

examples/tutorials/homework/lesson2/q_learning_frozenlake/train.py

2.56KB

examples/tutorials/parl2_dygraph/lesson3/dqn/model.py

1.3KB

examples/QMIX/rnn_model.py

1.45KB

examples/A2C/a2c_config.py

1.29KB

examples/DQN/cartpole.jpg

110.07KB

examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/model.py

1.3KB

examples/tutorials/lesson5/ddpg/env.py

6.33KB

examples/AlphaZero/.pic/good_moves_rate.png

60.06KB

examples/Baselines/Halite_competition/torch/rl_trainer/replay_memory.py

3.6KB

examples/CARLA_SAC/README.md

2.78KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/utils.py

3.25KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/README.md

6.94KB

examples/tutorials/lesson5/ddpg/train.py

4.25KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/evaluate.py

2.79KB

examples/DQN_variant/atari_agent.py

4.11KB

examples/IMPALA/impala_config.py

1.5KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/competition.png

184.81KB

examples/PPO/storage.py

3.09KB

examples/OAC/mujoco_agent.py

1.85KB

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3_first_target.sh

416B

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/es_agent.py

1.62KB

examples/tutorials/lesson3/dqn/replay_memory.py

1.64KB

examples/Baselines/Halite_competition/paddle/config.py

1.35KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/README.md

718B

examples/QMIX/utils.py

1.66KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/powernet_model.py

2.67KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/curriculum-learning.png

158.38KB

examples/Baselines/GridDispatch_competition/torch/grid_agent.py

1.97KB

examples/CARLA_SAC/.benchmark/

examples/CARLA_SAC/.benchmark/Lane_bend.gif

3.19MB

examples/tutorials/parl2_dygraph/README.md

1.38KB

examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/quadrotor_agent.py

2.01KB

examples/tutorials/parl2_dygraph/lesson3/dqn/algorithm.py

2.86KB

examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/agent.py

2.79KB

examples/tutorials/lesson4/policy_gradient/algorithm.py

1.7KB

examples/tutorials/lesson4/policy_gradient/model.py

1.04KB

examples/tutorials/homework/lesson3/dqn_mountaincar/train.py

4.72KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2.py

7.22KB

examples/tutorials/lesson3/dqn/agent.py

3.89KB

examples/Baselines/Halite_competition/torch/rl_trainer/utils.py

7.64KB

examples/tutorials/homework/lesson3/dqn_mountaincar/agent.py

3.89KB

examples/tutorials/homework/lesson2/sarsa_frozenlake/

examples/tutorials/homework/lesson2/sarsa_frozenlake/gridworld.py

6.53KB

examples/tutorials/homework/lesson2/q_learning_frozenlake/gridworld.py

6.53KB

examples/DQN_variant/.benchmark/

examples/DQN_variant/.benchmark/Dueling DQN.png

218.21KB

examples/tutorials/lesson2/sarsa/

examples/tutorials/lesson2/sarsa/gridworld.py

6.53KB

examples/tutorials/requirements.txt

126B

examples/tutorials/lesson2/sarsa/train.py

2.95KB

examples/tutorials/lesson2/q_learning/gridworld.py

6.53KB

examples/tutorials/lesson5/ddpg/model.py

1.73KB

examples/SAC/mujoco_agent.py

1.83KB

examples/tutorials/lesson3/dqn/train.py

4.82KB

examples/IMPALA/requirements.txt

74B

examples/DQN_variant/requirements.txt

79B

examples/TD3/mujoco_model.py

2.54KB

examples/Baselines/Halite_competition/torch/test.py

1.44KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/args.py

3.51KB

examples/tutorials/parl2_dygraph/lesson4/

examples/tutorials/parl2_dygraph/lesson4/policy_gradient/

examples/tutorials/parl2_dygraph/lesson4/policy_gradient/agent.py

1.8KB

examples/tutorials/parl2_dygraph/lesson4/homework/

examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/

examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/agent.py

1.8KB

examples/tutorials/parl2_dygraph/lesson3/homework/dqn_mountaincar/train.py

4.67KB

examples/OAC/README.md

1.04KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/utils.py

13.97KB

examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/train.py

4.29KB

examples/tutorials/parl2_dygraph/lesson4/policy_gradient/algorithm.py

1.94KB

examples/tutorials/parl2_dygraph/lesson4/homework/policy_gradient_pong/model.py

1.35KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/utils.py

2.59KB

examples/QMIX/qmix_agent.py

5.35KB

examples/OAC/mujoco_model.py

2.55KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/README.md

700B

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/demo.gif

4.58MB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/fastest.png

270.84KB

examples/PPO/atari_model.py

2.03KB

examples/PPO/README.md

2.48KB

examples/Baselines/Halite_competition/paddle/requirements.txt

32B

examples/tutorials/lesson5/ddpg/algorithm.py

3.46KB

examples/tutorials/lesson5/ddpg/agent.py

2.67KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_pb2_grpc.py

1.93KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/simulator_client.py

4.25KB

examples/tutorials/homework/lesson2/sarsa_frozenlake/train.py

2.67KB

examples/QMIX/qmixer_model.py

3.06KB

examples/QMIX/train.py

6.51KB

examples/tutorials/lesson4/policy_gradient/train.py

3.66KB

examples/CQL/mujoco_model.py

2.78KB

examples/tutorials/parl2_dygraph/requirements.txt

130B

examples/SAC/README.md

1.24KB

examples/NeurIPS2019-Learn-to-Move-Challenge/train_args.py

2.73KB

examples/DQN_variant/README.md

2.65KB

examples/QMIX/README.md

1.31KB

examples/QMIX/requirements.txt

37B

examples/QMIX/env_wrapper.py

3.11KB

examples/QuickStart/train.py

3.83KB

examples/AlphaZero/MCTS.py

5.83KB

examples/tutorials/parl2_dygraph/lesson5/ddpg/train.py

4.21KB

examples/tutorials/lesson3/dqn/algorithm.py

3.02KB

examples/tutorials/parl2_dygraph/lesson5/homework/ddpg_quadrotor/train.py

6.06KB

examples/tutorials/parl2_dygraph/lesson5/ddpg/agent.py

2.31KB

examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_model.py

5.81KB

examples/tutorials/parl2_dygraph/lesson4/policy_gradient/model.py

1.26KB

examples/tutorials/parl2_dygraph/lesson4/policy_gradient/train.py

3.65KB

examples/tutorials/lesson2/q_learning/train.py

2.85KB

examples/NeurIPS2019-Learn-to-Move-Challenge/scripts/train_difficulty3.sh

416B

examples/tutorials/README.md

1.74KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track1/agent.py

19.08KB

examples/tutorials/homework/lesson2/sarsa_frozenlake/agent.py

2.77KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/track2/agent.py

12.96KB

examples/tutorials/lesson2/sarsa/agent.py

2.77KB

examples/PPO/train.py

5.99KB

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/

examples/NeurIPS2020-Learning-to-Run-a-Power-Network-Challenge/images/l2rpn.jpeg

69.44KB

examples/OAC/train.py

5.45KB

examples/Baselines/Halite_competition/paddle/model/

examples/Baselines/Halite_competition/paddle/model/latest_ship_model.pth

325.14KB

examples/AlphaZero/Arena.py

3.24KB

examples/QuickStart/performance.gif

237.51KB

examples/NeurIPS2019-Learn-to-Move-Challenge/image/

examples/NeurIPS2019-Learn-to-Move-Challenge/image/performance.gif

782.27KB

examples/CARLA_SAC/.benchmark/carla_sac.png

141.86KB

examples/A2C/.result/result_a2c_paddle1.png

203.23KB

examples/Baselines/Halite_competition/torch/model/

examples/Baselines/Halite_competition/torch/model/latest_ship_model.pth

338.03KB

examples/NeurIPS2018-AI-for-Prosthetics-Challenge/image/last course.png

360.06KB

examples/QMIX/images/

examples/QMIX/images/paddle2.0_qmix_result.png

97.1KB

examples/tutorials/parl2_dygraph/lesson5/ddpg/algorithm.py

3.69KB

examples/tutorials/parl2_dygraph/lesson5/ddpg/env.py

6.33KB

examples/CARLA_SAC/model.ckpt

4.63MB

examples/ES/mujoco_agent.py

2.77KB

examples/tutorials/parl2_dygraph/lesson5/ddpg/model.py

1.94KB

examples/ES/es_config.py

1.2KB

examples/ES/mujoco_model.py

1.93KB

Resource description

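Reinforcement Learning Algorithm Collection (DQN, DDPG, SAC, TD3, MADDPG, QMIX, etc.), containing code for 20+ classic reinforcement learning algorithms. For the corresponding tutorials, see these blog posts:
Multi-agent (cutting-edge algorithms + principles): https://blog.csdn.net/sinat_39620217/article/details/115299073?spm=1001.2014.3001.5502
Reinforcement learning basics (single-agent algorithms): https://blog.csdn.net/sinat_39620217/category_10940146.html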

# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge

<img src="image/competition.png" alt="PARL" width="800"/>

This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts:
- **Final submitted model.** A sensible controller that can follow a random target velocity.
- **Curriculum learning.** Used to learn a natural and efficient gait for low-speed walking.
- **Round 2 training.** Trains the final agent in the random-velocity environment for the round 2 evaluation.

For more technical details about our solution, we provide:

1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A PowerPoint presentation briefly introducing our solution at the NeurIPS 2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 competition workshop.
4. (coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.

**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, some factors still prevent us from achieving the same performance. One problem is deciding when to pick a converged model during curriculum learning.
Choosing a sensible and natural gait visually is crucial for subsequent training, but what counts as a good gait varies from person to person.

<img src="image/demo.gif" alt="PARL" width="500"/>

## Dependencies

- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (to use TensorBoard)

## Part1: Final submitted model

### Result

For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.

| Avg reward of all episodes | Avg reward of complete episodes | Falldown rate | Evaluate episodes |
|----------------------------|---------------------------------|---------------|-------------------|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |

### Test

- How to run:
  1. Enter the sub-folder `final_submit`.
  2. Download the model file from an online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3).
  3. Unpack the file with: `tar zxvf saved_model.tar.gz`
  4. Launch the test script: `python test.py`

## Part2: Curriculum learning

<img src="image/curriculum-learning.png" alt="PARL" width="500"/>

#### 1. Target: run as fast as possible

<img src="image/fastest.png" alt="PARL" width="800"/>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```

#### 2. Target: run at 3.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [RunFastest model]

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
    --act_penalty_lowerbound 1.5
```

#### 3. Target: walk at 2.0 m/s

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
    --act_penalty_lowerbound 0.75
```

#### 4. Target: walk slowly at 1.25 m/s

<img src="image/last course.png" alt="PARL" width="800"/>

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
    --act_penalty_lowerbound 0.6
```

## Part3: Training in random velocity environment for round2 evaluation

As mentioned before, the choice of the model used for fine-tuning influences later training. For those who cannot obtain the expected performance with the previous steps, we provide a pre-trained model that walks naturally at 1.25 m/s ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7)).

```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (suggested: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
    --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```

### Test trained model

```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```

### Other implementation details

<img src="image/velocity_distribution.png" alt="PARL" width="800"/>

Following the above steps correctly, you can get an agent that scores around 9960, slightly lower than our final submitted model.
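The curriculum in Part 2 and the round 2 stage in Part 3 form a chain. Each stage warm-starts (`--restore_model_path`) from the model produced by the stage before it.

The following is a minimal Python sketch, not code from this repository. It records the flags from the commands above as data and assembles the server and client argument lists. The `Stage` class, the `build_commands` helper, the `models/<stage>` paths and port 8037 are all hypothetical. The sketch only builds argument lists and does not launch any process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Stage:
    """One training stage; the flag values are copied from the commands above."""
    name: str                 # hypothetical label used to look up model paths
    reward_type: str          # value passed to --reward_type
    server_args: List[str] = field(default_factory=list)
    client_args: List[str] = field(default_factory=list)
    restore_from: Optional[str] = None   # stage whose model warm-starts this one
    restore_from_one_head: bool = False  # appends --restore_from_one_head


STAGES = [
    Stage("run_fastest", "RunFastest", server_args=["--ensemble_num", "1"]),
    Stage("speed_3.0", "FixedTargetSpeed",
          server_args=["--ensemble_num", "1", "--warm_start_batchs", "1000"],
          client_args=["--target_v", "3.0", "--act_penalty_lowerbound", "1.5"],
          restore_from="run_fastest"),
    Stage("speed_2.0", "FixedTargetSpeed",
          server_args=["--ensemble_num", "1", "--warm_start_batchs", "1000"],
          client_args=["--target_v", "2.0", "--act_penalty_lowerbound", "0.75"],
          restore_from="speed_3.0"),
    Stage("speed_1.25", "FixedTargetSpeed",
          server_args=["--ensemble_num", "1", "--warm_start_batchs", "1000"],
          client_args=["--target_v", "1.25", "--act_penalty_lowerbound", "0.6"],
          restore_from="speed_2.0"),
    # Part 3: --ensemble_num 12 with --restore_from_one_head, as in the command above.
    Stage("round2", "Round2",
          server_args=["--ensemble_num", "12", "--warm_start_batchs", "1000"],
          client_args=["--act_penalty_lowerbound", "0.75", "--act_penalty_coeff", "7.0",
                       "--vel_penalty_coeff", "20.0", "--discrete_data", "--stage", "3"],
          restore_from="speed_1.25", restore_from_one_head=True),
]


def build_commands(stage: Stage, port: int, server_ip: str,
                   model_paths: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (server_cmd, client_cmd) argument lists for one stage."""
    # List concatenation creates a new list, so stage.server_args is never mutated.
    server = ["python", "simulator_server.py", "--port", str(port)] + stage.server_args
    if stage.restore_from is not None:
        # Raises KeyError if the previous stage has no recorded model.
        server += ["--restore_model_path", model_paths[stage.restore_from]]
        if stage.restore_from_one_head:
            server.append("--restore_from_one_head")
    client = (["python", "simulator_client.py", "--port", str(port),
               "--ip", server_ip, "--reward_type", stage.reward_type]
              + stage.client_args)
    return server, client


if __name__ == "__main__":
    paths = {s.name: f"models/{s.name}" for s in STAGES}  # hypothetical layout
    for s in STAGES:
        server_cmd, client_cmd = build_commands(s, 8037, "127.0.0.1", paths)
        print(s.name)
        print("  server:", " ".join(server_cmd))
        print("  client:", " ".join(client_cmd))
```

Keeping the schedule as data makes the warm-start dependency explicit. If a stage is skipped, `model_paths` has no entry for it, and the failure surfaces as a `KeyError` rather than a silently wrong `--restore_model_path`. The lists could then be passed to something like `subprocess.Popen` if you want to script the runs.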
The gap between this score and our final submitted model comes from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode. This degrades the performance of a single model, since it is hard to fit one model to data drawn from different distributions.

We therefore actually trained 4 models, each aiming to perform well under a different velocity distribution. These models are trained successively. For example, we first train a model that specializes in the start stage (the first 60 frames). We then fix this start model for the first 60 frames and train another model for the remaining 940 frames.

We do not provide this part of the code, since it would reduce the readability of the code. Feel free to open an issue if you have any problems :)

## Acknowledgments

We would like to thank Zhihua Wu, Jingzhou He and Kai Zeng for providing stable computation resources, and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.
