Obstacles Pose A Challenge

Waking up to check the results of last night's training session brought mixed feelings. The agent, which had previously mastered navigating a simple platform with a goal, seemed lost when obstacles were introduced into the environment. So what went wrong?

[Figure: our agent failing miserably]

In our initial setup, the platform featured a straightforward goal-agent dynamic, and the agent excelled, progressing through levels up to 10 without a hitch. However, adding randomly scattered obstacles to this environment proved challenging. Overnight training sessions produced an agent that, instead of navigating around obstacles to reach the goal, looped aimlessly, occasionally colliding with obstacles or falling off the map. Clearly, there was a disconnect between the agent's training and its ability to adapt to this new addition.

Recognizing the need for a more sophisticated approach, we're diving deeper. Fear not, we have a few tricks up our sleeve.

We're considering several options:

  • Imitation Learning: By providing the agent with recorded demonstrations of our own gameplay, we aim to guide it toward more effective strategies.

  • Generative Adversarial Imitation Learning (GAIL): A discriminator network learns to tell the agent's behavior apart from our recorded demonstrations, while the agent is rewarded for fooling it, nudging its behavior toward ours.

  • Adjusting Hyperparameters: Fine-tuning settings such as the learning rate and network size, or even trying algorithms other than Proximal Policy Optimization (PPO).

  • Expanding Inputs: Augmenting the agent's sensory inputs with precise details, such as the exact position of the goal and of each obstacle (see the sketch after this list).
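
To make that last option concrete, here's a minimal sketch of what expanding the inputs could look like using ML-Agents' CollectObservations override. The PlatformAgent class and the goal field are illustrative assumptions, not our actual code:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Illustrative sketch of the "expanding inputs" option.
public class PlatformAgent : Agent
{
    [SerializeField] Transform goal; // assumed to be assigned in the Inspector

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observe the goal relative to the agent rather than in world space,
        // so the policy has a chance to generalize across level layouts.
        sensor.AddObservation(transform.InverseTransformPoint(goal.position));
    }
}
```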

Since our platform dynamically generates a varying number of obstacles, presenting a unique challenge each time, we're implementing a buffer sensor to accommodate that variability. This component manages a variable-length array of inputs, allowing the agent to track multiple obstacle positions at once.

During training, an attention mechanism learns to prioritize the most important entries in that array and down-weight the rest. This adaptive input capability is a perfect fit for our environment, where the obstacle count fluctuates.
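
In ML-Agents this is exposed as a BufferSensorComponent, whose AppendObservation method takes one fixed-size float array per tracked object. Here's a minimal sketch, assuming the component is attached to the agent with its Observable Size set to 3, and that an obstacles list is populated by our spawner (class and field names are illustrative):

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ObstacleAwareAgent : Agent
{
    [SerializeField] List<Transform> obstacles; // hypothetical: filled by the obstacle spawner
    BufferSensorComponent obstacleSensor;

    public override void Initialize()
    {
        // The BufferSensorComponent lives on the same GameObject as the agent.
        obstacleSensor = GetComponent<BufferSensorComponent>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Append one variable-length entry per obstacle, each a relative (x, y, z).
        // The buffer is cleared after every step, so the count can change freely;
        // the trainer's attention module learns which entries matter.
        foreach (var obstacle in obstacles)
        {
            Vector3 local = transform.InverseTransformPoint(obstacle.position);
            obstacleSensor.AppendObservation(new float[] { local.x, local.y, local.z });
        }
    }
}
```

One caveat worth noting: the component also has a Max Num Observables setting, and entries appended beyond that limit are ignored, so it needs to cover the densest layout our generator can produce.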

As we roll up our sleeves and refine the agent's behavior (its policy), we remain committed to experimenting with these techniques. Our goal is clear: to be good developers and equip our agent with everything it needs to thrive in our changing environments.
