Gorilla Academy: Learning the value of information in an uncertain world

This is a classic n-armed bandit task, here with two arms. Participants choose between two slot machines (or bandits), aiming to win as many points as possible over the course of the task. Each bandit has a different probability of paying out and a different number of points on offer. Participants learn by trial and error whether a bandit is:

  • High risk, high reward
  • High risk, low reward
  • Low risk, high reward
  • Low risk, low reward

For half of the experiment the environment is stable (i.e. 60 trials with a fixed 75:25 reinforcement schedule); for the other half it is volatile (i.e. an 80:20 reinforcement schedule that switches every 20 trials).
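
The two schedules described above can be sketched in Python as a per-trial list of payout probabilities. This is only an illustration, not the task's actual spreadsheet. It assumes that the volatile half also lasts 60 trials, that each ratio gives the payout probability of one bandit with the other taking the complement, and that the volatile block starts at 80%. The names `build_schedule`, `STABLE_TRIALS`, `VOLATILE_TRIALS` and `SWITCH_EVERY` are hypothetical.

```python
STABLE_TRIALS = 60     # stated in the task description
VOLATILE_TRIALS = 60   # assumption: the volatile half is the same length
SWITCH_EVERY = 20      # volatile schedule reverses every 20 trials


def build_schedule(stable_first=True):
    """Return the per-trial payout probability of one reference bandit."""
    # Stable block: fixed 75:25 schedule.
    stable = [0.75] * STABLE_TRIALS
    # Volatile block: 80:20 schedule that flips every SWITCH_EVERY trials
    # (assumption: it starts at 0.80 for the reference bandit).
    volatile = [
        0.80 if (t // SWITCH_EVERY) % 2 == 0 else 0.20
        for t in range(VOLATILE_TRIALS)
    ]
    return stable + volatile if stable_first else volatile + stable


schedule = build_schedule(stable_first=True)
```

Under these assumptions, the probability for the other bandit on any trial is simply one minus the value in `schedule`.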

This task uses spreadsheets to assign magnitude and probability values on each trial. The spreadsheet a participant is assigned determines which version they complete:

  • Stable first with green paying out more (StableGreen)
  • Stable first with blue paying out more (StableBlue)
  • Volatile first with green paying out more (VolatileGreen)
  • Volatile first with blue paying out more (VolatileBlue)
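
As a rough sketch of how the four spreadsheet names above encode two independent factors (block order and the better-paying colour), the following Python helper splits a name into its parts. The function name `parse_version` and the dictionary keys are hypothetical and are not part of the Gorilla task itself.

```python
VERSIONS = ["StableGreen", "StableBlue", "VolatileGreen", "VolatileBlue"]


def parse_version(name):
    """Split a spreadsheet name into block order and better-paying colour."""
    for order in ("Stable", "Volatile"):
        if name.startswith(order):
            colour = name[len(order):].lower()
            if colour in ("green", "blue"):
                return {"first_block": order.lower(), "better_colour": colour}
    raise ValueError(f"Unknown spreadsheet name: {name!r}")


# Map every version name to its two factors.
conditions = {name: parse_version(name) for name in VERSIONS}
```

Keeping the two factors separate like this can make it easier to include them as predictors when analysing the data.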

A short script decides whether the participant wins or loses on each trial and adds any points won to the total score.
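
The task's own script is not reproduced here. As a minimal Python sketch of the logic it describes, assuming that a win is decided by comparing a uniform random draw with the chosen bandit's payout probability, a single trial could look like this. The function `play_trial` and its parameters are hypothetical names used for illustration.

```python
import random


def play_trial(probability, magnitude, total_score, rng=random):
    """Decide win or lose for the chosen bandit and update the score.

    probability: chance (0 to 1) that the chosen bandit pays out
    magnitude:   points on offer for the chosen bandit on this trial
    total_score: points accumulated before this trial
    """
    # rng.random() is uniform on [0, 1), so this is True with the given chance.
    won = rng.random() < probability
    if won:
        total_score += magnitude
    return won, total_score


# Example: a bandit that pays out 75% of the time with 40 points on offer.
rng = random.Random(2020)  # seeded only to make the example repeatable
won, score = play_trial(0.75, 40, total_score=0, rng=rng)
```

In the real task the probability and magnitude for each trial would come from the assigned spreadsheet rather than being passed in by hand.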

If you want to see how I created this experiment, organised and analysed the data, you can watch my video tutorials on Gorilla Academy in the Learning section.

This study is a replication of Behrens et al. (2007).


Learning Experiment

Creative Commons Attribution (CC BY)


Generic Consent

Creative Commons Attribution (CC BY)


Demographic Questions

Creative Commons Attribution (CC BY)


Learning task

Creative Commons Attribution (CC BY)


State–Trait Inventory for Cognitive and Somatic Anxiety (STICSA) - Trait

Creative Commons Attribution (CC BY)


Grös et al. (2007)
https://doi.apa.org/doi/10.1037/1040-3590.19.4.369

Preferred Citation: Behrens et al. (2007) - task
https://doi.org/10.1038/nn1954
Published on 27 November 2020