Gorilla Academy: Learning the value of information in an uncertain world

This is a classic n-armed bandit task, here with two arms. Participants choose between two slot machines (or bandits) with the aim of winning as many points as possible over the course of the task. Each bandit has a different probability of paying out and a different number of points on offer. Through trial and error, participants learn whether a bandit is:

High risk High reward
High risk Low reward
Low risk High reward
Low risk Low reward
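The four labels can be made concrete by describing each bandit as a payout probability and a payout magnitude. The Python sketch below assumes that "risk" refers to how rarely a bandit pays out and uses hypothetical cut-offs (a probability of 0.5 and 50 points). The `Bandit` class, the `label` function and the example numbers are illustrative only, not taken from the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bandit:
    name: str
    p_win: float     # probability the bandit pays out on a trial
    magnitude: int   # points on offer if it does pay out

    def expected_value(self) -> float:
        """Average points per choice: probability x magnitude."""
        return self.p_win * self.magnitude


def label(bandit: Bandit, p_cut: float = 0.5, m_cut: int = 50) -> str:
    """Name the bandit type using hypothetical cut-offs."""
    risk = "Low risk" if bandit.p_win >= p_cut else "High risk"
    reward = "High reward" if bandit.magnitude >= m_cut else "Low reward"
    return f"{risk} {reward}"


green = Bandit("green", p_win=0.75, magnitude=30)
blue = Bandit("blue", p_win=0.25, magnitude=80)
for b in (green, blue):
    print(b.name, label(b), b.expected_value())
```

With these made-up numbers the low-risk green bandit is worth slightly more per choice (22.5 vs 20.0 points) even though blue offers more when it pays out, which is why participants have to learn both quantities.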

For half of the experiment the environment is stable (i.e. 60 trials with a fixed 75:25 reinforcement schedule); in the other half the environment is volatile (i.e. an 80:20 reinforcement schedule that switches every 20 trials).
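One way to picture the two schedules is as a per-trial payout probability for each bandit. The sketch below assumes that the two probabilities are complementary (the favoured bandit pays out on 75% or 80% of trials, the other on 25% or 20%), that the volatile half is also 60 trials long, and that each switch simply swaps the two bandits. `payout_schedule` is a hypothetical helper, not part of the task.

```python
def payout_schedule(kind: str, n_trials: int = 60,
                    favoured_first: str = "green") -> list[dict[str, float]]:
    """Per-trial payout probability for each bandit (hypothetical helper).

    stable:   the favoured bandit pays out with p = 0.75 on every trial
    volatile: the favoured bandit pays out with p = 0.80, and the two
              bandits swap roles every 20 trials
    """
    if kind not in ("stable", "volatile"):
        raise ValueError(f"unknown schedule: {kind!r}")
    other = "blue" if favoured_first == "green" else "green"
    p_high = 0.75 if kind == "stable" else 0.80
    schedule = []
    for t in range(n_trials):
        # In the volatile schedule, 20-trial blocks 0, 2, 4, ... favour the first bandit.
        if kind == "stable" or (t // 20) % 2 == 0:
            favoured, unfavoured = favoured_first, other
        else:
            favoured, unfavoured = other, favoured_first
        # round() avoids float noise such as 1 - 0.8 = 0.19999999999999996
        schedule.append({favoured: p_high, unfavoured: round(1 - p_high, 2)})
    return schedule


volatile = payout_schedule("volatile")
print([trial["green"] for trial in volatile[::20]])  # [0.8, 0.2, 0.8]
```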

This task uses spreadsheets to assign magnitude and probability values on each trial. The spreadsheet a participant receives determines which version they complete:

Stable first with green paying out more (StableGreen)
Stable first with blue paying out more (StableBlue)
Volatile first with green paying out more (VolatileGreen)
Volatile first with blue paying out more (VolatileBlue)
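The four spreadsheets differ only in which half comes first and which colour is favoured. The sketch below shows how such spreadsheets could be generated. It assumes that the named colour is also favoured at the start of the second half, that each half is 60 trials, and that magnitude columns (whose values are not given here) would be added the same way. `VERSIONS`, `spreadsheet_rows`, `to_csv` and the column names are hypothetical, not the columns the task actually uses.

```python
import csv
import io

# Hypothetical mapping: spreadsheet name -> (first half, bandit favoured at the start).
VERSIONS = {
    "StableGreen": ("stable", "green"),
    "StableBlue": ("stable", "blue"),
    "VolatileGreen": ("volatile", "green"),
    "VolatileBlue": ("volatile", "blue"),
}


def spreadsheet_rows(version: str, half_len: int = 60) -> list[dict]:
    """One row per trial with each bandit's payout probability."""
    first, favoured = VERSIONS[version]
    other = "blue" if favoured == "green" else "green"
    halves = [first, "volatile" if first == "stable" else "stable"]
    rows = []
    for half in halves:
        p_high = 0.75 if half == "stable" else 0.80
        for t in range(half_len):
            # Volatile halves swap the favoured bandit every 20 trials.
            swapped = half == "volatile" and (t // 20) % 2 == 1
            winner = other if swapped else favoured
            p_green = p_high if winner == "green" else round(1 - p_high, 2)
            rows.append({
                "trial": len(rows) + 1,
                "block": half,
                "p_green": p_green,
                "p_blue": round(1 - p_green, 2),
            })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialise the rows as CSV text, ready to save as a spreadsheet file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["trial", "block", "p_green", "p_blue"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheets = {name: to_csv(spreadsheet_rows(name)) for name in VERSIONS}
print(sheets["VolatileBlue"].splitlines()[:3])
```

Generating all four versions from one function keeps them identical apart from the counterbalanced factors, which is the point of having four spreadsheets.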

A short script decides whether the participant wins or loses on each trial and adds any points won to the total score.
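The win/lose decision amounts to a weighted coin flip. The Python sketch below illustrates that logic; it is not the script used in the Gorilla task. It assumes a win is drawn with the chosen bandit's payout probability for that trial and that a loss simply adds nothing. `play_trial` is a hypothetical name.

```python
import random


def play_trial(chosen: str, p_win: dict[str, float],
               magnitude: dict[str, int], score: int,
               rng: random.Random) -> tuple[bool, int]:
    """Decide whether the chosen bandit pays out and update the score."""
    # rng.random() is uniform on [0, 1), so this is True with probability p_win[chosen].
    won = rng.random() < p_win[chosen]
    if won:
        score += magnitude[chosen]  # only winning trials add points
    return won, score


rng = random.Random(42)  # fixed seed so the run is reproducible
score = 0
for _ in range(10):
    won, score = play_trial("green", {"green": 0.75, "blue": 0.25},
                            {"green": 30, "blue": 80}, score, rng)
print("total score:", score)
```

Passing the random generator in, rather than calling the module-level `random.random()`, keeps simulated runs reproducible.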

If you want to see how I created this experiment and how I organised and analysed the data, you can watch my video tutorials in the Learning section of Gorilla Academy.

This study is a replication of Behrens et al. (2007).
