Theoretical background:
In TAM, reinforcement learning is used to learn expectations of future reinforcement associated with state/action pairs; for example, the expectation associated with turning left at the T-junction is increased if the rat finds food at the end of the left arm. The reinforcement learning system tries to influence the behavior of the rat so as to maximize the sum of reinforcement received over time. The reinforcement learning rule was implemented in the present model using temporal differences in an actor-critic architecture (Barto et al., 1983; Barto, 1994; Sutton, 1988).
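For concreteness, the following is a minimal sketch of a temporal-difference actor-critic update for the T-junction choice. The learning rates, discount factor, reward scheme, and action coding are illustrative assumptions, not the actual TAM implementation.

    import numpy as np

    # Minimal sketch of a TD actor-critic update for the T-junction
    # choice.  Learning rates, discount factor, and reward scheme are
    # illustrative assumptions, not the values used in TAM.

    rng = np.random.default_rng(0)

    actions = ["left", "right"]
    preferences = np.zeros(2)   # actor: action preferences at the T-junction
    value = 0.0                 # critic: expected future reinforcement there
    alpha_actor, alpha_critic, gamma = 0.1, 0.1, 0.9

    for trial in range(200):
        # Softmax action selection from the actor's preferences.
        p = np.exp(preferences) / np.exp(preferences).sum()
        a = rng.choice(2, p=p)

        # Assumed reward scheme: food is at the end of the left arm.
        reward = 1.0 if actions[a] == "left" else 0.0

        # TD error; the trial ends after the turn, so the next state's
        # value is taken to be 0.
        delta = reward + gamma * 0.0 - value

        # The critic refines its expectation; the actor is pushed toward
        # actions that produced a positive TD error.
        value += alpha_critic * delta
        preferences[a] += alpha_actor * delta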
For more information on reinforcement learning, please refer to the cited papers or to the papers listed in the Hippocampus and Navigation Group homepage.
Representing reinforcement:
In TAM-WG, reinforcement is coded as a bump of activity over a linear array of cells. The figures shown below represent expectations of future reward for the left and the right turns at a T-junction for a fornix-lesioned animal (O'Keefe, 1983).
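A small sketch of this coding scheme follows; the array size, tuning width, and direction units are assumptions made for the example. The expectation of future reward for a turn is represented as a Gaussian bump of activity centered on the cell tuned to that direction.

    import numpy as np

    # Sketch of coding reward expectation as a bump of activity over a
    # linear array of cells, each tuned to a preferred turn direction.
    # Array size, tuning width, and direction units are assumptions.

    n_cells = 64
    preferred = np.linspace(-180.0, 180.0, n_cells)  # preferred directions (deg)
    sigma = 20.0                                     # tuning width (deg)

    def bump(center_deg, amplitude=1.0):
        """Gaussian bump of activity centered on one turn direction."""
        return amplitude * np.exp(-0.5 * ((preferred - center_deg) / sigma) ** 2)

    # Expectation of future reward for a left turn (-90 deg) at the T-junction.
    expect_left = bump(-90.0)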
After criterion was reached on turns to a specific arm of the T-maze, probe trials with an 8-arm radial maze were interspersed with the usual T-maze trials to assess the position of the orientation vector at that stage of learning. Fornix-lesioned animals show a steady, incremental shift in the direction of the orientation vector from the original quadrant (left, if trained first with food in the left arm) through straight ahead and into the new reversal quadrant (right, in our example).
From the relatively smooth change of the orientation vector shown by the fornix-lesioned animals, one might expect that, in their case, only the cells close to the preferred behavioral direction are excited, and that learning "marches" this peak, called "rewardness expectation" in the present work, from the old to the new preferred direction. However, the peak in our model does not actually march. Rather, it is the mean of two peaks (shown below) that moves. During reversal of the T-maze, the reinforcement learning rule "unlearns" -90 degrees (the turn to the left at the T-junction) by reducing the peak there, while at the same time building a new peak at the new direction of +90 degrees (the turn to the right at the T-junction).
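A small sketch of this dynamic follows; the delta rule, learning rate, and tuning parameters are assumptions standing in for the model's actual learning rule. It shows how shrinking one fixed peak while growing another moves the mean of the activity profile smoothly from -90 to +90 degrees, even though neither peak itself moves.

    import numpy as np

    # Sketch of the two-peak account of reversal: a simple delta rule
    # "unlearns" the peak at -90 deg (old left turn) while building a
    # peak at +90 deg (new right turn).  Learning rate, trial count,
    # and tuning width are assumptions.

    n_cells, sigma, lr = 64, 20.0, 0.15
    preferred = np.linspace(-180.0, 180.0, n_cells)

    def bump(center_deg):
        return np.exp(-0.5 * ((preferred - center_deg) / sigma) ** 2)

    weights = bump(-90.0)   # trained: reward expected for the left turn
    target = bump(+90.0)    # after reversal: reward is now on the right

    for trial in range(30):
        weights += lr * (target - weights)  # old peak shrinks, new one grows
        # Center-of-mass readout over the linear array: this orientation
        # vector shifts smoothly from -90 through 0 to +90 degrees, even
        # though the two peaks only change height.
        readout = (weights * preferred).sum() / weights.sum()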
All contents copyright (C) 1994-1997, University of Southern California Brain Simulation Lab. All rights reserved.
Author: Alex Guazzelli <aguazzel@rana.usc.edu>