Theoretical background:
In TAM, reinforcement learning is used as a way of learning expectations of future reinforcement associated with state/action pairs, e.g., the expectation associated with turning left at the T-junction is increased if the rat finds food at the end of the left arm. The reinforcement learning system tries to influence the behavior of the rat so as to maximize the sum of reinforcement received over time. The reinforcement learning rule was implemented in the present model using temporal differences in an actor-critic architecture (Barto et al., 1983; Barto, 1994; Sutton, 1988).
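The sketch below is not the original TAM implementation; it is a minimal tabular actor-critic with temporal-difference learning for the two-action T-junction choice described above. The action names, learning rates, and reward values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

ACTIONS = ["left", "right"]            # possible turns at the T-junction
alpha_critic, alpha_actor = 0.1, 0.1   # assumed learning rates
gamma = 0.9                            # discount factor

value = 0.0                            # critic: expected future reinforcement at the junction
prefs = np.zeros(len(ACTIONS))         # actor: action preferences (policy parameters)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(action, food_arm="left"):
    """Reinforcement: food is found only at the end of the rewarded arm."""
    return 1.0 if ACTIONS[action] == food_arm else 0.0

for trial in range(200):
    p = softmax(prefs)
    a = rng.choice(len(ACTIONS), p=p)
    r = reward(a)

    # One-step episode: TD error = r + gamma * V(terminal = 0) - V(junction)
    td_error = r + gamma * 0.0 - value

    # Critic update: move the expectation toward the observed reinforcement
    value += alpha_critic * td_error

    # Actor update: make rewarded turns more likely, unrewarded turns less likely
    prefs[a] += alpha_actor * td_error

print("expected reinforcement at junction:", round(value, 3))
print("policy (left, right):", softmax(prefs).round(3))

After a few hundred trials the critic's value approaches the reinforcement delivered for the rewarded arm, and the actor's policy strongly favors that turn.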
For more information on reinforcement learning, please refer to the cited papers or to the papers listed on the Hippocampus and Navigation Group homepage.
Representing rewardness expectation ...
In TAM-WG, reinforcement is coded as a bump of activity over a linear array of cells. The figures shown below represent the mean of the reinforcement activity for the left and the right turns at a T-junction for a fornix-lesioned animal (O'Keefe, 1983).
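A minimal sketch of coding reinforcement as a bump of activity over a linear array of cells, one cell per egocentric turn direction. The Gaussian shape, array size, and bump width are illustrative assumptions, not the exact tuning used in TAM-WG.

import numpy as np

N_CELLS = 37                                  # cells spanning -180 to +180 degrees
directions = np.linspace(-180, 180, N_CELLS)

def bump(center_deg, width_deg=30.0, amplitude=1.0):
    """Gaussian bump of activity centered on one turn direction."""
    return amplitude * np.exp(-0.5 * ((directions - center_deg) / width_deg) ** 2)

left_turn_expectation = bump(-90.0)           # reward expected for turning left
print(directions[left_turn_expectation.argmax()])   # -> -90.0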
After criterion was reached on turns to a specific arm of the T-maze, probe trials with an 8-arm radial maze were interspersed with the usual T-trials to assess the position of the orientation vector at that stage of learning. Fornix-lesioned animals show a steady, incremental shift in the direction of the orientation vector from the original quadrant (left, if trained first with food in the left arm) through straight ahead and into the new reversal quadrant (right, in our example).
If observed only in the T-maze, the shift from the incorrect to the correct arm is relatively abrupt. However, when behavior is observed in the eight-arm radial maze, the orientation bias gradually swings from the direction of the originally rewarded turn, through straight ahead, and over to the reversed turn.
From the relatively smooth change of the orientation vector shown by the fornix-lesioned animals, one might expect that, in their case, only the cells close to the preferred behavioral direction are excited, and that learning "marches" the peak shown in the figures below, called "rewardness expectation" in the present work, from the old to the new preferred direction. However, the peak in our model does not actually march. Rather, it is the mean of two peaks that moves. During reversal in the T-maze, the reinforcement learning rule "unlearns" -90 degrees (a turn to the left at the T-junction) by reducing the peak there, while at the same time building a new peak at the new direction of +90 degrees (a turn to the right at the T-junction).
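A sketch of this two-peak account of reversal, assuming the same illustrative Gaussian bumps as above: the old peak at -90 degrees is unlearned while a new peak grows at +90 degrees, and only the mean of the two peaks sweeps smoothly through straight ahead. The learning rate is an illustrative assumption.

import numpy as np

directions = np.linspace(-180, 180, 37)

def bump(center_deg, amplitude, width_deg=30.0):
    return amplitude * np.exp(-0.5 * ((directions - center_deg) / width_deg) ** 2)

rate = 0.05
old_amp, new_amp = 1.0, 0.0                # start fully trained on the left turn

for trial in range(40):
    old_amp *= (1.0 - rate)                # unlearn the old (-90 deg) peak
    new_amp += rate * (1.0 - new_amp)      # build the new (+90 deg) peak

    activity = bump(-90.0, old_amp) + bump(+90.0, new_amp)
    mean_dir = np.sum(directions * activity) / np.sum(activity)
    if trial % 10 == 0:
        print(f"trial {trial:2d}: mean orientation = {mean_dir:+.1f} deg")

The printed mean orientation drifts gradually from near -90 degrees toward +90 degrees, even though neither individual peak ever moves, matching the smooth swing seen in the eight-arm radial-maze probe trials.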
All contents copyright (C) 1994-1997 University of Southern California Brain Simulation Lab. All rights reserved.
Author: Alex Guazzelli <aguazzel@rana.usc.edu>