Abstracts of 1999 Conference Papers

Listed below are the SQAB abstracts for Presentations, Poster Presentations, and Preeminent Tutorials.



SQAB Presentations




George Ainslie, MD
Veterans Administration, Coatesville, PA
ainslie@coatesville.va.gov

An Intertemporal Discounting Model of Willpower

Students of behavior from the classical Greeks to modern behaviorists like Heyman and Rachlin have noted that choice becomes less impulsive if made categorically. The hyperbolic discounting described by the matching law predicts both the impulsiveness of single choices and reduced impulsiveness as an equilibrium in recurrent intertemporal bargaining, which could be based on a categorical rule to cooperate in particular circumstances. Furthermore, it predicts that this equilibrium will represent not a simple optimum, but rather a brittle compromise that hampers choice in ways that coincide with common psychiatric symptoms. However, the actual occurrence of intertemporal bargaining has been hard to study in nature. Preliminary results from four possible methods will be described: 1) The predicted properties of the equilibrium pattern can be compared with "common knowledge" about willpower. 2) The properties of the hypothesized equilibrium pattern can be explored by thought experiments on intentionality, like Kavka's problem. 3) Intertemporal bargaining can be partially modeled by interpersonal bargaining. 4) The combining properties of series of rewards can be studied parametrically.
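
As a rough illustration of the single-choice impulsiveness and of point 4, the sketch below assumes the hyperbolic form V = A / (1 + kD) that is usually associated with the matching law. The abstract gives no formula, so the function names, the parameter k, and the reward amounts, delays, and spacing are all assumptions chosen only to make the pattern visible.

```python
def hyperbolic_value(amount, delay, k=1.0):
    """Present value of a reward under hyperbolic discounting (assumed form)."""
    return amount / (1.0 + k * delay)


def prefers_larger_later(lead_time, small=5.0, large=10.0, gap=3.0, k=1.0):
    """True if the larger, later reward is valued more at the moment of choice.

    lead_time: delay from now to the smaller, sooner reward.
    gap: extra delay before the larger, later reward arrives.
    """
    v_small = hyperbolic_value(small, lead_time, k)
    v_large = hyperbolic_value(large, lead_time + gap, k)
    return v_large > v_small


def bundled_value(amount, first_delay, spacing, n, k=1.0):
    """Summed value of n equal rewards spaced evenly in time (illustrative)."""
    return sum(hyperbolic_value(amount, first_delay + i * spacing, k) for i in range(n))
```

With these illustrative numbers, the larger reward is preferred when both rewards are distant (lead_time=10) but not when the smaller one is close (lead_time=1). Valuing a series of four such choices together reverses the close-range preference, which is one way to read the "categorical" choice described above.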




William M. Baum, Jed W. Schwendiman, & Kenneth E. Bell
University of New Hampshire
wm.baum@unh.edu

A Simple View of Choice

The matching law was an accident. Nothing so complex goes on in concurrent performance, and we should expect nothing so complex, because natural selection favors simple patterns ("rules of thumb"). In the present experiment, pigeons were exposed to pairs of concurrent variable-interval schedules, with reinforcement ratios that ranged more widely than usual (up to 532 to 1). Large samples of stable performance were gathered. The usual result of matching with a sensitivity of 0.8 was obtained, but analysis of visits revealed that visits to the non-preferred alternative were brief and approximately constant. Systematic deviation from the generalized matching line was predicted and confirmed. Matching, undermatching, and overmatching may all be explained by a view of concurrent performance based on foraging theory, in which responding occurs primarily at the rich alternative, occasionally interrupted by brief visits to the lean alternative.




Jose E. Burgos
Universidad Católica Andrés Bello
Universidad Central de Venezuela
jburgos@ucab.edu.ve

Cooperation as an Emergent Property of Selection by Reinforcement in Artificial Neural Networks

Pairs of artificial neural networks with identical architectures were given a functional analogue of the Prisoners' Dilemma. Processing elements in both networks functioned according to the Donahoe-Burgos-Palmer model. Contingencies of reinforcement were determined by a payoff matrix that satisfied the formal restrictions of the game. Payoffs for cooperating (i.e., for responding at the end of a fixed interval) or defecting were defined in terms of reinforcement magnitude. A network had no knowledge of another network's existence or influence on the contingencies, nor any a priori knowledge of the payoff matrix. Results show that cooperation was the most frequent behavior and that reciprocity (tit-for-tat) was the predominant strategy.




Colin F. Camerer
Humanities and Social Sciences, California Institute of Technology
camerer@hss.caltech.edu

Experience-weighted Attraction Learning in Games and Decisions

Economists who study strategic behavior have focussed attention, for nearly fifty years, on "equilibrium" models in which players choose strategies which are best responses to strategy choices by others. The question of how such an equilibrium might come about has been "put on hold" until the last ten years or so. Now attention is focussed on models in which equilibration occurs by population evolution (e.g., replicator dynamics) or by adaptive learning. Previously, learning models were thought to fall roughly into two separate categories-- reinforcement (which SQABers are familiar with) and weighted fictitious play. In weighted fictitious play, players take an arithmetic average of what *others* have done in the past, perhaps weighting more recent observations more heavily, to make a guess about what other players will do in the future; then they choose a strategy which has a high payoff given that guess.

We show that these two types of models are actually closely related. Specifically, both are special cases of a general model, called "experience weighted attraction (EWA) learning". In EWA learning, players choose strategies as they play a game repeatedly, and change the (unobserved) attraction of a strategy in response to feedback. Attractions for strategy j in period t, A^j(t), are equal to a lagged attraction, multiplied by a decay weight and an experience weight N(t), and are then reinforced by either the payoff received (for a chosen strategy) or a fraction delta times the foregone payoff (for an unchosen strategy). The updated attraction is then normalized by an updated experience weight. When delta=0 (and some other assumptions hold) the model reduces to a simple kind of choice reinforcement. When delta=1 (and some other assumptions hold), quite remarkably, the attractions are exactly the same as expected payoffs given weighted fictitious play learning. Thus, reinforcement and fictitious play learning are closely linked; they vary (almost) only in the degree of weight given to foregone payoffs (expressed by delta).
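
The update described above can be sketched in a few lines. This is a minimal reading of the verbal description, not the authors' code: the names phi (attraction decay) and rho (experience decay), the rule N(t) = rho * N(t-1) + 1, and the use of the lagged experience weight N(t-1) in the numerator are assumptions drawn from the usual statement of the model rather than from this abstract.

```python
def ewa_update(attractions, experience, chosen, payoffs, phi, rho, delta):
    """One EWA update for a single player (illustrative sketch).

    attractions: list of A^j(t-1), one entry per strategy j.
    experience:  N(t-1), the lagged experience weight.
    chosen:      index of the strategy actually played in period t.
    payoffs:     payoffs[j] is what strategy j earned, or would have earned,
                 against the other players' actual choices in period t.
    phi, rho:    decay weights for attractions and experience (assumed names).
    delta:       weight placed on foregone payoffs.
    Returns (new_attractions, new_experience).
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for j, lagged in enumerate(attractions):
        # The chosen strategy is reinforced by its full payoff; unchosen
        # strategies are reinforced by delta times the foregone payoff.
        weight = 1.0 if j == chosen else delta
        numerator = phi * experience * lagged + weight * payoffs[j]
        new_attractions.append(numerator / new_experience)
    return new_attractions, new_experience
```

Setting delta=0 updates only the chosen strategy, as in choice reinforcement. Setting delta=1 updates every strategy by the payoff it would have earned, which is the sense in which the abstract links that case to weighted fictitious play.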

Using data from human subjects playing simple strategic games repeatedly (from 10 to 100 periods), we estimate the value of the parameters, and forecast "out of sample" the last 30% of the periods in each experiment, for 8 different games (a total of more than 10,000 observations). The EWA model forecasts more accurately than simpler models in all but one game. New extensions of the model allow some fraction of players to "sophisticatedly" anticipate that others are learning (and respond accordingly), which is closely related to other concepts in game theory (quantal response equilibrium). We also extend the model to cases where players do not know, at the start, what the foregone payoffs are to strategies they did not choose.

EWA is described in a chapter by Camerer and Ho in the book Games and Human Behavior, edited by D. Budescu, I. Erev, and R. Zwick (1996), in an article in the Journal of Mathematical Psychology, 1998, and (most fully) in an article forthcoming in Econometrica, April 1999.





Luc-Alain Giraldeau
Department of Biology, Concordia University, Montréal, CANADA
giraluc@vax2.concordia.ca

Learning to Exploit Others Optimally: Reaching Solutions to a Producer-Scrounger Foraging Game

A hungry solitary animal must invest effort in searching for its food, but a group-foraging animal need not. Instead, it can wait until a companion finds food and then join it to obtain a share. So the question arises: how much waiting, as opposed to searching, should individual group members engage in? To identify the best waiting policy, one needs to consider the economic outcome of the different possible foraging policies. Because the pay-off to the waiting option is negatively frequency-dependent, a game-theoretic model is needed. This game is called the PRODUCER-SCROUNGER or PS game and seems to apply to small, ground-feeding, granivorous birds. The stable Nash solution to the game, whose players are designed to maximize their individual food intake, is a frequency of waiting within the group that fails to maximize the group members' intake rate. I present the elements and predictions of an intake-maximizing PS game and some qualitative evidence of its applicability to flocks of spice finches (Lonchura punctulata), and show that individual birds actually change their investments in waiting (playing scrounger) and searching (playing producer) until they reach the predicted stable equilibrium point, where no further change can be achieved through learning because both options provide equal rewards.





Gene M. Heyman
McLean Hospital & Harvard Medical School
gmh@wjh.harvard.edu

Framing and Rationality in Choice Experiments

The study of choice in animal experiments is no longer alive with passionate argument, counter-intuitive demonstrations of matching, and clever demonstrations of reward maximizing. However, an obituary is not in order. The structure of the competing experiments reveals a simple and productive synthesis, based on the idea that matching and maximizing are a function of how the options (contingencies) are framed. The basic premise of this approach is that any given environment provides several potential reward contingencies, and that, therefore, a description of choice requires a description of the factors that determine the operative contingency. This idea helps explain irrational laboratory phenomena as well as irrational non-laboratory phenomena, such as addiction.





Steven R. Hursh
Johns Hopkins University School of Medicine and SAIC
STEVEN.R.HURSH@cpmx.saic.com

Wednesday Breakfast Tutorial: Analyzing Demand Curves using GraphPad Prism

A non-linear demand curve equation (Hursh, et al., 1989; Hursh, 1991) has often been used to fit behavioral economic data describing the relationship between price and consumption. The parameters of the demand curve may be used to quantify the changes in elasticity that occur with increases in price and permit the calculation of Pmax, the price producing the highest level of responding (expenditures). Non-linear curve fitting with logarithmic data is not as familiar to many as simple linear curve fitting with arithmetic data. This workshop introduces the basic principles of non-linear curve fitting and describes the use of a software package (GraphPad Prism) to fit the demand equation to consumption data. Simple extensions of the same methods can be used to fit response output data and normalized demand curves. The workshop will also show how Prism can be used to produce publication quality graphics. If sufficient interest is expressed, the distributor of Prism will post the demand and response output equations on their web site.
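
For readers who want to see how Pmax follows from the demand parameters, here is a minimal sketch. The abstract does not reproduce the equation; the form assumed below, ln Q = ln L + b ln P - aP, is the one commonly cited for Hursh et al. (1989), and the function names and example parameter values are hypothetical. This is not Prism code, only the arithmetic behind the quantities a curve-fitting package would report.

```python
import math


def consumption(price, L, b, a):
    """Consumption Q at a given price, from ln Q = ln L + b ln P - a P (assumed form)."""
    return L * price ** b * math.exp(-a * price)


def elasticity(price, b, a):
    """Point elasticity d(ln Q)/d(ln P) = b - a * P; more negative as price rises."""
    return b - a * price


def pmax(b, a):
    """Price at which elasticity equals -1, where expenditure P * Q peaks."""
    return (1.0 + b) / a
```

Under this form, elasticity changes linearly with price, so Pmax has the closed form (1 + b) / a. For example, with the illustrative values b = -0.2 and a = 0.01, Pmax is 80 price units.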





Stephen E. G. Lea
University of Exeter, UK
S.E.G.Lea@exeter.ac.uk

Thursday Breakfast Tutorial: Where's the Behaviour in Behavioural Economics?

The idea of this tutorial is to have an exchange of views on the role of behavioural psychology in explaining economic behaviour. Much of the input into "behavioural economics" comes from economists; psychologists trying to address similar areas often call themselves "economic psychologists", and usually have a basically social psychological training. There's absolutely nothing wrong with either an economics or a social psychology orientation, but behavioural psychologists are almost certainly going to feel that something important gets left out by both. In the first day of the conference, I'm going to be trying to monitor this issue as the talks go by, and I aim to introduce the tutorial by presenting the thoughts I've had as a result. What I am hoping is that others will be doing something similar - or will have other thoughts on the issue - so we don't end up with a jetlagged monologue. So please bring your ideas and reactions to the first day's papers so we can have a genuinely interactive session.





Stephen E. G. Lea, Anke K. Unrath & Paul Webley
University of Exeter, UK
S.E.G.Lea@exeter.ac.uk

Time Preference as a Disposition: Relative Impatience and the Economic Self

Preferences between outcomes that are delivered at different distances from the present represent one of the greatest challenges to quantitative, rational theories of human and animal choice. Such choices are also of fundamental practical importance in everyday life. For over a hundred years, economists and other social scientists have recognized that people save less, and incur more debt, than any plausible rationality analysis predicts. Economists have frequently resorted to ill-defined, ad hoc psychological constructs such as "impatience" or "subjective discount rate" in order to reconcile the facts of common experience with core economic theory. Psychologists similarly have used a range of constructs to account for the existence of preference for immediate reward, and some reliable differences in such preference between groups of subjects. Few, however, have asked whether relative preference for immediacy is a stable dispositional characteristic of individuals. Unless it is, it is unlikely that short-term psychological experiments can shed much light on the inter-temporal choices that shape people's economic lives. This paper will gather together research designed to test the cross-situational stability of individual differences in time preference, and will place it into the conceptual framework of the "economic self", the individual's own perception of himself or herself as a consistent economic agent.





James E. Mazur
Southern Connecticut State University
mazur@scsu.ctstateu.edu

Complex Choice: Comparing Models of Concurrent-chain Performance

Past experiments using concurrent-chain schedules provide a body of data that can be used to compare and evaluate competing mathematical models of choice. Grace's (1994) analysis of the data from 19 published studies on concurrent-chain performance was re-examined. The predictive accuracy of Grace's Contextual Choice Model (CCM) was compared to that of other models, including delay-reduction theory (DRT) and a new theory called the hyperbolic value-added model (HVA). When given the same number of free parameters, CCM and HVA accounted for roughly the same percentages of variance in the 19 studies, and DRT accounted for just slightly less. The common features of these three models suggest a general framework that any successful model of concurrent-chain choice may need to include. Preliminary data from research designed to distinguish among the predictions of the different theories will be presented.





Suzanne H. Mitchell
University of New Hampshire
shm@hopper.unh.edu

Cigarette Smoking and Self-control: Using Discount Functions to Examine Drug Use

There is a small but growing literature that examines drug use using the impulsivity/self-control framework developed in behavior analysis. Under this framework, drug use is viewed as a relatively impulsive behavior because, at the simplest level, users choose the immediate, certain drug reinforcer over a probably healthier, more productive life in the future. Two applications of the framework to drug use have been attempted. One involves demonstrating that drug users are chronically more impulsive than nonusers. The other involves demonstrating that acute changes in drug exposure will increase impulsivity. Discount functions describing cigarette smokers' choices between immediate-delayed and certain-uncertain reinforcers in a series of studies will be discussed in the context of these two applications.





John A. Nevin
tnevin@postoffice.worldnet.att.net

President's Introduction:
Convergent Predictions by Different Models

Utterly different quantitative models sometimes predict very similar behavioral relations. Some examples from research on stimulus control are presented, and some implications for our model-building behavior are considered.





Drazen Prelec & Ronit Bodner
Sloan School of Management, Massachusetts Institute of Technology
dprelec@MIT.edu

Learning From One's Own Actions in a Self-signaling Model

I review the basic notion of self-signaling behavior (Bodner and Prelec) and then derive some implications for learning. Our first assumption is that individuals derive utility from the diagnostic implications of their choices - what the choices imply about their preferences, abilities, dispositions - even when the choices have no causal impact on these unobserved internal characteristics. For instance, a person might avoid minor extravagances so as not to call into question their commitment to some long-term financial goals. Our second assumption is that the inferences drawn from any given choice are themselves rational, in the formal sense of being consistent with utility maximization and Bayesian updating of beliefs. The behavior of a person motivated by self-signaling is qualitatively different from that of a standard utility maximizer in a number of ways. First, a self-signaling person may benefit by not being able to choose, or, alternatively, may seek a clarifying choice, to confirm the possession of a desirable trait. Second, a self-signaling person is susceptible to "moral placebo effects," in that a change in beliefs about his or her traits affects choices even though the new beliefs leave actual preferences unchanged. Because one's past choices are an important source of evidence about one's traits, these choices can become binding precedents even when the rationale for doing the same thing no longer applies.

THE DIAGNOSTIC VALUE OF ACTIONS IN A SELF-SIGNALING MODEL





Howard Rachlin
State University of New York at Stony Brook
hrachlin@psych1.psy.sunysb.edu

Self-Control And Social Cooperation

There is an analogy between self-control and social cooperation in an n-person prisoner's dilemma game. An individual may be seen as cooperating with or defecting from his or her own future interests in self-control, just as an individual may cooperate with or defect from the interests of a social group. A factor distinguishing the two spheres is probability of reciprocation -- usually high within an individual over time and low between individuals in social space. Experiments within and between individuals demonstrate that probability of reciprocation is the crucial variable determining whether self-control and social cooperation will be obtained.





Alvin E. Roth & Ido Erev
Department of Economics, Harvard University
al_roth@harvard.edu

Learning in Strategic Environments: Approximation and Prediction

We consider how simple models of reinforcement learning, and related adaptive models, can be used to predict behavior in repeated play of strategic games. Extremely simple models of reinforcement learning over actions turn out to have substantial power in predicting the play of randomly generated 2x2 zero sum games. Learning over more complex strategies is needed to allow simple adaptive models to predict behavior in repeated play of non-zero sum games. We also discuss how the search for robust approximations with predictive power is different from simple hypothesis testing.
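
To make "extremely simple models of reinforcement learning over actions" concrete, the following sketch shows a basic propensity-based learner of the kind often associated with Roth and Erev. The abstract does not specify the model, so this is an assumed one-parameter-free version: it omits forgetting and experimentation, it requires non-negative payoffs, and every function name is hypothetical.

```python
import random


def choice_probabilities(propensities):
    """Probability of each action is its share of total propensity."""
    total = sum(propensities)
    return [q / total for q in propensities]


def reinforce(propensities, action, payoff):
    """Return new propensities after the chosen action earns a non-negative payoff."""
    updated = list(propensities)
    updated[action] += payoff
    return updated


def play_round(propensities, payoff_fn, rng):
    """Sample an action, observe its payoff, and reinforce that action only."""
    probs = choice_probabilities(propensities)
    action = rng.choices(range(len(propensities)), weights=probs)[0]
    return reinforce(propensities, action, payoff_fn(action))
```

A typical use would call play_round repeatedly for each player, with payoff_fn closing over the opponent's sampled action, and then compare the resulting choice probabilities with observed play.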

Al Roth's game theory and experimental economics page





Reinhard Selten
Department of Economics, University of Bonn, Germany
selten@lab.econ1.uni-bonn.de

Learning Direction Theory

Learning direction theory applies to repeated decisions on the same parameter under feedback conditions permitting conclusions on whether a higher or lower value would have been better last time. It is predicted that, more often than randomly expected, the parameter is changed in the indicated direction, if it is changed at all. The theory yields only weak qualitative predictions. However, they are confirmed in at least twelve studies in a wide variety of situations.

The talk will present one application in more detail, namely two experiments by Selten, Abbink, and Cox on the "winner's curse" observed in some auction situations: Bidders lose money by making bids which on the average are higher than the value of the object bought. For most subjects the phenomenon survives long sequences of trials without any tendency towards the optimal bid. Learning direction theory provides an explanation for this.

With the help of simple assumptions on the strength of positive and negative influences of feedback experiences an "impulse balance point" can be computed. One obtains quantitative predictions in rough agreement with the data.





Dave Stephens
Ecology, Evolution & Behavior, University of Minnesota
dws@forager.cbs.umn.edu

Self-control and Game-theoretical Models of Cooperation and Altruism

Evolutionary biologists have devoted much effort to explaining the "niceness" of animals. I briefly review models of altruism and cooperation, focusing on models of altruistic cooperation maintained by reciprocity. In these models a cooperator foregoes a short-term temptation to cheat in order to achieve larger gains in the long run. Biological modelers have focused on the role of repeated interaction in making the "long run" longer and more valuable, and hence promoting cooperative action. I will review evidence bearing on these models and conclude that the empirical picture is quite discouraging. Animals typically 'cheat' when there is a short-term advantage in doing so, even when there is massive repetition. The reason for these empirical failings, I argue, is that while repetition is necessary, repetition on its own gives a very incomplete picture of how animals value future benefits; that is, biological theorists have erred in ignoring other aspects of temporal discounting. I discuss experimental work from my laboratory that is intended to alert game theorists to the power of discounting phenomena. I also discuss how strong discounting may limit cooperative action, and mechanisms by which these limitations might be overcome.






 
 
SQAB Poster Presentations




Carlos F. Aparicio
Universidad de Guadalajara
aparicio@udgserv.cencar.udg.mx

The Barrier Choice Paradigm: Recent Results

This poster presents recent results obtained with the barrier choice paradigm. The experimental situation included eight response levers organized in pairs and placed in a space (1.5 m²) with the structure of a cross. To vary the rate of reinforcement in the levers, a concurrent schedule with eight Random-Interval components was used. In some conditions free access to levers was permitted. In other conditions, access to levers was blocked by placing barriers, 15 or 30 cm high, at the entrance of the alleys; to reach a lever, and to switch from one lever to another, rats thus had to climb the barriers. To analyze results, the generalized matching law was modified as follows: log B1 - (1/8)(log B1 + log B2 + ... + log B8) = log R1 - (1/8)(log R1 + log R2 + ... + log R8), where Bi and Ri (i = 1, ..., 8) are response and reinforcement counts, respectively. Undermatching was found for conditions that permitted free access to all levers, the slope of the function being less than 1.0. However, when climbing the barriers was required to travel among levers, rats maximized the overall rate of reinforcement, the slope of the generalized matching law being higher than 2.0. Also, the residence times for the levers increased with increasing travel requirements.
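
The modified matching relation can be illustrated with a short sketch. It assumes that the reported slopes come from regressing the mean-centered log response counts on the mean-centered log reinforcer counts across the eight levers; the abstract does not state the fitting method, and the function names are hypothetical. All counts must be positive for the logarithms to be defined.

```python
import math


def centered_logs(counts):
    """Subtract the mean log count (over all levers) from each log count."""
    logs = [math.log10(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def matching_slope(responses, reinforcers):
    """Least-squares slope of centered log responses on centered log reinforcers.

    Both centered series have mean zero, so the fitted line passes through
    the origin and the slope reduces to sum(x * y) / sum(x * x).
    """
    y = centered_logs(responses)
    x = centered_logs(reinforcers)
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
```

A slope below 1.0 from this calculation would correspond to the undermatching reported for the free-access conditions, and a slope above 1.0 to the steeper functions reported with barriers.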




Forest Baker & Howard Rachlin
State University of New York at Stony Brook
fbaker@psych1.psy.sunysb.edu hrachlin@psych1.psy.sunysb.edu

Preference for Random-Ratio Schedules of Reinforcement Depends on How They are Terminated

This study demonstrates that altering how a schedule of reinforcement is terminated affects its attractiveness. During a Random-Ratio (RR) schedule of reinforcement each response has a set probability of producing a reward; numerous studies have shown that organisms strongly prefer variable (including RR) schedules of reinforcement over fixed schedules of reinforcement. In these studies it is common practice to terminate the schedule of reinforcement after the presentation of a reward. However, the availability of an activity that is reinforced according to an RR schedule of reinforcement can be contingent on other factors such as number of responses. Using Mazur's (1984) adjusting procedure, results showed that pigeons preferred RR schedules of reinforcement that terminated after a set number of responses to RR schedules that terminated after the presentation of a reward.



Terry W. Belke
Mount Allison University
tbelke@mta.ca

Response Rate Asymptotes from Herrnstein's (1970) Response-Strength Equation Vary as a Function of Schedule Order

Six male Wistar rats were exposed to different orders of reinforcement schedules within the same session to determine if estimates of response rate asymptotes from Herrnstein's (1970) single operant matching law equation varied systematically with schedule order. Reinforcement schedules were arranged in orders of increasing and decreasing reinforcement rate. Subsequently all rats were exposed to a single reinforcement schedule within a session to determine within-session changes in responding. For each condition, the operant was lever pressing and the reinforcing consequence was the opportunity to run for 15 s. Results showed that estimates of response rate asymptotes were higher when reinforcement schedules were arranged in order of increasing reinforcement rate and that within a session on a single reinforcement schedule, response rates increased between the beginning and the end of a session. A positive correlation between the difference in response rates within a session and the difference in asymptotes between schedule orders suggests that the within-session change in response rates may play a role in the difference in the asymptotes. This pattern of results is discussed in terms of changes in reinforcer efficacy within a session.
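
For context, the response rate asymptote is the parameter k in Herrnstein's hyperbola, B = kR / (R + Re). The sketch below estimates k and Re with a double-reciprocal (linearized) least-squares fit. That fitting method is an assumption made to keep the example dependency-free, not the procedure used in the study, and it weights low response rates heavily when the data are noisy.

```python
def response_rate(reinforcement_rate, k, re):
    """Herrnstein's hyperbola: B = k * R / (R + Re)."""
    return k * reinforcement_rate / (reinforcement_rate + re)


def fit_hyperbola(reinforcement_rates, response_rates):
    """Estimate (k, Re) from the linearized form 1/B = 1/k + (Re/k) * (1/R)."""
    xs = [1.0 / r for r in reinforcement_rates]
    ys = [1.0 / b for b in response_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    k = 1.0 / intercept  # asymptotic response rate
    return k, slope * k  # slope equals Re / k, so Re = slope * k
```

Fitting the same function separately to the ascending-order and descending-order data would give the two asymptote estimates whose difference the abstract discusses.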



Lee Bloomquist
Senior Researcher Steelcase Inc.
lbloomquist@mcimail.com

The Ultimatum Game Played in Two Different Situations

Researchers in experimental economics may have overlooked a relevant variable that, if controlled, would produce more information from their experiments: the physical situation. This is an example using The Ultimatum Game, which has been well studied in the field of experimental economics. We've run the game while controlling the physical situation. In one situation-- a type in environmental psychology that's associated with friendly, cooperative interactions-- we ran the game 30 times with 30 different pairs of players. In another situation-- a type that's associated with formal, competitive interactions-- we ran it with another 30 pairs. The physical situation plausibly skews the game. The increased information suggests an application: applying the former type of environment may help to increase trust, as well as reduce anxiety about misperceived inequity in the workplace.



A. Charles Catania, Eliot Shimoff & Lara Kowalsky
University of Maryland Baltimore County
catania@umbc.edu shimoff@umbc.edu lkowalsky@loyola.edu

Delay of Reinforcement and Reinforcer Duration

The delay function for pigeons' key pecking relates response rates to the temporal separation of pecks from reinforcer deliveries. It can be assessed by reinforcing sequences of responses on two keys: if reinforcers are produced by exactly m left-key pecks followed by n right-key pecks, each reinforcer is separated from the most recent left-key peck by the time taken to complete the right-key pecks (previous studies obtained delay functions using random-interval reinforcement of such two-key sequences). Furthermore, each reinforcer acts retroactively on prior responding only as far back as the prior reinforcer. In other words, it blocks the responses it follows from the effects of reinforcers that come later. If a long reinforcer is regarded as a sequence of two short reinforcers, with the later one blocking effects of the earlier, it follows that effects of delayed reinforcers should be independent of reinforcer duration. Consistent with this conclusion, pigeon data from two-key procedures like those in prior studies showed negligible effects of reinforcer duration on left-key response rates over a range of reinforcer durations from 2 to 6 seconds.



Bryce S. Cleland, T. Mary Foster, & William Temple
University of Waikato, New Zealand
BSC1@Waikato.ac.nz (cleland) psyco182@waikato.ac.nz (foster)

A Within-Subject Method To Derive Resurgence Functions

Resurgence is the term used to describe the occurrence of previously reinforced behaviors during the extinction of another behavior. Most research investigating the variables of which resurgence is a function has used across-group comparisons. Investigations of the controlling variables of resurgence would be more convincing if the effects were shown in the behavior of individual subjects. Problems may arise, however, with using repeated-conditions or multiple-schedule designs. Combining these designs may ameliorate some of the potential problems with each design used alone. Such a design, repeated conditions with multiple components, is presented along with some of the data obtained from it.



Kent Conover & Peter Shizgal
Centre for Studies in Behavioral Neurobiology,
Concordia University, Montreal, Quebec, Canada.
Conover@CSBN.Concordia.Ca Shizgal@CSBN.Concordia.Ca

Employing Labor Supply Theory to Scale the Reward Value of Brain Stimulation

We have applied labor-supply theory to describe how performance for rewarding electrical brain stimulation grows as a function of stimulation strength. A lever was armed at random intervals to deliver stimulation trains. Rats were rewarded only if they were holding the lever down at the moment it became armed. Stimulation strength was varied by altering the number of pulses per 0.5 sec train, and the average hold time required to earn a reward was varied by changing the mean of the programmed inter-reinforcement intervals. Labor supply theory predicts that the time the subject spends away from the lever ("leisure") per stimulation train should increase as a power function of the hold time required to earn a reward ("price"), assuming that the elasticity of substitution between brain stimulation and leisure is constant. The exponent of the power function (E) is determined by the elasticity of substitution. Thus, the value of the stimulation train can be derived from the leisure time per train and the price. Over several orders of magnitude, the results conformed closely to the predicted power relationship between leisure time per train and price. The value of brain stimulation was found to grow as a sigmoidal function of the number of pulses in the train. This scaling method promises to provide a powerful framework for characterizing the role of neurotransmitters, neural structures, and physiological signals in goal evaluation and behavioral allocation.



Adam H. Doughty, Jerry B. Richards & Kennon A. Lattal
West Virginia University
adoughty@wvu.edu

Effects of Reinforcer Magnitude on DRL Responding in Rats and Pigeons

The reinforcing strength of a stimulus may be characterized as a function of its rate, delay and/or magnitude. It follows then that a larger reinforcer magnitude should produce a higher probability of 'schedule-appropriate' responding than a smaller reinforcer magnitude. This interpretation of reinforcer strength was examined by attempting to produce efficient differential-reinforcement-of-low-rate (DRL) schedule responding in both rats and pigeons by increasing the magnitude of the reinforcer. Eleven rats responded under two DRL schedules (18 and 72 s) with two reinforcer magnitudes (30 and 300 µl of water), and three pigeons responded under a DRL 20-s schedule with reinforcer magnitudes of 2-s and 6-s access to grain. Response rates were increased when the reinforcer was larger, resulting in lower reinforcement rates. Quantitative analyses of the resulting interresponse-time (IRT) distributions demonstrated that the primary effect of the larger reinforcer magnitude was to shift the 'peak' of the distributions towards shorter IRT values without altering the size of the peak of the IRT distribution. Therefore, better 'schedule-appropriate' responding is maintained under DRL schedules of reinforcement with smaller reinforcer magnitudes than with larger reinforcer magnitudes.



Leonard Green & Eric Macaux
Washington University
lgreen@artsci.wustl.edu

Discounting and the 'Pigeon's Dilemma'

Four pigeons played a prisoner's dilemma-type game for food reinforcers against a computer opponent programmed to play a tit-for-tat strategy. In different conditions, the pigeon's choice to defect or to cooperate was followed by a 3-sec or 37-sec delay before reward delivery. In the 3-sec delay condition, all subjects defected. This result replicated the findings of a previous study (Green, Price, & Hamburger, 1995). When the delay between the pigeon's choice and receipt of the reinforcer was 37 sec, however, all subjects exhibited a dramatic increase in their proportion of cooperative choices and earned more than double the number of food reinforcers obtained in the 3-sec delay condition. Although previous experimental research failed to obtain cooperative responding, the present findings support the hypothesis that pigeons' defection in iterated prisoner's dilemma games results from steep discounting of delayed rewards. Defection and cooperation may be accounted for in terms of discounting, which also underlies self-control, impulsivity, and preference reversals.
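The payoff structure behind this result can be sketched as follows. This is only an illustration of why sustained cooperation earns more than sustained defection against a tit-for-tat opponent; the payoff values, the assumption that tit-for-tat opens by cooperating, and all names in the code are hypothetical and are not taken from the study.

```python
# Hypothetical payoffs (reinforcers per trial), keyed by (player move, opponent move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def play_against_tit_for_tat(strategy, n_trials):
    """Total payoff for a strategy playing against a tit-for-tat opponent."""
    opponent_move = "C"  # assumption: tit-for-tat cooperates on the first trial
    total = 0
    for trial in range(n_trials):
        move = strategy(trial)
        total += PAYOFF[(move, opponent_move)]
        opponent_move = move  # tit-for-tat copies the player's last move
    return total

def always_cooperate(trial):
    return "C"

def always_defect(trial):
    return "D"

coop_total = play_against_tit_for_tat(always_cooperate, 100)
defect_total = play_against_tit_for_tat(always_defect, 100)
```

Under these assumed payoffs, defection wins only on the first trial; thereafter the opponent's retaliation makes each defection worth less than mutual cooperation would have been.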



Alicia Grunow & Allen Neuringer
Reed College
Allen.Neuringer@directory.Reed.EDU

Comparing the Effects of Reinforcement Frequency and Reinforcement Contingency on Behavioral Variability

The present research describes the relationship between two important sources of behavioral variability, namely reinforcement frequency and reinforcement contingency. Previous research shows that response variability increases as reinforcement frequency decreases. The most notable case is the high variability generated by extinction, where reinforcement frequency is zero. Previous research also shows that levels of response variability are precisely controlled by reinforcement contingent upon the variability. What are the relative effects of intermittency and contingency, and how do they interact?

Forty rats were divided into 4 groups, all of which were reinforced for producing variable 3-response sequences across 2 levers and a key (variable sequences being those that were emitted infrequently in the past). The groups differed in the level of variability necessary to receive reinforcement, i.e., they differed in terms of the variability contingency. All subjects were initially reinforced every time they met their contingency, i.e., on continuous reinforcement (CRF). Subsequently the subjects were reinforced on a VI 3-min and then a VI 5-min schedule for meeting their contingency. Both contingency and intermittency affected levels of variability, but contingency exerted a much larger influence than did intermittency. It appears, therefore, that reinforcement contingency controls levels of response variability to a greater extent than does reinforcement frequency. This result may be important for applications to educational and other learning situations.



Lauren Kettle & Peter Killeen
Arizona State University

Memory = Weight * Behavior



John R. Kraft
University of New Hampshire
jrk@christa.unh.edu

The Ideal Free Distribution of Group Choice with Shared and Probabilistic Resources

Group Choice consists of group members choosing between two alternative behaviors. The Ideal Free Distribution (IFD) is a foraging model that relates a group's choices between resource sites and obtained resources (Fretwell & Lucas, 1970). Following the example of Sokolowski, Tonneau, & Freixa i Baque (1999), the present research examined Group Choice with a group of humans who chose blue and red cards (analogous to distributing between resource sites) for points that led to money prizes (analogous to distributed resources). In Experiment 1, members of each card subgroup shared the points allocated to each alternative. In Experiment 2, one member of each card subgroup was picked probabilistically (i.e., randomly) to receive all the allocated points. The experimenter manipulated the relative amounts of points allocated to alternatives and observed corresponding changes in the groups' choices. Probabilistic points interfered with the IFD of Group Choice and shared points did not.
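A minimal sketch of the IFD prediction, assuming its simplest form, in which the ratio of group members at two sites equals the ratio of resources at those sites. The group size, point values, and function name are hypothetical and are not taken from the experiments.

```python
def ifd_prediction(group_size, points_a, points_b):
    """Predicted number of choosers per alternative under the simplest IFD:
    members distribute in proportion to the resources at each alternative."""
    total_points = points_a + points_b
    n_a = group_size * points_a / total_points
    n_b = group_size - n_a
    return n_a, n_b

# Hypothetical example: 20 people, 60 points on blue and 20 points on red.
n_blue, n_red = ifd_prediction(20, 60, 20)

# When points are shared within each subgroup, the per-person payoff is equal
# across alternatives at the predicted distribution.
share_blue = 60 / n_blue
share_red = 20 / n_red
```

At the predicted distribution no individual can gain by switching alternatives, which is why shared points are the natural case for the model.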



Karen M. Lionello
Purdue University
lionello@psych.purdue.edu

Transfer Across Sample Locations in Pigeons' Matching-to-Sample

Evidence for symmetry (the ability to spontaneously match stimulus B to stimulus A after learning to match A to B) has been difficult to demonstrate in animals. One reason is that when A-B training consists of two-alternative matching-to-sample (MTS), the stimuli change locations during the symmetry test. My previous work has shown that pigeons trained on identity MTS with center-key samples are unable to transfer their matching performances to side-key samples. In the present experiments, pigeons were trained on MTS with two sample locations and then tested on the remaining location. Transfer of matching to new locations was observed independently of which two locations were trained (and which novel location was tested). These data indicate that location can be removed as a controlling characteristic in MTS, thus opening a promising new avenue to test for symmetry.



Gregory J. Madden, Warren K. Bickel, & Eric A. Jacobs
University of Vermont
eajacobs@zoo.uvm.edu

Three Predictions of the Economic Concept of Unit Price in a Choice Context

Economic theory makes three predictions about consumption and response output in a choice situation: 1) when plotted on logarithmic coordinates, total consumption (i.e., summed across concurrent sources of reinforcement) should be a positively-decelerating function, and total response output should be a bitonic function of unit price increases; 2) total consumption and response output should be determined by the value of the unit price ratio, independent of its cost and benefit components; and 3) when a reinforcer is available at the same unit price across all sources of reinforcement, consumption should be equal between these sources. These predictions were assessed in human cigarette smokers who earned cigarette puffs in a two-choice situation at a range of unit prices. In some sessions, smokers chose between different amounts of puffs, both available at identical unit prices. Individual-subjects' data supported the first two predictions but failed to support the third. Instead, at low unit prices, the relatively larger reinforcer (and larger response requirement) was preferred, while at high unit prices, the smaller reinforcer (and smaller response requirement) was preferred. An expansion of unit price is proposed in which handling costs and the discounted value of reinforcers available according to ratio schedules are incorporated.
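The second prediction rests on unit price being a cost-benefit ratio. A minimal sketch, assuming unit price is computed as responses required per unit of reinforcer (here, per puff); the response requirements and puff counts are hypothetical.

```python
def unit_price(responses_required, reinforcer_amount):
    """Unit price as a cost-benefit ratio: responses per unit of reinforcer."""
    return responses_required / reinforcer_amount

# Two hypothetical alternatives with different costs and benefits
# but the same unit price.
small_option = unit_price(responses_required=50, reinforcer_amount=1)
large_option = unit_price(responses_required=150, reinforcer_amount=3)
```

Both options cost 50 responses per puff, so the third prediction says consumption should be equal between them; the abstract reports that it was not.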



Francis Mechner & Laurilyn D. Jones
The Mechner Foundation
fm@mechnerfoundation.org ldj@mechnerfoundation.org

Learning History and Resurgence Patterns

The term resurgence refers to the reappearance of antiquated behavior patterns (patterns observed earlier in a subject's learning history). In several separate experiments, groups of human subjects typed non-word sequences of letters on a computer keyboard. Each sequence consisted of criterial (mandated) and non-criterial (optional) key strokes, with the criterial key strokes following one of nine different patterns. The structure of all experiments was the same: six or nine "history-building" sessions followed by a final "test" session. The objective was to study the degree and character of resurgence of both criterial and non-criterial patterns of key presses during the final session, and the relationship of resurgence to errors. In addition, characteristics of the history-building sessions were varied to see what effect these had on the behavior observed in the final session. Specifically, the nine patterns of criterial key presses were introduced at different times in the learning history, and were practiced with different frequencies. The indicator of criterial resurgence was the number of times each of the nine criterial patterns was used during the final session, and the number of errors made. Separately, non-criterial resurgence was measured by noting each specific pattern of non-criterial key presses during the final session and when, if ever, that pattern had been previously used during the subject's history sessions.



Matt J. Morris & Jack J. McDowell
Emory University
mmorr01@emory.edu psyjjmd@emory.edu

Battle of the Digital Organisms

The two authors engaged in a competition to develop a digital organism that could obtain the greatest amount of reinforcement on a series of simulated concurrent random-interval random-interval schedules. During testing of the digital organisms, both the values of the schedules and the value of the changeover delay were varied. As in traditional concurrent-schedule experiments, the digital organisms were provided with discriminative stimuli indicating whether the first or the second schedule was in effect, but were not given information as to the value of either schedule or the magnitude of the changeover delay. Results of the competition, a description of the algorithms used by the two digital organisms, and a computer demonstration of the competition will all be presented.



Joel Myerson & Leonard Green
Washington University, St. Louis, MO
jmyerson@artsci.wustl.edu lgreen@artsci.wustl.edu

Temporal discounting in choice between rewards available at different delays

Twenty-four undergraduates made choices between two delayed rewards, one available sooner and the other available later. The amount of the reward available after the briefer delay was adjusted until it was judged equal in value to the larger, more delayed reward. A hyperbola-like model, Value = 1/(1 + k*Delay)^s, accurately described the decrease in the value of the more delayed reward as the delay until its receipt was increased. The discounting rate parameter, k, decreased systematically as the delay to the sooner reward increased. An interpretation of the hyperbola-like model which views subjects as choosing between two rates of reward accounts for this decrease in discounting rate.
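The hyperbola-like model, with the exponent s applied to the whole denominator, can be sketched as below. The parameter values are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from the study.

```python
def discounted_value(delay, k, s=1.0):
    """Relative value of a delayed reward: 1 / (1 + k*delay)**s.

    k is the discounting rate parameter and s is the exponent; with s = 1
    the model reduces to a simple hyperbola.
    """
    return 1.0 / (1.0 + k * delay) ** s

# Hypothetical parameter values for illustration only.
immediate = discounted_value(delay=0, k=0.1)           # no delay, full value
simple = discounted_value(delay=10, k=0.1)             # s = 1
hyperboloid = discounted_value(delay=10, k=0.3, s=0.5)
```

A smaller k means shallower discounting, so the reported decrease in k with longer delays to the sooner reward corresponds to the later reward retaining more of its value.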



Bertram O. Ploog
College of Staten Island, City University of New York
ploog@postbox.csi.cuny.edu

Effects of Unconditioned Reinforcement on Initial-Link Responding under a Concurrent-Chains Schedule with Nondifferential Terminal Links

This study assessed the effects of unconditioned reinforcement (differences in food amount) on initial-link responding under a concurrent-chains schedule. The initial links were concurrent VI 60-s schedules. The initial-link stimuli were red and green, with position (left or right key) randomized. The correlation between initial-link stimulus color and food amount remained consistent within each condition. The terminal links were either VI 20-s or VI 40-s schedules across conditions and birds. However, for each bird within a condition, the values for both terminal links (presented on the center key) were identical. In the first two conditions, six birds received differential terminal-link stimuli (yellow vs. blue); the remaining six birds received nondifferential terminal-link stimuli (yellow or blue). In the third condition, all birds received nondifferential terminal-link stimuli. Differences in food amount were 3 s vs. 6 s of access to food in the first condition, and 1 s vs. 6 s of access to food in the second and third conditions. With very few exceptions, the birds under the nondifferential conditions chose the initial-link stimulus that was correlated with the larger reinforcer. Since the terminal-link stimuli were nondifferential, differential responding in the initial links cannot be explained in terms of conditioned reinforcement represented by the terminal-link stimuli. A preliminary modification of Fantino's (1969) choice model, different from Navarick & Fantino's (1976) model, is suggested by incorporating a factor for food amount that is weighted exponentially in terms of terminal-link duration.



Diana Posadas & Peter Killeen
Arizona State University
d.posadas@asu.edu

One, Two, Three, Many:
Loss of Control by the Present on PR Schedules



François Tonneau, Gerardo Ortiz-Rueda, & Felipe Cabrera
University of Guadalajara, Mexico
ftonneau@udgserv.cencar.udg.mx

Within-Session Increases in Response Rate during Extinction

After twenty sessions of baseline training under a variable-interval 60-s schedule of reinforcement, four Wistar rats were exposed to 30-min extinction probes either at the start or at the end of the experimental session. The rate of responding observed during probes decreased regularly when extinction was scheduled at the end of a session, but tended to increase before decreasing when extinction was scheduled at the start of a session. Rats tended to reproduce in extinction the response profiles they had exhibited from minute to minute in previous baseline sessions. These findings suggest caution before assuming that within-session increases in responding are produced by reinforcers such as food or water.



K Geoffrey White, James Hegarty, & David N Harper
University of Otago, New Zealand
kgwhite@otago.ac.nz

Sensitivity of Self-Control Choices to Delay and Amount

Sensitivity of choice to outcomes varying in delay and amount may depend on the absolute duration of the delays. In parallel studies with pigeons and humans, a concurrent-chains procedure arranged equal and constant variable-interval schedules in initial links, and unequal delays and reinforcer amounts (or durations) in the terminal links. In four different conditions, short and long delays were crossed with small and large reinforcer amounts. In one set of four conditions delays were overall short and in another they were overall long, while maintaining the same ratios. Sensitivity to reinforcer delay was greater in conditions where absolute delay was longer, for both pigeons and humans. For humans, sensitivity to reinforcer amount was greater when absolute delay was longer. In a second experiment with humans, absolute delay was increased by adding a constant delay to each terminal link, as in the preference reversal paradigm. Sensitivity to amount was greater for longer absolute delays, confirming a prior result for pigeons (White & Pipe, 1987). The results have a bearing on models in which self-control choice depends on temporal context.
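One common way to quantify such sensitivities is a concatenated generalized matching relation, in which the log choice ratio is a weighted sum of the log delay ratio and the log amount ratio. The sketch below assumes that form; the abstract does not state the equation used, and the delays, amounts, sensitivity values, and function name are hypothetical.

```python
import math

def log_choice_ratio(delay_1, delay_2, amount_1, amount_2,
                     s_delay, s_amount, log_bias=0.0):
    """Predicted log10 ratio of choices (alternative 1 over alternative 2).

    Shorter delays and larger amounts both favor an alternative; s_delay and
    s_amount are the sensitivities to each dimension.
    """
    return (s_delay * math.log10(delay_2 / delay_1)
            + s_amount * math.log10(amount_1 / amount_2)
            + log_bias)

# Alternative 1: smaller-sooner (2-s delay, amount 2).
# Alternative 2: larger-later (6-s delay, amount 6).
equal_sensitivity = log_choice_ratio(2, 6, 2, 6, s_delay=1.0, s_amount=1.0)
delay_dominant = log_choice_ratio(2, 6, 2, 6, s_delay=1.5, s_amount=1.0)
```

With equal sensitivities the two dimensions cancel and the prediction is indifference; raising sensitivity to delay tips preference toward the smaller-sooner alternative.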






SQAB-Invited Preeminent Tutorials




John W. Donahoe
University of Massachusetts
jdonahoe@psych.umass.edu

Chair: Kennon A. Lattal

From Basics to Contemporary Paradigms: Neural Networks

The tutorial has three goals: (1) to indicate the place of neural-network simulations in the experimental analysis of behavior (EAB), (2) to survey various approaches to neural networks, noting those that are consistent with EAB and those that are not (i.e., most realizations of the parallel-distributed-process approach in psychology), and (3) to illustrate the power of biobehaviorally informed neural networks for the interpretation of Pavlovian and operant conditioning, discriminative control, temporal control, language acquisition, and phenomena (e.g., devaluation) from which different types of "associations" are often inferred.




Isidore Gormezano
University of Iowa
i-gormezano@uiowa.edu

Chair: John W. Donahoe

From Basics to Contemporary Paradigms: Reflex Conditioning
(Classical Conditioning: Theory and Data)

Theoretical issues and relevant data will be presented detailing stimulus trace, "law of effect", and neural substrate accounts of classical conditioning. Accounts of stimulus asynchrony and trace conditioning, following Pavlov (1927), have involved postulating a hypothetical stimulus trace to bridge the gap between the nominal conditioned stimulus (CS) and unconditioned stimulus (US). The trace hypothesis has served as a heuristic guide in our generating a body of data detailing the effects of CS-US interval on the frequency of conditioned responses (CRs), latency and peak latency of CRs, and unconditioned response (UR) amplitude. Contiguity and effect formulations have held different accounts of what properties of the US-UR complex produce the acquisition and maintenance of CRs. In contiguity theory, the US is presumed to assure occurrence of the US for eliciting the UR in an appropriate temporal interval with the CS. Effect theories, by contrast, have proposed that CR acquisition is determined by (a) the presumed motivational properties of US occurrence per se or (b) CR-produced modification of the sensory consequences of the US. Our experimental evidence has indicated that beyond the elicitation of the UR, the US must possess motivational properties. Nevertheless, employing contiguity and effect formulations as heuristic guides, the outcome of experiments manipulating US intensity and duration, as well as those involving the CR-contingent modification of the US, yielded results largely in support of contiguity/drive-induction accounts. A final study will be presented involving the determination of bilateral cerebellar lesion effects upon classical conditioning of the rabbit nictitating membrane and jaw movement responses as it bears upon the key role postulated for the cerebellum (anterior interpositus nucleus) in associative learning.
The novel and elegant paradigm produced results providing additional support for the postulated obligate role of the anterior interpositus nucleus in conditioning of the rabbit's nictitating membrane response.




Peter R. Killeen
Arizona State University
killeen@asu.edu

Chair: J. Gregor Fetterman

From Basics to Contemporary Paradigms: Timing (Parsing Sagely Rosemary's Time)

For Newton "time flowed equitably"; for his cousin Rosemary, it flowed faster sometimes than others, and occasionally stood still. He divided the time into seconds; she into tasks. How do behaviorists parse time? Most theoreticians have pacemakers divide time and counters sum the divisions; there are cognitive, behavioral and connectionist versions. Some repudiate pacemakers, substituting leaky buckets. Why all this talk, and what data does it account for? Does Newton's time work for us, or is Rosemary's better? Is time linear, logarithmic or circular? Attend this tutorial and get it straight.




Michael T. Turvey
University of Connecticut
turvey@uconnvm.uconn.edu

Chair: Philip H. Hineline

From Basics to Contemporary Paradigms: Ecological Psychology: Nonrepresentational Perception and Action

How should the perception and action capabilities of biological systems be understood? The common answers emphasize notions of computation and representation and notions shaped by the potential forms of local processing and global adaptive change in neural networks. A less common answer emphasizes the laws and symmetry conditions at nature's ecological scale, the scale at which living things and their surroundings are defined. This latter emphasis characterizes the ecological approach to perception and action pioneered by Gibson (1966, 1979/1986). Within this approach, dynamics (referring to the laws of motion and change) and dynamical systems (referring to the time-evolution of observable quantities according to law) are natural accompaniments to the study of perception and action. Research on haptic and visual perception, postural control, and the coordination of rhythmic movements will be described. This research highlights the significance of identifying the invariants constraining perception (e.g., the inertia tensor and attitude spinor in dynamic touch) and the collective variables characterizing coordination (e.g., relative phase). It also highlights the value of analyzing the variability of exploratory and performatory behavior through the nonlinear methods of phase space reconstruction, long range (fractal) correlations, and recurrence quantification.







Date Updated : May 31, 1999