Abstracts of 2002 Conference Papers

Listed below are the SQAB abstracts for Presentations, Poster Presentations, and Preeminent Tutorials.



SQAB Presentations



Friday, May 24



08:15-09:00

Randy Gallistel
Rutgers University

The Information Processing Approach to Conditioning

The framework provided by Claude Shannon's (1948) theory of information leads to a more quantitatively oriented reconceptualization of the processes that mediate conditioning. The focus shifts from processes set in motion by individual events to processes sensitive to the information carried by the flow of events. The conception of what properties of the conditioned and unconditioned stimuli are important shifts from the tangible properties that excite sensory receptors to the abstract and intangible properties of number, duration, frequency and contingency, which are the carriers of the information. In this view, a stimulus becomes a CS if its onset substantially reduces the subject's uncertainty about the time of occurrence of the next US. One way to represent the subject's knowledge about the time of occurrence of the US is by the cumulative probability function. This function has two limiting forms: 1) The state of maximal uncertainty (minimal knowledge) is represented by the inverse exponential function associated with the random rate condition in which the US is equally likely at any moment. 2) The state of maximal certainty is represented by the cumulative normal function whose expectation is equal to the CS-US latency minus the time elapsed since CS onset and whose standard deviation is the Weber fraction times the CS-US latency.
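
The two limiting forms described above can be written down directly. The following is a minimal sketch, assuming a constant US rate for the random-rate case; the function names and all numerical values are hypothetical and are not taken from the talk.

```python
import math

def cdf_random_rate(t, rate):
    """Maximal uncertainty: US equally likely at any moment (exponential CDF)."""
    return 1.0 - math.exp(-rate * t)

def cdf_timed(t, cs_us_latency, elapsed, weber_fraction):
    """Maximal certainty: cumulative normal over the time remaining until the US.

    The mean is the CS-US latency minus the time elapsed since CS onset;
    the standard deviation is the Weber fraction times the CS-US latency.
    """
    mean = cs_us_latency - elapsed
    sd = weber_fraction * cs_us_latency
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))

# Illustrative values (assumptions): 10-s CS-US latency, Weber fraction 0.15,
# 4 s elapsed since CS onset, background US rate of one per 60 s.
p_timed = cdf_timed(6.0, cs_us_latency=10.0, elapsed=4.0, weber_fraction=0.15)
p_random = cdf_random_rate(6.0, rate=1.0 / 60.0)
```

With these values the timed function reaches one half exactly at the expected remaining time (6 s), while the random-rate function has barely risen, which is one way to see how much uncertainty CS onset removes.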




09:00-09:30

Melissa Burns & Michael Domjan
Texas Christian University & University of Texas at Austin

Stepping outside the box in considering the C/T ratio

Much theoretical work has been based on the finding that the vigor of conditioned behavior is directly related to the C/T ratio-- the interval between successive trials or cycle time (C) divided by the duration of the conditioning trial or trial time (T). This relationship has been obtained in studies of appetitive and aversive conditioning that employed fairly conventional methodology. Subjects were typically moved to an experimental chamber for sessions lasting about 60 minutes during which numerous conditioning trials were administered. With such procedures, time spent in the experimental context is typically confounded with the intertrial interval. In addition, typically only one type of behavior is measured as evidence of conditioned responding. We will present data from studies in which the intertrial interval was independent of differences in context exposure time and several action patterns were measured to track changes in conditioned responding. The experiments were conducted in the sexual behavior system with one trial per day. Under these conditions, conditioned responding was not always directly related to the C/T ratio. Some conditioned responses showed the usual effect whereas other conditioned responses showed the opposite relationship between the C/T ratio and the vigor of conditioned behavior. These findings question the generality of previously published C/T ratio effects and suggest that general theories of learning built on C/T ratio phenomena have to be more cognizant of response factors.




09:30-10:00

Ben Williams
University of California at San Diego

Important distinctions between different roles of time in learning

At least three types of temporal intervals have been varied in studies of animal learning: interreinforcement times with interval schedules, delays between responses and contingent reinforcers, and times to a reinforcer occurrence measured from a stimulus onset. These different types of intervals have different functional properties that cannot be subsumed under a unified timing model. For example, time is averaged differently in simple concurrent VI VI schedules than in concurrent chain schedules. Timing models also fail to deal with various effects of stimuli imposed during temporal intervals, such as conditioned reinforcement, and fail to explain the failure of learning in various discrimination procedures involving delayed outcomes.




11:00-11:25

Alex Kacelnik & Martin Shapiro
Department of Zoology, Oxford & Institute for Advanced Studies, Berlin

Preference as a function of amount and delay to reward. The cross-censorship model of choice.

We investigated how amount and delay determine the magnitude of preferences between food sources in starlings and whether choices can be explained as independent responses to each option. We used fifteen pairs of options that differed in amount and delay to reward. We used three behavioural indices: latency in no-choice and in choice trials, proportion of choices in choice trials, and proportion of responses in extinction sessions. The three measures showed the same ordinal preferences between treatments, with ratio of amount to delay being a good predictor of the sign of preference in all cases. If generalised matching of the programmed ratios is used to describe partial preferences, the parameters have to be adjusted between behavioural indices, with more extreme preferences (stronger over-matching) for choices than for the other two indices. Functionally, latency is the least meaningful index (no existing normative model predicts delaying reward by delaying response). Latency, however, is an informative and sensitive index, and retrospective analysis shows this to have been the case in previous choice studies. Relative latency in no-choice trials can be used as a parameter-free predictor of magnitude of preference in choice trials. Our data support the view that responding to each option follows an independent timing process and that choice may be a by-product of the shortest latency censoring the response to the alternative. We call this the Cross-Censorship model of Choice (CCC).
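
The censoring idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: latencies are drawn from exponential distributions purely for convenience, and the function name and mean latencies are hypothetical rather than the authors' model or data.

```python
import random

def simulate_choice(mean_latency_a, mean_latency_b, n_trials=10000, seed=0):
    """Return the proportion of choice trials on which option A is chosen.

    Each option's latency is drawn independently; the option with the
    shorter latency is expressed and censors the response to the other.
    Exponential latencies are an assumption made only for illustration.
    """
    rng = random.Random(seed)
    a_chosen = 0
    for _ in range(n_trials):
        latency_a = rng.expovariate(1.0 / mean_latency_a)
        latency_b = rng.expovariate(1.0 / mean_latency_b)
        if latency_a < latency_b:
            a_chosen += 1
    return a_chosen / n_trials

# Hypothetical no-choice mean latencies: 2 s for option A, 4 s for option B.
p_a = simulate_choice(2.0, 4.0)
```

No choice parameter is fitted here: the preference for A follows entirely from the two latency distributions. Under the exponential assumption the expected proportion is 2/3.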




11:30-11:55

Armando Machado & Richard Keen
University of Minho (Portugal) & Brown University

The acquisition of a temporal discrimination

In a temporal bisection task animals learn to choose one alternative after a short signal and another alternative after a long signal. A psychometric function may then be obtained by presenting signals with intermediate durations and recording the proportion of times the animal chooses each alternative. The psychometric functions obtained with different pairs of training signals show two properties: (1) when the durations of the training signals are in the same ratio, the corresponding functions are time-scale transforms of one another (the scalar property), and (2) the functions cross the .5 indifference line at the geometric mean of the two training durations. We report a series of experiments that tried to answer the following questions: Besides choice between two response keys or levers, do other behaviors show these two properties? And if so, how early in training do they show them? And how does the typical, steady-state performance relate to the animal's initial performance? To answer these questions, we placed pigeons in a long, 42-in box equipped with keys and feeders at each end, and then using sensitive floor panels we recorded the animals' motions during the signals of a temporal bisection task. In Experiment 1 the signals were 3-s and 12-s long and after the birds learned the discrimination we reversed the assignment of keys to signals. In Experiment 2 we examined performance on two pairs of discriminations, 3 s vs. 9 s, and 9 s vs. 27 s. In Experiment 3 we studied the effects of changing the short or long duration signals separately. 
The results showed that (a) the motion during the trial was highly stereotypical, i.e., the birds moved to the 'short' side, waited for a few seconds, and then moved to, and stayed on, the 'long' side; (b) this motion predicted the results of generalization tests with intermediate signal durations; (c) the standard deviation of the times of leaving the 'short' side more than tripled when the signal durations tripled; and (d) only the duration of the short signal significantly influenced the birds' behavior. We discuss the implications of these findings for current theories of timing with special reference to the scalar property and the bisection of temporal intervals at their geometric mean.
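
For reference, the geometric-mean bisection points implied by the training durations named above can be computed as follows. This is a minimal sketch of the standard prediction only; the function name is hypothetical and the values are not results from the experiments.

```python
import math

def bisection_point(short_s, long_s):
    """Geometric mean of two training durations, in seconds."""
    return math.sqrt(short_s * long_s)

# Training pairs mentioned in Experiments 1 and 2 (seconds).
pairs = [(3, 12), (3, 9), (9, 27)]
points = {pair: bisection_point(*pair) for pair in pairs}

# Under the scalar property, tripling both durations triples the bisection point.
scale_ratio = points[(9, 27)] / points[(3, 9)]
```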




12:00-12:25

J. E. R. Staddon & D. T. Cerutti
Duke University

Timing and Choice: Preference may be more about Waiting than Choosing

Science is supposed to be convergent and cumulative. Cumulative, in the sense that our knowledge today builds on what we learned yesterday, and convergent in the sense that a given problem should appear simpler now than it did in the past. But our level of understanding of some important problems in operant conditioning satisfies neither of these conditions. Interval timing, or temporal control as it was originally termed, once seemed to be a pretty straightforward part of conditioning in general. But theories of timing have proliferated, many treat "interval timing" as a separate faculty, and extensive early empirical studies of temporal control have played little part in the evolution of theory. Free-operant choice is another example. The simplicities of the matching law have given place to several competing theories of performance on concurrent chain schedules and no theory convincingly links performance on simple and concurrent schedules. Surely it is time, after more than forty years of research on these problems, either to solve them or to find out why they cannot be solved. We suggest a new, simple direction for research on choice and interval timing.




02:00-02:45

Susan E. Brandon, Edgar H. Vogel, & Allan R. Wagner
Yale University, Adimark, & Universidad Diego Portales

Stimulus representation in SOP: I. Theoretical Rationalization and Some Implications

SOP was formulated (Wagner, 1981; Mazur & Wagner, 1982) around a molar stimulus representation that allowed for the deduction of stimulus priming, forward and backward conditioning, and how the topographies of CRs and URs may differ. The assumed molar trace was dictated by the underlying stochastic assumptions of the model, but also has been rationalized (Wagner & Donegan, 1989) in terms of a recurrent inhibition mechanism. Wagner and Brandon (1989; Brandon, Vogel, & Wagner, 2002; Brandon & Wagner, 1998) further proposed a componential stimulus representation, which not only allowed for various dissociations of response measures but provided an understanding of occasion setting and other temporal order effects. Because the componential SOP model [like the Rescorla-Wagner (1972) model] predicts that an "AX+, BX-" problem will be perfectly resolved with sufficient training, it is reasonable to "constrain" the learning rule (cf. Blough, 1975). Some advantages of a constrained componential SOP are that it can deal with CR timing and related temporal phenomena, while recognizing that cues trained in compound do not have completely common fates.
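
The claim that an unconstrained error-correcting rule resolves "AX+, BX-" can be checked with a short simulation. The sketch below uses the standard Rescorla-Wagner update, not the SOP model itself; the learning rate, number of training blocks, and function name are assumptions chosen for illustration.

```python
def rescorla_wagner_ax_bx(n_blocks=2000, rate=0.1, lam=1.0):
    """Train alternating AX+ and BX- trials with a shared prediction error.

    Every cue present on a trial is updated by rate * (outcome - summed
    strength of the cues present), which is the Rescorla-Wagner rule with
    equal salience for all cues (an assumption).
    """
    v = {"A": 0.0, "B": 0.0, "X": 0.0}
    trial_types = ((("A", "X"), lam), (("B", "X"), 0.0))
    for _ in range(n_blocks):
        for cues, outcome in trial_types:
            error = outcome - sum(v[c] for c in cues)
            for c in cues:
                v[c] += rate * error
    return v

v = rescorla_wagner_ax_bx()
ax_prediction = v["A"] + v["X"]  # approaches lambda (reinforced compound)
bx_prediction = v["B"] + v["X"]  # approaches zero (nonreinforced compound)
```

With equal saliences and zero starting strengths, the compound predictions converge on the two outcomes, with B acquiring negative strength that offsets the strength left on the common cue X.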




02:45-03:15

Edgar H. Vogel, Susan E. Brandon, & Allan R. Wagner
Yale University, Adimark, & Universidad Diego Portales

Stimulus representation in SOP: II. An application to inhibition of delay

The componential extension of SOP accounts for CR timing in Pavlovian conditioning by assuming that learning accrues with relative independence to stimulus elements that are differentially occasioned during the duration of the CS. SOP, using a competitive learning rule and the assumption that temporal learning emerges via resolution of what is equivalent to an "AX+ BX-" discrimination, predicts a progressive increase in the latency of the CR over training, or what Pavlov referred to as "inhibition of delay." Other componential models, which use noncompetitive learning rules, do not predict inhibition of delay. Either type of model makes the prediction indicated, independently of the length of the CS-US interval. We report two experiments that demonstrated inhibition of delay when rabbits were trained with relatively long, but not with short, CS-US intervals. To account for this divergence, we assumed that the SOP stimulus trace involves two kinds of elements, some with a temporally distributed pattern of activity over the CS duration, and some with a randomly distributed pattern (Mauk & Donegan, 1997). This stimulus representation not only allows for inhibition of delay with long but not short CS-US intervals, but in combination with SOP's performance rule deduces CRs with "Weber variability."




04:00-04:25

Michael Young & Edward Wasserman
Southern Illinois University at Carbondale & University of Iowa

A computational model of variability discrimination: Finding differences

Young, Wasserman, and colleagues have conducted several demonstrations of variability discrimination by pigeons, baboons, and humans. We have documented the impact of collection variability, and the effects of the number of items, their orientation, their size, and their organization, inter alia. In order to provide a coherent framework for these results, we will describe a computational model that incorporates stimulus similarity through generalization gradients, the effect of spatial proximity on similarity judgment, and the role of incremental learning functions. Parametric fits of data from the three species will document the relative impact of these various factors on performance.




04:30-04:55

Robert Cook
Tufts University

New developments in understanding concept learning by pigeons

The detection of identity and non-identity is one of the most fundamental of psychological discriminations. Drawing from a variety of recent experiments testing pigeons in simultaneous and successive same-different procedures, converging evidence will be presented that these animals can readily learn to form and use abstract rule-like concepts. However, the results of some newer experiments will also be presented that show that pigeons are capable of memorizing a very large number of exemplars and associations in similar two-choice discriminations. Some speculations about how these two distinct forms of learning might be reconciled within the same small-brained creature will be considered.




05:00-05:25

Sara J. Shettleworth & Brett Gibson
University of Toronto

Associations, maps, and modules in spatial learning: New experiments with rats

The cognitive sciences have proposed three ways in which qualitatively different and redundant kinds of spatial information interact during learning: they compete according to an error-correcting rule like that in the Rescorla-Wagner model, they are processed by independent mental modules (or memory systems), or they are integrated into cognitive maps. Tests of spatial overshadowing and blocking support the view that different kinds of information interact competitively, whereas data from behavioral neuroscience and ethology suggest that rather than competing during learning, distinct spatial memory systems or modules acquire information simultaneously, in parallel. Our studies of how rats learn to find food on mazes and in open spaces provide evidence for both parallel and competitive learning, depending on the situation.





Saturday Morning, May 25



08:15-09:00

Peter Killeen
Arizona State University

Mathematical Principles of Reinforcement (MPR)

To say a theory is principled means that it is tightly constrained by axioms that are stated at the outset. Whereas the axioms are corrigible, most of the refinement of the theory usually occurs downstream, in the details of implementation with specific models. In this lecture I discuss the principles that constrain my theory of schedule and incentive effects, MPR, the empirical reasons for their selection, and their successes and failures in guiding the development of effective models. Alternative ways of implementing the principles are considered, and their extension to models is explored.




09:00-09:30

Mark P. Reilly
Arizona State University

Revving up the RPMs of MPR: A data-driven evaluation of a theoretical model

If models are like tools, then useful models should be easy to use, general, universal and durable. Unlike tools, a model's durability is dependent upon the very characteristics that make it useful. To test tools, we expose them to extremes -- drag 'em thru the mud, and also apply them to new situations. Mathematical Principles of Reinforcement (MPR; Killeen, 1994) is evaluated according to these guidelines. MPR is a general model of schedule control that incorporates molecular as well as molar mechanisms. The model is no less an attempt to integrate and abstract the empirical laws of reinforcement schedules which have been accumulating from over a half-century of research. MPR is based upon three principles: 1) incentives excite behavior, 2) there are constraints on responding, and 3) the coupling of responses to reinforcers directs the trajectory of behavior. I will present data derived from various experiments to test MPR. A theoretical approach to reinforcement and schedule effects like MPR, one that is quantitative and general, is indispensable to the experimental analysis of behavior and has the potential to revitalize schedule research.




09:30-10:00

Tony Nevin
University of New Hampshire (emeritus)

Mathematical Principles of Reinforcement: Implications for Behavioral Momentum (and vice versa)

Killeen's mathematical principles of reinforcement explain steady-state response rate, and with reasonable assumptions about altered incentive effects, can also explain many findings concerning resistance to change. Thus, MPR and behavioral momentum may give converging accounts of response strength. Findings of greater resistance with lower steady-state response rates raise problems for such convergence, and I consider two approaches to their resolution.




10:30-10:45

William Timberlake
Indiana University

Marian Breland Bailey: Many Lives

Marian Breland Bailey was one of B. F. Skinner's most widely known students despite not finishing a PhD under his guidance. As a founder and sustainer of Animal Behavior Enterprises (first with Keller Breland and then Robert Bailey), she was very influential in showing the power and limitations of operant shaping procedures. She combined her business ventures with motherhood, consulting, and education, an effort sufficient for many full lives.






 
 
SQAB-Invited Preeminent Tutorials


Saturday Afternoon, May 25



01:00-01:50

Allen Neuringer
Reed College

Chair: Armando Machado (Universidade do Minho)

Variability of the Operant

This tutorial will discuss how reinforced variability can help us to distinguish between elicited responses (unconditioned reflexes, Pavlovian conditioned responses, and sensations), and emitted responses (operant behaviors). Evidence will be reviewed showing that behavioral variability is sensitive to reinforcers contingent upon variability and to discriminative stimuli. Schedule-of-reinforcement effects are similar for variable and repeated operants, including concurrent and second-order schedules. Reinforcement-controlled variability is functional, as when new operants are shaped, or when creating or problem-solving. And differences in sensitivity to reinforced variability may characterize some behavioral disorders, such as Attention Deficit Hyperactivity Disorder, autism, and depression. Emitted and voluntary attributes of operant behaviors will be discussed in terms of consequence-controlled variations in variability -- or sensitivity of variability to reinforcement.




02:00-02:50

Randolph Grace
University of Canterbury

Chair: John A. Nevin (University of New Hampshire)

Quantification

Quantitative data analysis may be an intimidating topic for many people, but actually a good basic understanding of many powerful methods can be obtained easily by regarding data analysis as model specification and comparison. This tutorial will review the model comparison approach to inferential statistics, and also describe some techniques from exploratory data analysis that are useful for model specification and testing, such as residual analysis and data transformation. Alternative measures of goodness-of-fit, such as the Akaike Information Criterion, are also considered. Practical examples will be provided throughout so that the utility of these techniques for solving problems of interest to behavior analysts will be clear.
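
As one concrete instance of model comparison, the Akaike Information Criterion for least-squares fits can be computed from the residuals. This is a minimal sketch assuming Gaussian errors and dropping additive constants; the function name and the residuals are hypothetical, made-up values for illustration and are not from the tutorial.

```python
import math

def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.

    Lower values indicate a better trade-off between fit and complexity.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

# Made-up residuals from two hypothetical models fitted to the same six points.
residuals_simple = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1]       # 1 free parameter
residuals_complex = [0.45, -0.38, 0.3, -0.55, 0.2, -0.1]   # 3 free parameters

aic_simple = aic_least_squares(residuals_simple, n_params=1)
aic_complex = aic_least_squares(residuals_complex, n_params=3)
preferred = "simple" if aic_simple < aic_complex else "complex"
```

Here the more complex model fits slightly better, but not by enough to pay for its two extra parameters, so the simpler model has the lower AIC.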




03:00-03:50

K. Geoffrey White
University of Otago

Chair: John Wixted (University of California, San Diego)

Memory as Discrimination

Memory is seldom viewed from a behavioral perspective because the temporal distance between an event and later remembering seems to require mediation by a stored representation. Treating remembering as discriminative behavior prompts several questions. How is the effect of temporal distance to be scaled? Is the discrimination specific to the time of remembering? How do reinforcers influence remembering?




04:00-04:50

Geoffrey R. Loftus
University of Washington

Chair: Alliston Reid (Wofford College)

Hypothesis Testing: Curse or Abomination?

The practice of significance testing is almost universal in social science research. However, this practice is inimical to scientific insight, and has almost certainly impeded both empirical and theoretical progress. In this talk, I will justify these assertions and discuss possible alternatives to significance testing.




 
 
SQAB Poster Presentations


Friday Evening, May 24 06:30 - 08:30 pm


Carlos F. Aparicio & Ángel Jiménez
University of Guadalajara-CEIC

Choice in a variable environment: the effect of an increasing changeover response requirement

Rats pressed on two levers for food in a variable environment that included seven components, each providing a different reinforcer ratio that ranged from 27:1 to 1:27. No signals differentiated the component ratios that were arranged within sessions in random order and separated by 1-minute blackout periods. In different conditions, a changeover lever required 1, 2, 4, 8, 16, 32, or 64 responses to switch from one lever to another. The results were consistent with previous studies using pigeons as subjects; log response-allocation ratios changed rapidly with increasing numbers of reinforcers in all components. When the changeover lever required one or two responses, the slope of the choice relation (sensitivity) increased linearly with the number of reinforcers in all components. However, with four or more changeover responses, sensitivity leveled off at high levels after only about three reinforcers. The highest level of sensitivity within components was observed with a changeover requirement of 64 responses. The implications of the present results for a theory of the effects of reinforcers on local preference are discussed.
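
Sensitivity in this sense is the slope of the line relating log response ratios to log reinforcer ratios. The sketch below shows how such a slope could be estimated by ordinary least squares. The intermediate reinforcer ratios (steps of a factor of 3) and the response data are assumptions made for illustration, not values from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Seven reinforcer ratios spanning 27:1 to 1:27 (intermediate values assumed).
reinforcer_ratios = [27, 9, 3, 1, 1 / 3, 1 / 9, 1 / 27]
log_reinforcers = [math.log10(r) for r in reinforcer_ratios]

# Hypothetical log response ratios generated with sensitivity 0.6 and log bias 0.05.
log_responses = [0.6 * x + 0.05 for x in log_reinforcers]

sensitivity, log_bias = fit_line(log_reinforcers, log_responses)
```

A slope of 1 would indicate strict matching; values below 1 indicate undermatching.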




Carlos F. Aparicio & William M. Baum
University of Guadalajara-CEIC and University of California-Davis

Comparing choice in variable and non-variable environments

Recent studies suggest that the speed at which choice behavior on concurrent VI VI schedules adapts to reinforcement changes depends on the frequency with which the environment changes. The present study explored this possibility by creating three environments that differed in the frequency with which components changed: within a session, from one session to another, or from one phase to another. The variable environment included seven reinforcer components, each providing a different reinforcer ratio that ranged from 27:1 to 1:27. Within a session, the component ratios were arranged in random order and separated by 1-minute blackout periods. The other two environments included only one reinforcer component that changed once per session (semi-variable environment) or that remained in effect for 15 consecutive sessions (stable environment). In all environments, log response-allocation ratios adjusted rapidly to changes in the reinforcer ratios. However, the slope of the matching relation (sensitivity) differed across environments; the highest levels of sensitivity were observed in stable and semi-variable environments. These results have implications for a theory of the effects of reinforcers on local preference.




Laurie Bloomfield, Ronald Weisman, Christopher Sturdy, & Leslie Phillmore
Queen's University, University of Alberta, & University of Western Ontario

Do black-capped chickadees treat their calls and those of Carolina chickadees as different open-ended categories?

We focused on categorization of the social calls of two well-studied and closely-related species: black-capped chickadees and Carolina chickadees. We trained black-capped chickadees in a go/no-go discrete-trials operant discrimination with a sample of black-capped calls as S+s, a second sample of black-capped calls as S-s (a within-species discrimination), and a sample of Carolina calls as S-s (a between-species discrimination). If the calls of both species constituted a single category then the within and between category discriminations should be equally difficult (because they both require birds to memorize the calls one-by-one). In fact, the within-species discrimination was considerably more difficult for each bird than the between-species discrimination. Also, the between-species discrimination transferred almost perfectly to a sample of novel Carolina calls and propagated back from the novel sample to the original sample. The results favor the hypothesis that the acoustically similar social calls of these species constitute separate open-ended categories.




Orn Bragason
University of Canterbury

Terminal link value: Ratio versus difference

How does the distribution of terminal link delays affect preference? In this study, the arithmetic mean duration of the initial and terminal links was kept unchanged within and across conditions, while the harmonic mean duration of the terminal links was varied. Effectively the harmonic mean manipulation pitted the ratio against the difference of the terminal links while keeping the arithmetic mean unchanged. The findings are compared to the contextual choice model, value-addition model, and two new models, the delay difference model and delay ratio model.
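
The distinction between the two means can be made concrete with a small calculation. The delay pairs below are hypothetical and are not the study's conditions; they only illustrate that two sets of delays can share an arithmetic mean while differing in harmonic mean.

```python
from statistics import mean, harmonic_mean

# Two hypothetical pairs of terminal-link delays (seconds).
pair_1 = (10.0, 30.0)
pair_2 = (5.0, 35.0)

arithmetic = {"pair_1": mean(pair_1), "pair_2": mean(pair_2)}
harmonic = {"pair_1": harmonic_mean(pair_1), "pair_2": harmonic_mean(pair_2)}
```

Both pairs average 20 s arithmetically, but the harmonic mean weights short delays more heavily, so it is lower for the pair containing the 5-s delay.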




José E. Burgos
University of Guadalajara-CEIC

Simulating latent inhibition with the Donahoe-Burgos-Palmer (DBP) model

The Donahoe-Burgos-Palmer (DBP; Donahoe, Burgos, & Palmer, 1993) neuro-computational/neural-network model was used to simulate latent inhibition (LI). Results showed that the model can simulate facilitation of LI by the number of preexposures and by the intensity and duration of the preexposed conditioned stimulus (CS). The model also simulates generalization of LI, where LI is disrupted by preexposure to orthogonal input patterns. The mechanism by which the model simulates LI is exactly the same one by which it simulates extinction, namely, weight decrement, a component of the learning rule. The realization of this mechanism in the present simulations required two conditions. First, networks came to the experimental situation with substantial initial sensory synaptic efficacies (20%). Second, weights were decreased whenever the reinforcement signal was equal to or less than 0.001 (as opposed to 0, the value we have used in all previous simulations). These conditions allowed for enough weight decrement as a result of CS preexposure for the preexposed networks to take significantly more trials to satisfy the acquisition criterion (a CR activation of 0.9 or more). (Reference: Donahoe, J. W., Burgos, J. E., & Palmer, D. C. (1993). A selectionist approach to reinforcement. Journal of the Experimental Analysis of Behavior, 60, 17-40.)




Michael Lamport Commons
Harvard Medical School

The reduction of operant to respondent conditioning

The relation of operant to respondent conditioning has long been a matter of debate in the field of psychology. Most theorists have adopted either a single-factor or a two-factor theory. We present a theory that falls into neither of these camps. The functions of operant reinforcement can be derived from the functions and properties of respondent pairing operations (see, for example, Killeen's incentive theory). We propose that the operant response is not simply spontaneously emitted behavior, but rather is an acquired response to certain internal stimuli. We posit the existence of an event within the brain called the internal unconditioned stimulus (ius). With this internal stimulus and two extra respondent pairings, the "What to Do" and "When to Do" pairings, operant conditioning can be explained in terms of respondent pairing operations. This theory is also extended to the acquisition of preference.




Kent Conover & Peter Shizgal
Concordia University, Canada

Telling work from leisure time: quantitative analysis of the temporal distributions of behavior for the validation of a labor supply model of single operant choice

A labor supply model provides a promising basis for scaling the reward value of electrical brain stimulation in rats. This model predicts that the amount of leisure time (lever release) per reward is a power-function of the work time (lever hold) per reward. However, the valid use of this model requires an accurate account of work and leisure time. Uncertainty about the account can arise when operant bouts consist of multiple holds separated by releases too brief to be classed plausibly as leisure time. To assess the possible impact and to provide a basis for correction, the release time distributions under different levels of reward and work were analyzed using log survivor plots, and quantified by fitting mixed gamma distribution models. These analyses revealed that the bulk of the bar release samples were composed of brief releases (<1 s) when the required hold time was low and the reward level was high, suggesting that all time was allocated to work. In contrast, when the required hold times were higher and the reward level lower the proportion of long releases (>1 s) in the samples increased, indicating that significant time was allocated to leisure. These analyses are useful because they provide an independent basis for determining whether the data contain enough genuine leisure time for application of the labor supply model. In addition, the analysis revealed temporal details about the interaction between subjects and the schedule. The first release times were generally longer than the subsequent release times within each inter-reinforcement interval. Similarly, the first hold times were also longer than the subsequent hold times in each encounter. Moreover, log survivor analyses revealed that subjects increased the mean hold duration as the work requirement increased. In other words, the temporal pattern of the operant did not remain fixed in the face of schedule changes.
Such information may have important implications for molar and molecular models of operant behavior.
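
A log survivor plot of the kind mentioned above can be built from a list of release durations as sketched below. The function name and the release times are hypothetical, and the 1-s cutoff is borrowed from the brief/long distinction in the abstract; the mixed gamma fitting step is not shown.

```python
import math

def log_survivor(times):
    """Return (t, log10 proportion of durations longer than t) pairs.

    The longest duration is omitted because no durations survive beyond it
    and the logarithm of zero is undefined.
    """
    n = len(times)
    points = []
    for t in sorted(set(times)):
        surviving = sum(1 for x in times if x > t)
        if surviving > 0:
            points.append((t, math.log10(surviving / n)))
    return points

# Made-up release durations in seconds: mostly brief, with a few long pauses.
release_times = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 2.5, 4.0, 7.5]

points = log_survivor(release_times)
long_fraction = sum(1 for t in release_times if t > 1.0) / len(release_times)
```

On such a plot a single exponential process appears as a straight line, so a steep early limb followed by a shallow tail suggests a mixture of brief within-bout releases and longer pauses.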




Randolph C. Grace, Orn Bragason, & Anthony P. McLean
University of Canterbury

Rapid acquisition of preference in concurrent chains

We report two experiments using a concurrent-chains procedure in which one terminal-link schedule was fixed-interval 8 s and the alternative schedule changed randomly from day to day. In Experiment 1, the alternative schedule varied between 4 s and 16 s according to the pseudorandom binary sequence used by Hunter and Davison (1985). Similar to results with concurrent schedules, pigeons' response allocation was most sensitive to the schedules arranged in the current session, although some effect of prior history was evident. Overall sensitivity was lower than comparable data from steady-state research. In Experiment 2, a unique value between 2 s and 32 s was used for the alternative-schedule delay in each session. Sensitivity levels were similar to Experiment 1, and remained unchanged across 61 sessions of training. For all subjects, sensitivity was greater when the alternative-schedule delay was greater than 8 s compared with when it was less than 8 s. Generalized-matching plots revealed evidence of clustering of data points into two groups, suggesting that a process similar to a categorical discrimination may have at least partly determined response allocation. Overall, this research shows that pigeons' initial-link response allocation can adjust very rapidly to frequent changes in the terminal links.




Julie A. Grimes & Richard L. Shull
University of North Carolina Greensboro

Resistance to extinction with auditory discriminative stimuli

Resistance to extinction has commonly been tested in rats using visual stimuli signaling the availability of different rates of reinforcement (i.e., different VI schedules) in a multiple-schedule procedure, with the signal of higher rates of reinforcement resulting in greater resistance to extinction. There has been some difficulty in getting similar results using auditory discriminative stimuli, leading Mauro and Mace (1996) to conclude that stimulus salience may be an important factor that contributes to findings of resistance to extinction. A related factor may be the stimulus control engendered by the discriminative stimuli in a multiple schedule. Rats were trained to lever press in the presence of steady and pulsing tones, each signaling the availability of either a rich or lean schedule of reinforcement (counterbalanced across subjects). Resistance to extinction was assessed across three days of extinction. Differential resistance to extinction between tone components was found only when stimulus components were separated by a relatively long "no-tone" blackout period during training. Indirect evidence suggesting the role of the blackout in improving stimulus control by the tones will be presented.




Masato Ito, Daisuke Saeki, & Masatoshi Tachibana
Osaka City University & JSPS, Japan

Discounting of shared reward in a game situation

Discounting of a shared reward was examined in a Chicken game-like situation in which participants chose between a hypothetical 1,500 yen with sharing ("Saving") and an unshared amount of money that varied ("Wallet") over 10 successive trials under each of five conditions, which differed in the number of other people (ranging from 1 to 24). The amount of unshared money was increased or decreased depending on the choice in the previous trial. If the "Saving" option was chosen in the previous trial, the amount of unshared money increased by 150 yen; if the "Wallet" option was chosen, it decreased by 150 yen. The amount of unshared money after the 10th trial was defined as the amount subjectively equivalent to the shared 1,500 yen. In addition, if a participant and the other people chose the "Wallet" option, then either they could not obtain any money (a "0 yen" condition) or they had to pay 1,500 yen (a "-1,500 yen" condition). The discounting of the shared reward was well described by a hyperbolic function, and the discounting rate represented by the parameter of the function was lower in the "-1,500 yen" condition than in the "0 yen" condition. These results show that the discounting rate was influenced by the loss of money arranged in the pay-off matrix of the game.
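
The adjusting-amount procedure described in this abstract can be illustrated with a minimal sketch. The 150-yen step, the 10 trials, and the 1,500-yen shared reward come from the abstract; the starting amount, the idealized chooser, and the specific hyperbolic form V = A / (1 + kN) are assumptions made only for illustration, since the abstract does not state them.

```python
def titrate(prefers_shared, start=750, step=150, trials=10):
    """Adjust the unshared amount over successive trials.

    prefers_shared(amount) returns True when the participant picks the
    shared 1,500 yen ("Saving") over `amount` yen unshared ("Wallet").
    `start` is an assumed value; the abstract does not report it.
    """
    amount = start
    for _ in range(trials):
        if prefers_shared(amount):
            amount += step  # "Saving" chosen: unshared amount goes up
        else:
            amount -= step  # "Wallet" chosen: unshared amount goes down
    return amount


def hyperbolic_value(amount, n_others, k):
    """Assumed hyperbolic form: V = A / (1 + k * N)."""
    return amount / (1 + k * n_others)


def make_chooser(n_others, k, shared=1500):
    """Hypothetical idealized participant who picks "Saving" only when
    its discounted value exceeds the unshared amount on offer."""
    value = hyperbolic_value(shared, n_others, k)
    return lambda amount: value > amount


# With one other person and k = 0.5 the discounted value is 1,000 yen,
# and the titrated amount ends within one step of that value.
estimate = titrate(make_chooser(n_others=1, k=0.5))
```

Because the amount moves in fixed 150-yen steps, the estimate can only land within one step of the underlying indifference point, which is one reason the resolution of such a procedure depends on the step size.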




B. M. Jones
University of Auckland, New Zealand

A comparison of two quantitative models of matching-to-sample performance

Five pigeons performed a simultaneous matching-to-sample task by pecking transparent keys mounted in front of a liquid-crystal display unit. Two samples and two comparisons consisted of random-dot patterns that were generated on each trial by querying a set probability of lighting a pixel. The disparity of the samples and of the comparisons was varied across three sets of conditions by arranging different pixel probabilities. Within each set of conditions, the reinforcer ratio arranged for correct responses was varied over a range that was wider than normal (160:1 to 1:80) in order to test contrasting predictions of the Davison and Tustin (1978) and the Alsop-Davison (1991) models. Neither model accurately predicted performance in extreme reinforcer-ratio conditions when the disparity of sample stimuli was intermediate and either equaled or exceeded the disparity of the comparisons. In these conditions, choice on trials involving the sample associated with the lower rate of reinforcement did not become biased toward the incorrect response but instead approached indifference, and large position biases emerged. These results suggest a degree of independence between performance following the two samples that is not easily accommodated by extant quantitative models.




Christian Krageloh
University of Auckland, New Zealand

Choice in a variable environment: The effects of signaling reinforcement ratios

Six pigeons were trained in experimental sessions that arranged six or seven components with various concurrent-schedule reinforcer ratios associated with each. The order of the components was determined randomly without replacement. Components lasted until the birds had received 10 reinforcers, and were separated by 10-s blackout periods. The component reinforcer ratios arranged in most conditions were 27:1, 9:1, 3:1, 1:1, 1:3, 1:9 and 1:27; in others, we arranged only six components, three of 27:1 and three of 1:27. In some conditions, each reinforcement ratio was signalled by a different red/green flash frequency, with the frequency perfectly correlated with the reinforcer ratio. Additionally, we arranged a changeover delay in some conditions, and no changeover delay in others. When component reinforcer ratios were signalled, sensitivity to reinforcement values increased from around 0.40 before the first reinforcer in a component to around 0.80 before the 10th reinforcer. When reinforcer ratios were not signalled, sensitivities typically increased from zero to around 0.40. In no-change-over-delay conditions, sensitivity to reinforcement was typically around 0.20 lower than in change-over-delay conditions. Preference was extreme towards the reinforced alternative for the first 20 s after reinforcement, regardless of whether components were signalled or not, but in conditions without a change-over delay this preference pulse was considerably smaller. When the data from this time period were excluded from the analyses, sensitivities to reinforcement values were smaller, and the discrepancies in values became less between conditions arranging a change-over delay and those that did not.




Jérémie Jozefowiez, Jean-Claude Darcheville, & Philippe Preux
Université Charles de Gaulle (Lille) & Université du littoral (Calais), France

Operant conditioning as a Markovian decision problem: application to variable and random ratio schedules of reinforcement

Markovian decision problems are a kind of optimization problem in which an agent must learn how to optimize the amount of reward it can collect during its interaction with its environment. We use them to analyze the task faced by an animal in random and variable ratio schedules of reinforcement. Predictions of the model derived from this analysis are compared to three sets of data from the literature obtained in men, rats and pigeons and are contrasted with those of its main challenger in psychology, Herrnstein's equation. This reveals the existence of two response strategies in ratio schedules, one which corresponds to our model, the other which is closer to Herrnstein's equation.




Jay-Shake Li & Joseph P. Huston
University of Düsseldorf, Germany

Nonlinear dynamics of FI-responding: an analysis using the Extended Return Map

In recent years, the application of nonlinear dynamic approaches has achieved remarkable success in many fields of research. Unlike traditional approaches to behavioral studies, which relied mainly on averaged quantities, nonlinear dynamical approaches tackle the real-time data directly. One of the central problems in nonlinear dynamic studies is the reconstruction of a multi-dimensional phase space from one-dimensional time-series data. In our previous work, we introduced a new method, the Extended Return Map (ERM), to reconstruct a multi-dimensional diagram using the one-dimensional inter-response-time (IRT) data acquired from a Skinner-box experiment. In the present study, we applied the ERM to operant behavior controlled by FI schedules. We found very interesting lattice patterns in the ERMs. A study using surrogate data sets in conjunction with the calculation of fractal dimension revealed that the lattice patterns found in the ERM reflected dynamical properties of FI responding. The accompanying stimulation studies further suggested that an abrupt switch between two or three different behavioral states is essential for the generation of these characteristic ERM patterns. These results also showed that the traditional analysis tools, the IRT distribution and the cumulative record, are not sufficient to describe the nonlinear dynamics of FI responding. In addition, the switch between behavioral states during the time course between two adjacent rewards linked our model to the well-known Behavioral Theory of Timing (BeT) proposed by P. Killeen.




Elliot Ludvig & John Staddon
Duke University

Temporal tracking of square-wave interval schedules of reinforcement by pigeons

Square-wave interval schedules of reinforcement offer a window into the ability of animals to dynamically adjust their behaviour to temporal properties of the environment. A square-wave schedule presents an animal with a cycle of two different fixed intervals, repeating each interval several times in succession (e.g. 12 FI 60 sec followed by 4 FI 180 sec). Previous results have puzzled researchers for many years because pigeons tend not to show temporal tracking on these schedules, despite evidence of tracking on what would seem to be more complex cyclic arithmetic progressions or sinusoidal schedules. In this experiment, six pigeons were exposed to several square-wave schedules (x FI 30, y FI 90), varying the number of successive intervals of each duration in a given cycle. Contrary to earlier findings, results indicate that the post-reinforcement pause of pigeons tracks the duration of the previous interval for many square-wave schedules. However, pigeons do not track the previous "long" interval when that interval is only presented a single time per cycle. Under these conditions, the failure-to-track was not immediate, but rather developed across sessions, taking several days for tracking to disappear entirely. We consider the results under the framework of the MTS habituation model of interval timing, and present results from simulations that account for tracking by the pigeons on several schedules, but cannot accommodate failure-to-track when the "long" run length is equal to 1.




Marina Menez & Florente López
Universidad Nacional Autónoma de México

Temporal learning acquisition following different pre-training conditions

Some recent proposals on interval timing emphasize the need for acquisition studies in order to understand the processes of temporal learning. However, few studies have considered the influence of previous experience on acquisition patterns. The present experiment examined the possible effects of pre-training conditions on the development of temporal control in rats. CRF, RI and FT schedules were maintained for 30 sessions, then an FI 30-sec schedule was introduced for 60 additional sessions. We compared the development of response patterns and pausing, and determined to what extent prior exposure to different pre-training schedules affects their steady-state properties. Findings are discussed in the context of current dynamic models of timing.




Francis Mechner & Laurilyn Jones
The Mechner Foundation

A new tool for the study of operant shaping

The experimental analysis of operant shaping has languished because of the absence of good research tools. For the present studies, the research tool was a touch-sensitive computer graphics tablet and stylus, programmed to require the participant to draw lines emanating from a circle approximately 3/8" in diameter. Participants were instructed to draw a straight line that was displayed on the computer monitor. The computer recorded each line's length, slope, the speed with which it was drawn, and the pressure applied to the stylus. One question addressed was whether the presentation of an individual reinforcer acts primarily on the specific characteristics of the preceding response, or on the direction of change of the preceding behavior. A series of experiments was designed to address this and other questions about shaping, using the drawn line as the revealed operant. Different experiments used different criterial parameters (for example, a line between 200 and 250 pixels in length) or combinations of criterial parameters. The participants received reinforcement for movement toward the programmed target, and monetary reinforcement for reaching the target. Although performing this operant is a single action on the part of the participant, when using the graphic revealed operant (unlike key-peck or bar-press operants) it is easy to track how each operant differs from the one before it and thus observe the mechanics of shaping at work.




Curtis Mower & Amy Odum
University of New Hampshire

Morphine produces the "Choose-Short" effect

Inserting a delay between sample offset and comparison onset in duration discriminations results in a disproportionate number of responses to the stimulus associated with the shorter duration. We examined the effects of morphine on production and discrimination of specific temporal patterns of behavior by pigeons. Intermittent reinforcement of pecks to a lighted center key established a bimodal distribution of interresponse times (IRTs). Birds reported the duration of the most recently occurring IRT as Long (6-9 s) or Short (2-3 s) during symbolic-matching-to-sample (SMTS) trials. Matching categorizations produced access to food. Morphine flattened the IRT distribution, with no substantial change in the proportion of IRTs falling into either category. Furthermore, morphine resulted in a disproportionate number of responses to the "short" comparison during SMTS trials. Results may be interpreted as increased tendency to "choose short", or as underestimation of time produced by morphine administration.




Vladimir Orduña, Miguel Herrera, Oscar Zamora, & Arturo Bouzas
Universidad Nacional Autónoma de México (UNAM)

Preference for sequences of outcomes

Although research in choice has been dominated by studies with single outcomes, there is increasing interest in choice between sequences of outcomes. The research on this topic with humans and animals has yielded different results: Whereas humans seem to prefer a sequence that improves over time, animals seem to prefer a sequence that worsens over time. Preference for worsening sequences is compatible with the hyperbolic discounting models that have dominated mechanistic explanations of choice. The present experiment tested some predictions of the parallel model developed by Brunner and Gibbon (1995). Using a successive encounters procedure (an analog of foraging in natural environments), we tested the adequacy of the parallel model by manipulating the accessibility of the sequences, the presence or absence of a common initial outcome, and the mean value of the sequences. The main results support the parallel model because there was a strong sensitivity to the first outcome of each sequence and to the preceding (search) time common to both schedules.




Xochitl de la Piedad, Douglas Field, & Howard Rachlin
State University of New York at Stony Brook

How long do pigeons wait for a preferred reward that arrives after an unpredictably long delay?

In self-control experiments, delayed rewards, as usually studied, arrive at some fixed time after a choice. The current experiments introduce uncertainty in the length of this delay. In Experiment 1 pigeons could choose between a random-interval (RI) and a fixed-interval (FI) schedule for the same reinforcer. If they chose the random-interval alternative they could switch to the fixed interval (which would begin at the point of the switch) at any time before reinforcement. However, the first response on the FI extinguished the RI key. We analyze the probability of switching to the FI as a function of how long the pigeon had already waited on the RI. Results show that, with a standard RI schedule, the probability of switching was highest at the beginning of a trial. However, even though reinforcement probability was constant on the RI schedule, the longer a pigeon waited, the more likely it was to continue to wait (in economic terms, an example of paying attention to sunk costs). In Experiment 1 the random-interval schedule was in force indefinitely, as long as the pigeon did not switch to the fixed interval. In Experiment 2, however, after waiting for a very long time on the RI, the probability of reinforcement was suddenly reduced to zero. Now, pigeons increased switching as the extinction point was approached. These results bear on current self-control research, especially the sort of self-control studied as "delay of gratification."




Antonio Ponce & José E. Burgos
University of Guadalajara-CEIC

Simulating Pavlovian stimulus generalization with the Donahoe-Burgos-Palmer (DBP) model

The Donahoe-Burgos-Palmer (DBP; Donahoe, Burgos, & Palmer, 1993) neuro-computational/neural-network model was used to simulate Pavlovian stimulus generalization. Fully-connected feedforward networks with six input units (I1 through I6) and two three-element hidden layers were given 300 reinforced trials of the input pattern (0,0,1,1,0,0), where I3 and I4 were maximally activated for 6 time-steps. The intertrial interval was assumed to be sufficiently long to allow the networks' processing elements to reach their resting-activation states. Then, networks were tested (with the learning rule deactivated, to prevent extinction) with 100 trials of each of the following patterns: (0,0,1,1,1,1), (0,0,0,1,1,1), (0,0,0,0,1,1), (1,1,1,1,0,0), (1,1,1,0,0,0), and (1,1,0,0,0,0). Dissimilarity with respect to the trained pattern thus was defined as the Hamming distance h between it and each test pattern. Generalization gradients were obtained with the peak at h = 0 (the trained pattern) and responding decreasing with h. (Reference: Donahoe, J. W., Burgos, J. E., & Palmer, D. C. (1993). A selectionist approach to reinforcement. Journal of the Experimental Analysis of Behavior, 60, 17-40).
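
The dissimilarity measure in this abstract can be made concrete with a short sketch that computes the Hamming distance h between the trained pattern and each test pattern listed above. The patterns are taken from the abstract; the function and variable names are illustrative only, and nothing here reproduces the DBP network itself.

```python
TRAINED = (0, 0, 1, 1, 0, 0)

TEST_PATTERNS = [
    (0, 0, 1, 1, 1, 1),
    (0, 0, 0, 1, 1, 1),
    (0, 0, 0, 0, 1, 1),
    (1, 1, 1, 1, 0, 0),
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
]


def hamming(a, b):
    """Number of input units whose activation differs between a and b."""
    return sum(x != y for x, y in zip(a, b))


# Distances of the six test patterns from the trained pattern.
distances = [hamming(TRAINED, p) for p in TEST_PATTERNS]
print(distances)  # [2, 3, 4, 2, 3, 4]
```

The six test patterns therefore fall at h = 2, 3, and 4 on each side of the trained pattern, which gives the two arms of the generalization gradient.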




Diana Posadas-Sánchez & Mark P. Reilly
Arizona State University

The Effects of Late-Session Changes in Reinforcement Rate Under Progressive-Ratio Schedules

Progressive-ratio (PR) schedules involve within-session decreases in reinforcement rate. This characteristic may be responsible for the discrepancy observed between response rate functions generated by fixed-ratio (FR) and PR contingencies. If reinforcement rate experienced later in the session influences behavior in PR schedules, then changing the rate of reinforcement late in the session should modify early to mid-session response rates. If high rates of reinforcement were programmed to occur near the end of a PR session, one might predict an increase in response rate before that change occurred. The present experiment evaluated the effects of late-session changes in reinforcement rate on behavior maintained by progressive-ratio schedules. Pigeons responded under a PR 2 schedule to establish baseline. Next, a mixed PR, x schedule was in effect where x was either a variable-interval (VI) 15 sec or FR 120 depending on the condition. The transitions into schedule x occurred after completing ratio 120 on the PR and lasted for 20 trials. An A-B-A-C design was used, and each condition was in effect for 10 sessions. PR response rates at the ratios preceding transitions were directly related to the reinforcement rate associated with the transition schedule. The present findings suggest that PR performance is influenced by the current as well as subsequent events within a session.




Daisuke Saeki, Masato Ito, & Kazuki Masaki
JSPS & Osaka City University, Japan

Self-control choice and temporal discounting in game situations

A new definition of self-control has been proposed by Brown & Rachlin (1999), in which continuing to choose a smaller reward can maximize the total amount of a reward. The present research, using game situations that have similarities to choice situations in daily life, was conducted to examine the correlation between the number of self-control choices and the rate of temporal discounting. Twelve undergraduates experienced two types of games in which they chose between "Saving" and "Wallet" options in each trial. In Game1, choice of the "Saving" option led to a pair of the poor options (660 yen vs. 500 yen) in the next trial whereas choice of the "Wallet" option led to a pair of the rich options (1,200 yen vs. 900 yen) in the next trial. In Game2, immediate amounts of money subjectively equivalent to the delayed 1,500 yen were estimated under four delay conditions for each participant. In Game1, some participants developed the choice of the "Wallet" option (self-control group), but others did not (non-self-control group). Temporal discounting was well described by a hyperbolic function. For the self-control group, the number of choices of the "Wallet" option in Game1 was negatively correlated with the discounting rate estimated in Game2. These findings indicate that the discounting rate is a valid measure for the degree of self-control based on the new definition.




Federico Sanabria, Forest Baker & Howard Rachlin
State University of New York at Stony Brook

Signaling cooperation and defection in a free operant prisoner's dilemma game

Four pigeons were each exposed to a single random-ratio schedule of reinforcement for an average of 25 pecks on either of two keys (RR25). Pecks on one key were defined as "cooperations"; pecks on the other key as "defections". The amount of a reinforcer on the RR25 schedule depended on the last two pecks according to an iterated prisoner's dilemma game (IPD) as played against a tit-for-tat strategy: 5-s of food access for cooperate-cooperate; 3-s of food access for defect-defect; 6-s of food access for cooperate-defect; 2-s of food access for defect-cooperate. The IPD game was played under two stimulus conditions. In the first, the key lights remained white regardless of the key pecked. In the second, the keys were illuminated with a particular color (green or red) depending on which key was last pecked (cooperate or defect). The key color was thus both feedback, indicating the key just pecked, and a discriminative signal for a pair of higher or lower valued alternatives ("good" and "best", or "bad" and "worst" in terms of seconds of food access). Without the color signal, pigeons generally defected. Adding the signal increased cooperation.
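
The payoff rule in this abstract can be written as a small lookup table. The food-access durations are those given in the abstract; reading each pair as (previous peck, current peck) is an assumption, chosen because it is consistent with play against tit-for-tat, where the opponent's move copies the pigeon's previous peck. The function names are illustrative.

```python
# Seconds of food access, keyed by (previous peck, current peck).
# "C" = cooperate key, "D" = defect key. The pair ordering is assumed.
FOOD_ACCESS_S = {
    ("C", "C"): 5,
    ("D", "D"): 3,
    ("C", "D"): 6,
    ("D", "C"): 2,
}


def food_access(pecks):
    """Reinforcer duration determined by the last two pecks."""
    return FOOD_ACCESS_S[(pecks[-2], pecks[-1])]


def mean_food_access(pecks):
    """Average duration over every consecutive pair of pecks, as if a
    reinforcer could follow any peck (a simplification of RR25)."""
    pairs = list(zip(pecks, pecks[1:]))
    return sum(FOOD_ACCESS_S[p] for p in pairs) / len(pairs)


always_cooperate = mean_food_access("CCCCC")  # 5.0
always_defect = mean_food_access("DDDDD")     # 3.0
alternate = mean_food_access("CDCDC")         # 4.0
```

Under this reading, a single defection after cooperating earns the largest reinforcer (6 s), yet sustained cooperation yields more food per reinforcer than sustained defection, which is the dilemma the color signal is meant to help the pigeons resolve.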




Sánchez-Castillo H., Osorio A., Reyes R., & Velázquez-Martínez D. N.
Universidad Nacional Autónoma de México

Effects of the D2 agonist d-amphetamine in retrospective and immediate timing tasks

Research suggests that dopaminergic agonists, particularly those of the D2 subtype, modulate the timing behavior in many species (producing an increase in the speed of the proposed "clock"). However, contradictory evidence exists about the role of this receptor subtype, particularly when its participation is evaluated in the interval bisection task. The aim of our work was to evaluate the effects of d-amphetamine on the peak procedure (an immediate timing task) and the bisection task (a retrospective timing task). One group of rats was trained in a Fixed Interval schedule of 30 s (FI30); in probe trials reinforcement was omitted and trial length extended to 120 s; other groups of subjects were trained in a temporal bisection task to emit one response following short (2 s) and another response following long (8 s) stimuli. Results in the peak procedure showed that amphetamine produced a dose-dependent increase in response rate but a modest shortening in the time of occurrence of the peak. In the bisection task results showed a non-significant leftward shift in the bisection point. Together, these results question the proposed dopaminergic mediation of temporal control.




Matthew T. Sitomer, Peter R. Killeen, & Mark P. Reilly
Arizona State University

Erasure of memory by reinforcement in variable interval schedules

In experiments with humans, rats, and pigeons, McDowell and colleagues (McDowell & Wood, 1984, 1985; McDowell & Dallery, 1999; Dallery, McDowell, & Lancaster, 2000) have invalidated Herrnstein's (1970) assumption of an invariant y-asymptote (k) for the hyperbola describing response rate on variable interval (VI) schedules. Herrnstein's k has been shown to vary with deprivation level, reward magnitude, and reinforcer potency (see also Heyman & Monaghan, 1994; Bradshaw, Ruddle, & Szabadi, 1981). These findings have been interpreted as evidence in favor of linear system theory (cf. McDowell, 1980), a "pure algebraic" version of matching theory. The present experiment explored these findings within the context of Killeen's (1994) Mathematical Principles of Reinforcement (MPR), a broad theory of schedule control. Pigeons and rats both experienced a 5-component multiple VI schedule, with reinforcement delivery manipulated as follows. All scheduled reinforcers were programmed as two reinforcement events, either delivered in rapid succession or separated by a 3-s delay. With amount of reinforcement, time of the reinforcement period, and deprivation level held constant, this manipulation was designed to assess the impact of erasure of memory by reinforcement (Killeen & Smith, 1984) on k. An updated VI Coupling equation is presented which reflects the influence of erasure. As revised, MPR accounts for the well-documented changes in k.




Tetsuo Yamaguchi & Masato Ito
Osaka City University, Japan

Rats' sensitivity to pre- and post-reinforcer delays in a self-control choice situation

Rats were exposed to a concurrent-chains schedule in which reinforcer amounts and pre- and post-reinforcer delays were varied. Two independent variable-interval 60-s schedules were in effect during the initial links, and delay periods were defined by fixed-time schedules. There were two conditions differing in pre- and post-reinforcer delays. In one condition, rats chose between immediate larger reinforcers followed by post-reinforcer delays of 0, 40, 80, or 120 s and immediate smaller reinforcers followed by no post-reinforcer delay. In the other condition, rats chose between larger reinforcers delayed by 0, 40, 80, or 120 s and immediate smaller reinforcers. In both conditions, choice was predicted from the relative overall reinforcement densities of the alternatives, calculated as the ratio of reinforcer amount to total time (e.g., choice period, pre-reinforcer delay, reinforcement period, and post-reinforcer delay). The results indicate that post-reinforcer delays as well as pre-reinforcer delays influence rats' choices in the present self-control situation, although the effects of pre-reinforcer delays are greater than those of post-reinforcer delays. Therefore, the effects of post-reinforcer delays must be incorporated in models of self-control choices.
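
The density calculation mentioned in this abstract can be sketched as follows. Only the definition (reinforcer amount divided by total time) comes from the abstract; the function names and all numerical values below are hypothetical, chosen so the arithmetic is easy to follow.

```python
def density(amount, choice_s, pre_delay_s, reinforcement_s, post_delay_s):
    """Overall reinforcement density: amount per second of total time."""
    total_s = choice_s + pre_delay_s + reinforcement_s + post_delay_s
    return amount / total_s


def relative_density(larger, smaller):
    """Share of total density belonging to the larger-reinforcer option."""
    return larger / (larger + smaller)


# Hypothetical values: a 4-unit reinforcer followed by a 40-s
# post-reinforcer delay versus a 1-unit reinforcer with no delay.
larger = density(4, choice_s=30, pre_delay_s=0, reinforcement_s=10, post_delay_s=40)
smaller = density(1, choice_s=30, pre_delay_s=0, reinforcement_s=10, post_delay_s=0)
predicted_choice = relative_density(larger, smaller)  # about 0.67
```

Note that this simple form weights pre- and post-reinforcer delays equally, whereas the abstract reports that pre-reinforcer delays had the greater effect, so a fitted model would presumably need separate weights for the two delays.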




Rich Yi
University at Stony Brook

Emphasizing the Effects of Repeated Cooperations and Defections in a Self-Control and Social Cooperation Game

A prisoner's dilemma game (PDG) must have four different combinations of payoffs (payoff matrix), depending on the four combinations of choices of the two players. In most iterated PDG studies, the same payoff matrix is available from trial to trial, with the only motivation for cooperation coming from higher total payoff if the other player uses some reciprocal strategy. The motivation for cooperation in self-control studies is similar in that the benefit of cooperation is experienced at a later point in time, possibly the end of the experiment. It is suggested that a payoff matrix that remains fixed from trial to trial promotes a myopic perspective by the participant, and thus restricts the level of cooperation in iterated PDG and self-control experiments. Changes in the payoff matrix in a given trial resulting from the combination of choices in the previous trial may help expand the window of the future. When the second of two consecutive mutual cooperations is more valuable than the first, and the second of two consecutive mutual defections is less valuable than the first, the participants may be more likely to examine the effect of the present choice on the payoff matrix for future choices. A modification of the Brown & Rachlin (1999) procedure was used to model this idea. Results suggest that expanding the window of the future may enhance cooperation in self-control situations, but may have no impact in PDG.








Date Updated : May 21, 2002