Modelling Timing Performance on the Peak Procedure
Models based on Scalar Expectancy Theory (SET) and the connectionist model of Church and Broadbent (1990) were used to simulate four data sets from the peak procedure. On the peak procedure, a signal (light or tone) usually means a reward for a response after a fixed interval (FI). On occasional tests, from which data are gathered, the signal is left on for a long time and reward is withheld. On individual trials, a period of high-rate responding (the run) is sandwiched between periods of low-rate responding. The start and end of the run, as well as the middle and duration of the run, were determined on each trial. The means and standard deviations of these statistics, as well as the correlations among them, were the data modelled. On a trial, the models based on SET all sampled one value from a memory distribution of expected times of reward and one or two values from one or two threshold distributions. Models with both difference and ratio comparison rules fit the data well. The connectionist model replaced the clock with a vector of oscillator readings, and the memory distribution with a matrix of vector autocorrelations. Memory is not sampled. The memory matrix multiplied by the current clock vector gives a comparison vector. The threshold is some angle between the clock vector and the comparison vector. These models did not fare well.
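The SET-style trial simulation described above can be pictured with the minimal Python sketch below. It assumes a single threshold, a ratio comparison rule, and normally distributed memory and threshold samples. The function names and all parameter values are illustrative assumptions, not the models fitted in the paper.

```python
import math
import random

def simulate_trial(fi, memory_cv, threshold_mean, threshold_sd, rng):
    """Return (start, stop) of the high-rate run on one simulated peak trial.

    Assumed ratio rule: responding is high while |t - m| / m < b, where m is
    the sampled remembered reward time and b is the sampled threshold.
    """
    m = max(1e-9, rng.gauss(fi, memory_cv * fi))           # memory sample
    b = max(0.0, rng.gauss(threshold_mean, threshold_sd))  # threshold sample
    start = max(0.0, m * (1.0 - b))
    stop = m * (1.0 + b)
    return start, stop

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def summarize(trials):
    """Means of the run statistics plus the start-stop correlation."""
    starts = [s for s, _ in trials]
    stops = [e for _, e in trials]
    n = len(trials)
    return {
        "mean_start": sum(starts) / n,
        "mean_stop": sum(stops) / n,
        "mean_middle": sum((s + e) / 2 for s, e in trials) / n,
        "mean_duration": sum(e - s for s, e in trials) / n,
        "start_stop_correlation": pearson(starts, stops),
    }

if __name__ == "__main__":
    rng = random.Random(1)
    trials = [simulate_trial(30.0, 0.15, 0.3, 0.05, rng) for _ in range(2000)]
    print(summarize(trials))
```

Under these assumptions the start and stop times both scale with the memory sample, so they correlate positively across trials. This is the kind of correlation pattern the abstract says was modelled.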
Psychometric Space and Models of Discriminability
Six pigeons were trained on three-alternative concurrent variable-interval schedules. Two of the alternatives were signaled by 570 nm and 630 nm colors. The third alternative was signaled by various colors between these values, changed across conditions. For each central-stimulus value, three relative reinforcer-rate manipulations were carried out. The data were analyzed according to the model proposed by Davison (1991) and used by Davison and McCarthy (in press), which provides estimates of inter-stimulus discriminability. These values were investigated for additivity according to various models of psychometric space. No simple psychometric space produced acceptable additivity, indicating that the measure of discriminability was probably inadequate. Some possible alternative measures are discussed.
Delay Reduction Theory: Some Refinements?
According to delay-reduction theory, the effectiveness of a stimulus as a conditioned reinforcer may be predicted most accurately by calculating the reduction in the length of time to primary reinforcement measured from the onset of the preceding stimulus. However, it may be that delay reduction should instead be measured from the overall time between reinforcements in a situation (a more molar view). If so, delay-reduction theory would need to incorporate a term for the intertrial interval. Experiments on this and other potential refinements of delay-reduction theory are reported.
An Experimental Synthesis of Timing and Counting Processes
In situations where events occur with temporal regularity (e.g., FI schedules) and in those where the regularity involves the number of events (e.g., FR schedules) either the duration or number of events may contribute to the observed pattern of behavior. Normally, experimenters try to identify one dimension or the other as the single controlling dimension. Alternatively, however, both dimensions may contribute to the behavioral performance. For instance, foraging animals may use both the time spent foraging and the number of prey consumed in choosing whether to leave a food patch. The utility of multiple sources of information will depend on the multicollinearity of the predictors. These points are discussed in the context of supporting data and a theoretical analysis of human timing (Killeen & Weiss, 1987).
Better is Worse: Why Arousal Makes Birds Appear Not to Match When They Really Are
A recent result by Belke (1992) challenges Scalar Expectancy Theory (SET) and other accounts of matching in concurrent VI schedules. He studied concurrent probe tests of stimuli associated with equal VIs but trained in alternative concurrent pairs. In training, one was preferred and the other was not. Probes revealed a strong preference for the alternative preferred in training. An experiment is reported showing that this result is not due to generalization from the training preference. When the probe is between the two preferred training stimuli, the richer schedule is not preferred. An SET account of these results is presented incorporating Myerson and Miezin's (1980) Markov analysis of concurrent choice and some of Killeen et al.'s (1978) arousal ideas.
Conditioned Reinforcement and Choice Between Delayed Rewards
A generalization of the contextual choice model (CCM; Grace, in press) of concurrent chains is presented. The model assumes that the value of a terminal-link stimulus as a conditioned reinforcer is the expected scaled immediacy (ESI) of reinforcement in its presence. The ESI model can quantitatively account for data from concurrent VI VI, concurrent chains, and the adjusting-delay procedure, and indicates that the delay of reinforcement gradient is a negative power function. However, systematic parameter deviation is observed -- sensitivity to immediacy is greater when both schedules are fixed than when both schedules are variable. Also, the model is unable to account for terminal-link stimulus conditions in concurrent chains (e.g., preference for cued versus uncued, or multiple versus mixed terminal links). Because the ESI model is molar, it can quantitatively describe these effects but cannot explain them. Therefore, a linear-operator model (similar to Rescorla-Wagner) is proposed which dynamically calculates conditioned reinforcement value. The associative conditioned reinforcement (ACR) model explains systematic deviations in sensitivity and the effects of terminal-link stimulus conditions as the result of an underlying associative process. As represented by the model, conditioned reinforcement value may provide a quantitative link between steady-state operant choice and Pavlovian conditioning.
Effects on Variable Interval Performance of a Step Function Transition
It has been expected that modulation of a variable interval (VI) schedule with a step function transition during repeated trials would support a corresponding transition in operant behavior. A simple step function transition can be implemented using a fixed-time extinction. Each trial consists of three components: a) a reasonably rich VI schedule in effect for a time T, b) an unsignalled switch at time T to extinction which lasts until Ttrial, c) a brief blackout to signal the end of the trial. Varying the extinction time T and the underlying VI schedule provides an experimental test of linear systems analysis. Linear systems analysis requires that a transfer function relating the measured reinforcement and responding exist. The test of linear systems analysis for this experiment is whether, in steady state, the transfer function computed with one pair of T and average VI value can be used to predict the performance obtained with a new pair. In addition to the test of linear systems analysis, an effect of the inter-reinforcement interval distribution on the interresponse time (IRT) distribution is noted.
Mathematical Principles of Reinforcement
Conditioning requires a correlation between the experimenter's definition of a response and an organism's, but an animal's perception of its behavior differs from ours. I explore various definitions of the response, using the slopes of learning curves to infer which comes closest to the organism's definition. The resulting exponentially-weighted moving average provides a model of memory that grounds a quantitative theory of reinforcement. It assumes that incentives excite behavior and focus the excitement on responses that are contemporaneous in memory. The correlation between the organism's memory and the experimenter's requirements is given by coupling coefficients, derived for various schedules of reinforcement. The coupling coefficients for simple schedules may be concatenated to predict the effects of complex schedules. The coefficients are inserted into a generic model of arousal and temporal constraint to predict response rates under any scheduling arrangement. The theory posits that responses index memory, thus displacing memory for the responses that occur before them. As a contiguity-weighted correlation model, it bridges opposing views of the reinforcement process. By placing the short-term memory of behavior in so central a role, it provides a behavioral account of a key cognitive process.
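As a rough illustration of the exponentially-weighted moving average mentioned above, the Python sketch below updates a memory trace one response at a time. The rate parameter `beta`, the coding of events as 1 (target response) or 0 (other behavior), and the function name are assumptions made for the example, not the abstract's own notation.

```python
def ewma_memory(events, beta, initial=0.0):
    """Exponentially-weighted moving average of a sequence of events.

    Each new event receives weight beta and the prior memory is scaled by
    (1 - beta), so earlier events are progressively displaced by later ones.
    """
    memory = initial
    trace = []
    for y in events:
        memory = beta * y + (1.0 - beta) * memory
        trace.append(memory)
    return trace

if __name__ == "__main__":
    # Three target responses followed by three other responses.
    print(ewma_memory([1, 1, 1, 0, 0, 0], beta=0.5))
```

With `beta = 0.5`, each intervening response halves the weight of everything before it. This is one way to read the statement that responses displace memory for the responses that precede them.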
Quantitative Measurement of Self-Control: Use of the Matching Law With an Adjusting Procedure
Self-control can be defined as choice of a larger, more delayed reinforcer over a smaller, less delayed reinforcer. The generalized matching law has been used very successfully to quantify self-control in a large variety of experiments. The ratio of two free parameters in the generalized matching law, SA/SD, measures a subject's relative sensitivity to variation in reinforcer amount and delay. Values of this ratio greater than 1.0 tend to be associated with self-control, and values of this ratio less than 1.0 tend to be associated with impulsiveness. In situations in which a subject's preference is equal between two alternatives varying in reinforcer amount and delay, assuming that there is no response bias, it can be shown that SA/SD = [log(D1/D2)]/[log(A1/A2)], in which Ai and Di represent the amounts and delays of the two choice alternatives, respectively. Several experiments have now measured SA/SD in rats, monkeys, and humans using such an analysis along with an adjusting procedure that results in equal preference between the two alternatives. This method can obtain a quantitative measure of self-control with fewer experimental conditions than are required with some other methods.
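The indifference-point formula above can be computed directly. The short Python sketch below does so; the function name and the example amounts and delays are hypothetical and are not data from the experiments described.

```python
import math

def sensitivity_ratio(a1, a2, d1, d2):
    """SA/SD at an indifference point, assuming no response bias.

    a1, a2: reinforcer amounts of the two alternatives
    d1, d2: reinforcer delays of the two alternatives
    """
    if a1 == a2:
        raise ValueError("amounts must differ, otherwise log(A1/A2) is zero")
    return math.log(d1 / d2) / math.log(a1 / a2)

if __name__ == "__main__":
    # Hypothetical indifference point: 4 units after 16 s versus 1 unit after 1 s.
    print(sensitivity_ratio(a1=4, a2=1, d1=16, d2=1))
```

In this made-up case the ratio is about 2, which the abstract would associate with self-control. A ratio below 1.0 would be associated with impulsiveness.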
Development of Preference and Spontaneous Recovery in Choice Behavior
In two experiments, pigeons pecked on two response keys that delivered reinforcers on a variable-interval schedule. The proportion of reinforcers delivered by one key was constant for a few sessions and then changed. In Experiment 1, response proportions approached a new asymptote slightly more slowly when the switch in reinforcement proportions was more extreme. Experiment 2 found slightly faster transitions with higher overall rates of reinforcement. Transition patterns were consistent with a mathematical model that assumes the strength of each response is increased by reinforcement and decreased by nonreinforcement. However, neither this model nor other similar models predicted the "spontaneous recovery" observed in later sessions: at the start of these sessions, response proportions reverted toward their levels of previous sessions. Computer simulations could mimic the spontaneous recovery by assuming that subjects store separate representations of response strength for each session, which are averaged at the start of each new session.
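The two ideas in this abstract, a strength that rises with reinforcement and falls with nonreinforcement, and a session-start value formed by averaging stored session strengths, can be sketched in Python as follows. The linear update rules, the rate parameters, and the unweighted average are assumptions chosen for illustration; the abstract does not give the model's equations.

```python
def update_strength(strength, reinforced, up_rate, down_rate):
    """Assumed linear-operator update of one response's strength in [0, 1]."""
    if reinforced:
        return strength + up_rate * (1.0 - strength)  # move toward 1
    return strength - down_rate * strength            # move toward 0

def choice_proportion(v_left, v_right):
    """Proportion of responses on the left key, given the two strengths."""
    return v_left / (v_left + v_right)

def session_start_strength(stored_session_strengths):
    """Average the strengths stored for earlier sessions (assumed unweighted)."""
    return sum(stored_session_strengths) / len(stored_session_strengths)

if __name__ == "__main__":
    # Strengths at the end of three earlier sessions, before and after a switch.
    left_history = [0.8, 0.8, 0.3]
    right_history = [0.2, 0.2, 0.7]
    v_left = session_start_strength(left_history)
    v_right = session_start_strength(right_history)
    # The next session starts closer to the old preference than it ended.
    print(choice_proportion(left_history[-1], right_history[-1]))
    print(choice_proportion(v_left, v_right))
```

Because the session-start value averages over earlier sessions, the first responses of a new session drift back toward the older preference. This mimics the "spontaneous recovery" pattern in a qualitative way only.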
Further Considerations on Probability and Delay
There are three basic criteria for a discount function: (1) at one extreme the function should approach no discounting; (2) at the other extreme the function should approach complete discounting; (3) the function should go monotonically (if not continuously) from one extreme to the other. Mazur's delay discount equation satisfies the three criteria. Some probabilistic discount functions based on Mazur's function also satisfy the three criteria; some do not. These various functions will be discussed. Evidence from experiments with nonhuman and human subjects will be presented bearing on both probabilistic and delay discount functions and their interaction.
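The three criteria can be checked numerically for Mazur's hyperbolic delay discount equation, V = A / (1 + kD). The Python sketch below does this for one amount and one value of k; both numbers are arbitrary choices made for the example.

```python
def hyperbolic_value(amount, delay, k):
    """Mazur's hyperbolic delay discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

if __name__ == "__main__":
    amount, k = 10.0, 0.2
    delays = [0, 1, 5, 20, 100, 10_000]
    values = [hyperbolic_value(amount, d, k) for d in delays]
    # Criterion 1: at zero delay there is no discounting.
    print(values[0] == amount)
    # Criterion 2: at very long delays value approaches zero.
    print(values[-1] < 0.01 * amount)
    # Criterion 3: value falls monotonically as delay grows.
    print(all(a > b for a, b in zip(values, values[1:])))
```

The same three checks could be applied to any candidate probabilistic discount function by replacing `hyperbolic_value` and sweeping probability instead of delay.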
Peak Deviation Analysis: Quantitative Characterization of DRL Interresponse Time Distribution Profiles
Peak deviation analysis is a quantitative technique for describing IRT distributions. Peak deviation analysis compares each subject's obtained IRT distribution to the corresponding negative exponential distribution that would have occurred if the subject had emitted the same number of responses randomly in time, at the same overall rate. Peak deviation analysis provides three metrics (peak location, peak area, and burst ratio) that characterize the profile of the obtained IRT distribution. Peak deviation analysis uses a peak-finding algorithm to locate the largest deviation (peak) of the obtained IRT distribution above the corresponding negative exponential distribution and computes the location and area of the peak. The burst ratio is the total number of obtained burst IRT durations divided by the total number of burst IRT durations predicted to occur by the corresponding negative exponential. Data will be presented demonstrating the application of peak deviation analysis to 1) acquisition of DRL 36-s performance; 2) comparison of performance on DRL 18-s, 36-s and 72-s schedules; and 3) characterization of drug effects. The data show that peak deviation analysis provides information about the behavioral process that underlies DRL performance that would not be apparent from a qualitative examination of IRT histograms.
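The comparison against a negative exponential can be sketched in Python as below. This is a simplified stand-in, not the peak-finding algorithm of the paper: it takes the single histogram bin with the largest positive deviation as the peak, and the bin width, the burst cutoff, and the example IRTs are all assumptions.

```python
import math

def expected_proportions(mean_irt, bin_width, n_bins):
    """Bin proportions under a negative exponential with the same mean IRT."""
    rate = 1.0 / mean_irt
    return [math.exp(-rate * i * bin_width) - math.exp(-rate * (i + 1) * bin_width)
            for i in range(n_bins)]

def obtained_proportions(irts, bin_width, n_bins):
    """Proportion of obtained IRTs falling in each bin (overflow is dropped)."""
    counts = [0] * n_bins
    for irt in irts:
        idx = int(irt // bin_width)
        if idx < n_bins:
            counts[idx] += 1
    return [c / len(irts) for c in counts]

def largest_deviation(irts, bin_width, n_bins):
    """Return (bin start time, deviation) for the largest positive deviation."""
    mean_irt = sum(irts) / len(irts)
    expected = expected_proportions(mean_irt, bin_width, n_bins)
    obtained = obtained_proportions(irts, bin_width, n_bins)
    deviations = [o - e for o, e in zip(obtained, expected)]
    peak_bin = max(range(n_bins), key=lambda i: deviations[i])
    return peak_bin * bin_width, deviations[peak_bin]

def burst_ratio(irts, burst_cutoff):
    """Obtained burst IRTs divided by the number the exponential predicts."""
    mean_irt = sum(irts) / len(irts)
    obtained = sum(1 for irt in irts if irt < burst_cutoff)
    predicted = len(irts) * (1.0 - math.exp(-burst_cutoff / mean_irt))
    return obtained / predicted

if __name__ == "__main__":
    # Made-up IRTs (s): some bursts, a cluster near 37 s, a few others.
    irts = [0.5] * 10 + [20.0] * 5 + [37.0] * 30 + [60.0] * 5
    print(largest_deviation(irts, bin_width=6, n_bins=20))
    print(burst_ratio(irts, burst_cutoff=1.0))
```

In the made-up data the largest deviation falls in the bin that starts at 36 s, and the burst ratio exceeds 1 because short IRTs are more frequent than random emission would predict.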
Counting and Timing Light Flashes by Pigeons
Research will be reported that tests the applicability of the Meck, Church and Gibbon (1985) mode control model to the processing of light flashes in pigeons. In an initial set of experiments, pigeons were trained to discriminate between a sequence of two light flashes that lasted 2 s and a sequence of eight light flashes that lasted 8 s. Subsequent tests showed that this training had established both time and number control. Pigeons also learned to respond accurately to ambiguous flash sequences in which time and number indicated different responses. The use of postsequence cues served to disambiguate these sequences and indicated a modification of the mode control model to allow selective retrieval of time and number information from working memory. In a second set of experiments, the prediction was tested that a "choose-small" effect should be found when memory for small and large numbers of flashes was tested at short and long delays. Evidence for a choose-small effect was found. In addition, these experiments indicated that pigeons do not automatically time sequences of light flashes. Rather, pigeons appear to use counting and/or timing "strategies" that afford the simplest solution to a given learning situation.
The Evolution of Quantitative Approaches in Behavior Analysis
An empirical analysis of the development of quantitative approaches within the experimental analysis of behavior will be based on papers published in The Journal of the Experimental Analysis of Behavior since its inception. Historical trends will be presented in terms of categorical distinctions such as Descriptive versus Rational, Static versus Dynamic, Mathematical versus Computational, and Molecular versus Molar. Frequencies of interpretations of quantitative systems in terms of behavioral, biological, and cognitive vocabularies will also be examined.
Response Rate and Initiation Rate
An occasional, recurring theme over the past 35 years is that response rate is a composite of two modes: bursts (during which the measured response occurs at a fairly high rate and fairly constant tempo) and pauses between bursts. A model based on this two-mode conception shows that the rate of initiating bursts may be very sensitive to classes of operations (e.g., to incentive and motivational variables) even though the usual measure of response rate is not. Burst-initiation rate may correspond better than the composite rate to Skinner's conception of a measure indicating the likelihood of an activity.
Rate-sensitive Habituation and a Possible Mechanism for Reinforcement Learning
Habituation seems to be the outcome of a process by which the integrated sum of recent stimuli suppresses the current response. Habituation depends on stimulus spacing: it occurs more rapidly when interstimulus intervals (ISIs) are short than when they are long -- but also recovers more rapidly after short interstimulus intervals (rate-sensitivity). The effect of ISI on habituation rate is consistent with a simple one-stage process, but the effect of ISI on recovery rate seems to require a serial process in which two or more habituating units are cascaded, with earlier (peripheral) units in the series having shorter time constants than later (central) units. Important phenomena in operant learning, such as the partial-reinforcement and successive-contrast effects, seem to depend on event spacing in the same way as rate-sensitive habituation, suggesting the possibility of a common underlying process.
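A cascade of habituating units can be sketched in Python as below. The sketch assumes each unit is a leaky integrator whose stored value is subtracted from its input, with the remainder passed on to the next unit. The update rule, the decay rates, and the gain are illustrative assumptions, not equations taken from the abstract.

```python
def cascade_response(stimuli, decay_rates, gain=0.2):
    """Response of a chain of habituating units to a stimulus sequence.

    stimuli:     stimulus intensity at each time step (0 means no stimulus)
    decay_rates: one decay rate per unit, earliest unit first; a smaller
                 value means a shorter time constant
    """
    memories = [0.0] * len(decay_rates)
    responses = []
    for x in stimuli:
        signal = x
        for i, decay in enumerate(decay_rates):
            output = max(0.0, signal - memories[i])             # suppressed output
            memories[i] = decay * memories[i] + gain * signal   # leaky integration
            signal = output                                     # feed the next unit
        responses.append(signal)
    return responses

if __name__ == "__main__":
    short_isi = [1, 1, 1, 1, 1, 1, 1, 1]
    long_isi = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
    # Peripheral unit decays quickly (0.5); central unit decays slowly (0.9).
    print(cascade_response(short_isi, decay_rates=[0.5, 0.9]))
    print(cascade_response(long_isi, decay_rates=[0.5, 0.9]))
```

Comparing the two printed sequences shows the response to successive stimuli declining faster when the stimuli are closely spaced, which is the spacing effect on habituation rate described above.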
Maximum Likelihood Estimation of Signal Detection Parameters
In a typical signal detection procedure, subjects must decide whether or not a stimulus recently occurred. Performance on a signal detection task is generally assumed to be jointly determined by stimulus discriminability (d') and bias (beta). Precise estimates of these parameters require an analysis of the ROC curve, which is obtained by varying bias across conditions while holding d' constant. Although ROC plots can be (and often are) analyzed using least squares, maximum likelihood estimation may be more appropriate.
Facilitator: Peter R. Killeen, Arizona State University
A colloquy, or a directed discussion, is scheduled at the end of this year's meeting. The intent is to facilitate progress toward solving, or at least conceptualizing, some of the important issues in the quantitative analyses of behavior. A colloquy will be able to take advantage of the unique opportunity offered by the simultaneous presence of the SQAB attendees. Each attendee is requested to spend some time clarifying their own views on the suggested topics before the meeting.
1. What are the criteria for, and examples of, the "basic questions" for EAB, for instance,
2. How do we redesign experimental design to accommodate path-dependence and sensitivity to initial conditions?