Paradigm
2AFC minimises criterion effects. Yes/No allows SDT d′ analysis.
Stimulus
Stimulus parameters
Timing
Adaptive engine
Protocol
Response keys
Quick presets
Run control
Press Esc to abort. Fullscreen achieves best timing accuracy.
Keyboard shortcuts
Arrow keys respond  |  Esc abort  |  F11 fullscreen
Experiment status
DPR: 1 Idle
Trial: 0/0 Block: 0 Stair:
Stimulus stage
High-DPR canvas for timing-accurate stimulus presentation. Click canvas to focus for keyboard responses.
Live statistics
Mean RT
Accuracy
Threshold
Reversals
d′
Criterion
Staircase trace
Gold line = intensity; red dots = reversals; blue dashed = threshold estimate.
Psychometric function
Gold dots = binned data (size ∝ n); blue curve = fitted logistic.
RT distribution
20-bin histogram of response latencies.
SDT response matrix
Response matrix appears here
Hit / Miss / False Alarm / Correct Rejection counts — used for d′ and criterion computation.
Recent trials

            
Actions
Session export

JSON includes full metadata, per-trial responses, and summary statistics. CSV is flat trial-by-trial for R/Python/MATLAB. PNG exports the staircase chart.

Custom presets
No saved presets
Share configuration

Generates a URL that restores all control values when opened.

Session comparison summary
Run an experiment, then generate a summary here.
Session management

Clears all trial data, staircase state, and charts. Cannot be undone.

Psychophysical Experiment Standards

Standards governing stimulus presentation, measurement protocols, and analysis methods in psychophysics research. This engine follows the principles below; calibrated hardware is required for publication-grade compliance.

CIE Standards for Visual Threshold Measurement

CIE 017:2023 — International Lighting Vocabulary. Defines luminance threshold, contrast threshold, adaptation luminance, and related photometric quantities used in psychophysical measurement. Supersedes CIE S 017/E:2011.

CIE 041-1978 — Light as a true visual quantity. Establishes principles for photometric measurements consistent with human visual response, including spectral luminous efficiency functions V(λ) (photopic) and V′(λ) (scotopic).

CIE 159:2004 — CIECAM02 colour appearance model. Provides the chromatic adaptation and appearance framework underlying colour-appearance psychophysics. Used when experiments cross viewing conditions.

CIE 224:2017 — Colour fidelity index for accurate scientific use (R_f). Defines how faithfully a light source renders test colours, critical for colour naming and discrimination experiments.

CIE S 026:2018 — α-opic action spectra and toolbox for measuring ipRGC-influenced responses. Required for flicker and temporal sensitivity experiments involving melanopsin pathways.

ISO Standards for Psychophysical Testing

ISO 3664:2009 — Viewing conditions for graphic technology and photography. Specifies D50 and D65 reference illuminants, illuminance levels (500–2 000 lx), and surround conditions for consistent psychophysical evaluation.

ISO 12646:2015 — Displays for colour proofing. Specifies white point (D50 ±3 MIRED), luminance (≥80 cd/m²), uniformity (≤10% variation), and colour gamut for visual colour-matching experiments.

ISO/CIE 11664-4:2019 — CIE 1976 L*a*b* colour space; CIEDE2000 (ΔE00) is defined in companion part ISO/CIE 11664-6. The standard metric for perceptual colour differences in discrimination experiments. Threshold for just-noticeable difference: ΔE00 ≈ 1.0–2.3 under controlled conditions.

ISO 14253-1:2017 — Test uncertainty for measurement conformance. Sets the framework for reporting measurement uncertainty, applicable when psychophysical thresholds inform acceptance criteria.

ISO/IEC 29341 (UPnP AV) — Display compliance for video content; relevant for flicker sensitivity and temporal contrast measurements.

Signal Detection Theory (SDT) Framework

Equal-variance Gaussian SDT (Green & Swets, 1966) assumes signal and noise distributions are unit-normal with equal variance. Sensitivity d′ = z(H) − z(FA). Criterion c = −½[z(H) + z(FA)]. Likelihood ratio β = exp(−½[z(H)² − z(FA)²]).

Unequal-variance SDT relaxes the equal-variance assumption (σ_S = σ_N, i.e. ROC slope s = 1). Necessary when perceptual variance differs between signal and noise (e.g., recognition memory, familiarity paradigms); a zROC slope ≠ 1 is diagnostic.

Non-parametric index A′ provides a threshold-free sensitivity estimate when distributional assumptions cannot be verified. A′ = 0.5 (chance) to 1.0 (perfect).

Multi-interval SDT: in m-AFC, d′m = d′ · √(m / 2(m−1)) for equal-variance Gaussian model. Corrects for interval count.

ROC analysis plots H vs FA across all criterion levels. AUROC = P(X_S > X_N) = Φ(d′/√2) under equal-variance Gaussian model. Provides non-parametric sensitivity measure independent of observer criterion.

Hautus corrections (1995) for extreme rates: apply 0.5/N and 1−0.5/N corrections to avoid infinite z-scores at H = 0/1 or FA = 0/1.

Adaptive Procedure Standards & Terminology

Classical threshold (Fechner, 1860): absolute threshold = lowest stimulus intensity detected on 50% (or 75%) of presentations; difference threshold = smallest detectable change (ΔI/I = Weber fraction).

Transformed up-down rules (Levitt, 1971): step up after 1 wrong, step down after N correct in a row. Convergence point p satisfies p^N = 0.5.

  • 1-up / 1-down: p = 50.0%
  • 1-up / 2-down: p = 70.7%
  • 1-up / 3-down: p = 79.4%
  • 1-up / 4-down: p = 84.1%
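
The convergence rule above can be checked with a one-liner in the engine's own language (a sketch; the function name is illustrative):

```javascript
// Convergence point of an N-down/1-up transformed staircase (Levitt, 1971):
// the staircase settles where the probability of N consecutive correct
// responses equals 0.5, i.e. p^N = 0.5, so p = 0.5^(1/N).
function levittConvergence(nDown) {
  return Math.pow(0.5, 1 / nDown);
}

console.log(levittConvergence(2).toFixed(4)); // 0.7071 → 70.7%
console.log(levittConvergence(3).toFixed(4)); // 0.7937 → 79.4%
```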

QUEST (Watson & Pelli, 1983): Bayesian posterior on a grid of θ values. Next stimulus = posterior mean or MAP. Updates analytically via Bayes’ rule. Achieves 95% of maximum efficiency after ∼30 trials.

PEST (Taylor & Creelman, 1967): parameter estimation by sequential testing. Step doubles in the same direction; halves on reversal. Uses Wald’s sequential probability ratio test to determine termination.

Constant stimuli: fixed intensity levels presented in random order. No adaptive rule. Requires ≥100 trials per level; most accurate for psychometric curve shape but least efficient.

Minimum trial recommendations: QUEST ≥40; transformed staircase ≥60–80; constant stimuli ≥100 per level. Bootstrap CI with ≥1 000 resamples for threshold confidence intervals.

Display & Timing Requirements

Refresh rate: 60 Hz provides 16.67 ms frame resolution; suitable for stimuli ≥50 ms. 120 Hz (8.33 ms) or 240 Hz (4.17 ms) for temporal contrast & flicker work. CRT phosphor decay <1 ms was the historical gold standard.

Luminance calibration: Use a photometer (e.g., Konica Minolta LS-100) to verify display gamma. sRGB assumes γ = 2.2 but individual panels deviate. Incorrect gamma introduces non-linear contrast errors of up to ±30%.

Bit depth: 8-bit: 256 grey levels; insufficient for <5% Michelson contrast. 10-bit: 1 024 levels; acceptable for most CSF work. 12-bit: 4 096 levels; required for sub-threshold masking experiments. DVI/HDMI 8-bit typically limits low-contrast Gabors even on 10-bit panels.

Spatial calibration: Viewing distance and display pixel pitch determine cycles per degree. At 57 cm, 1° ≈ 1 cm. Typical 24-inch 1080p at 57 cm: ∼37 pixels per degree. Set spatial frequency fields accordingly.

Temporal timing (web): This engine uses rAF + performance.now() sub-millisecond timestamps. Browser compositing adds ∼1 frame latency. Tab-switching or background processes degrade timing. For publication, use PsychoPy, PsychToolbox, or jsPsych with hardware-verified onset timestamps.

Viewing conditions (ISO 3664): Surround luminance 10–15% of display white; room illuminance ≤64 lx; no specular reflections. Maintain ≥20 min dark adaptation for scotopic threshold work.

Experimental Design & Ethics

Observer blinding: Where possible use a blinded operator and randomised trial order to minimise experimenter and observer expectation bias.

Practice trials: Always include practice (minimum 10 trials) before data collection. Practice trials are not analysed. This engine’s Practice mode supports this.

Breaks: Recommend a 5-minute break every 15–20 minutes of continuous testing. Eye fatigue affects thresholds by up to 0.1 log units.

Ethics: Photosensitive epilepsy risk is present with flicker stimuli at 3–30 Hz. Do not use flicker paradigms without informed consent screening. This tool includes a flicker stimulus type; use responsibly.

Reporting standards: Report mean threshold ± bootstrap 95% CI, staircase type, number of reversals, reversals discarded, trials per block, viewing distance, and display specifications (make, model, refresh rate, bit depth, luminance).

Formulas & Mathematics

All formulas implemented in psychophysical-experiment.js. Notation follows standard psychophysics convention (Green & Swets 1966; Wichmann & Hill 2001; Watson & Pelli 1983).

Psychometric Functions
Weibull Psychometric Function (King-Smith & Rose, 1997):
ψ(x; α, β, γ, δ) = γ + (1 − γ − δ) · [1 − exp(−(x/α)^β)]

α = threshold (the ~63.2% point of the γ-to-(1−δ) range)
β = shape / slope (typically 2–4 for detection)
γ = guessing rate (0.5 for 2AFC; 1/m for m-AFC; ~0 for yes/no)
δ = lapse rate (typically 0.01–0.04; models inattention)
At α: ψ = γ + (1−γ−δ)(1−e^−1), i.e. ≈ 63.2% of the corrected range
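A minimal JavaScript sketch of this Weibull form (parameter names are illustrative, not the engine's internals):

```javascript
// Weibull psychometric function in the parameterisation above.
function weibull(x, alpha, beta, gamma, delta) {
  return gamma + (1 - gamma - delta) * (1 - Math.exp(-Math.pow(x / alpha, beta)));
}
// At x = alpha the bracketed term equals 1 − e⁻¹ ≈ 0.632, so for 2AFC
// (gamma = 0.5, delta = 0.02): psi(alpha) = 0.5 + 0.48 × 0.632 ≈ 0.803
```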

Logistic Psychometric Function (standard in this engine):
ψ(x; θ, s) = γ + (1−γ−δ) / [1 + exp(−s(x − θ))]

θ = threshold (50% point of the sigmoid, corrected for γ/δ)
s = logistic slope ≈ 3.5 / (α · σ) where σ is the psychometric width

Probit (Normal) Function:
ψ(x) = γ + (1−γ−δ) · Φ((x − μ)/σ) — used when Gaussian distributional assumptions are justified

Maximum Likelihood Estimation (MLE):
log L(θ, s | data) = ∑_i [ k_i log ψ(x_i) + (n_i − k_i) log(1 − ψ(x_i)) ]
Grid search over θ × s (31×21 = 651 evaluations in this engine).
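The grid-search MLE can be sketched as follows. The 31×21 grid size matches the text, but the grid bounds and slope range here are assumptions (intensity normalised to [0, 1]), not the engine's exact values:

```javascript
// Logistic psychometric function with fixed guess/lapse rates.
function psi(x, theta, s, gamma, delta) {
  return gamma + (1 - gamma - delta) / (1 + Math.exp(-s * (x - theta)));
}

// Grid-search MLE over theta × s. data: [{x, k, n}] binned trials
// (k correct of n at intensity x).
function fitLogisticMLE(data, gamma = 0.5, delta = 0.02) {
  let best = { theta: NaN, s: NaN, logL: -Infinity };
  for (let i = 0; i < 31; i++) {      // 31 threshold candidates on [0, 1]
    const theta = i / 30;
    for (let j = 0; j < 21; j++) {    // 21 slope candidates (assumed range 1..21)
      const s = 1 + j;
      let logL = 0;
      for (const { x, k, n } of data) {
        const p = Math.min(1 - 1e-9, Math.max(1e-9, psi(x, theta, s, gamma, delta)));
        logL += k * Math.log(p) + (n - k) * Math.log(1 - p);
      }
      if (logL > best.logL) best = { theta, s, logL };
    }
  }
  return best;
}
```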

Goodness of fit:
G² = 2 ∑ [ k ln(k/k̂) + (n−k) ln((n−k)/(n−k̂)) ]
Asymptotically χ² with df = bins − 2 (number of bins minus fitted parameters).

Staircase Convergence & Threshold Estimation
Transformed Up-Down Convergence (Levitt, 1971):
Target performance p* = 0.5^(1/N_down)

1-up / 2-down: p* = 0.5^(1/2) = √0.5 ≈ 70.71%
1-up / 3-down: p* = 0.5^(1/3) ≈ 79.37%
1-up / 4-down: p* = 0.5^(1/4) ≈ 84.09%

Threshold Estimation from Reversals (Levitt, 1971):
θ̂ = (1/M) ∑_{i=k+1}^{N} x_rev,i
Discard the first k = 4 warmup reversals; use the M = N − 4 stable reversals.
SE = SD(x_rev) / √M
Need ≥6 total reversals (≥2 stable) for a valid estimate.
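The reversal-averaging estimate can be sketched as follows (intensities as plain numbers; `discard` defaults to the k = 4 warmup reversals; names are illustrative):

```javascript
// Reversal-averaging threshold (Levitt, 1971): drop the first `discard`
// warmup reversals, average the rest, report SE over the stable set.
function reversalThreshold(reversals, discard = 4) {
  const stable = reversals.slice(discard);
  if (stable.length < 2) return null; // need >= 2 stable reversals
  const mean = stable.reduce((a, b) => a + b, 0) / stable.length;
  const variance =
    stable.reduce((a, b) => a + (b - mean) ** 2, 0) / (stable.length - 1);
  return { threshold: mean, se: Math.sqrt(variance / stable.length), m: stable.length };
}
```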

Geometric Step Reduction (this engine):
s_n = s_0 × (1/√2)^⌊R/4⌋
R = number of reversals; reduces step by ~29% every 4 reversals.

Precision & efficiency:
Variance of threshold estimate ∝ (step size)² + observer variability
Smaller final step size → lower variance but requires more trials.

QUEST — Bayesian Adaptive Estimation
Likelihood function (Weibull psi in QUEST):
ψ(x | θ) = γ + (1−γ−δ) · [1 − exp(−10^(β(x−θ)))]

Bayesian update:
p(θ | data) ∝ p(θ) · ∏_i ψ(x_i | θ)^(r_i) · [1 − ψ(x_i | θ)]^(1−r_i)
Normalised: p_{n+1}(θ) = p_n(θ) · L / Z, where Z = ∑_θ p(θ) L(θ)

Next stimulus selection:
MAP: x* = argmax_θ p(θ | data)
Posterior mean (this engine): x* = E[θ | data] = ∑_i θ_i p(θ_i | data)
Posterior SD: σ_post = √E[(θ − μ)²] — decreases with trials

QUEST+ entropy minimisation (Watson 2017):
H[p] = −∑ p(θ) log₂ p(θ)   (bits)
Choose x* = argmin_x E_r[ H[p(θ | data, x, r)] ]
Simultaneously estimates α, β, δ on a grid of three or more dimensions.

Grid implementation (this engine):
300-point uniform grid on [minInt, maxInt]
γ = 0.5 (2AFC), 0.01 (yes/no); β = 3.5; δ = 0.02
Posterior mean used as next intensity (more stable than MAP).
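The per-trial grid update can be sketched as follows; the β, γ, δ defaults mirror the values quoted above, but the function names are illustrative:

```javascript
// One Bayesian trial update on a discretised threshold grid, QUEST-style:
// posterior ∝ prior × likelihood, renormalised. Likelihood is the Weibull
// in log-intensity units from the formula above.
function questUpdate(grid, posterior, x, correct, beta = 3.5, gamma = 0.5, delta = 0.02) {
  let z = 0;
  const next = grid.map((theta, i) => {
    const pC = gamma + (1 - gamma - delta) * (1 - Math.exp(-Math.pow(10, beta * (x - theta))));
    const v = posterior[i] * (correct ? pC : 1 - pC);
    z += v;
    return v;
  });
  return next.map(v => v / z); // renormalise so the posterior sums to 1
}

// Next stimulus = posterior mean (more stable than MAP on coarse grids).
const posteriorMean = (grid, post) =>
  grid.reduce((acc, theta, i) => acc + theta * post[i], 0);
```

A correct response at intensity x makes low thresholds more likely, so the posterior mean moves down, presenting a harder stimulus next.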

PEST — Parameter Estimation by Sequential Testing
PEST step update rules (Taylor & Creelman, 1967):
W = Wald’s SPRT statistic
After 4 consecutive same-direction steps: s ← min(2s, s_max) (doubling)
On reversal: s ← max(s/2, s_min) (halving)

This engine’s PEST implementation:
newDir = correct ? −1 : +1
Reversal ⇒ pestStep ×= 0.5; pestConsecutive = 0
Same dir ⇒ pestConsecutive++; if pestConsecutive ≥ 4: pestStep ×= 2
x ← clamp(x + newDir × pestStep, minInt, maxInt)
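A runnable version of the update rules above (the state shape and clamp bounds are illustrative, not the engine's internals):

```javascript
// PEST intensity update: halve the step on a reversal, double it after
// 4 consecutive same-direction steps, then step and clamp.
function pestStep(state, correct, minInt = 0, maxInt = 1) {
  const dir = correct ? -1 : +1; // harder after correct, easier after error
  if (state.dir !== 0 && dir !== state.dir) {
    state.step = Math.max(state.step * 0.5, state.minStep); // reversal: halve
    state.consecutive = 0;
  } else {
    state.consecutive += 1;
    if (state.consecutive >= 4) { // 4 steps same way: double
      state.step = Math.min(state.step * 2, state.maxStep);
      state.consecutive = 0;
    }
  }
  state.dir = dir;
  state.x = Math.min(maxInt, Math.max(minInt, state.x + dir * state.step));
  return state.x;
}
```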

Convergence target: 50% correct (yes/no) or 75% (2AFC with appropriate prior)

Efficiency: PEST typically reaches a stable estimate in 20–40 trials; similar efficiency to QUEST for single-parameter estimation.

Signal Detection Theory Metrics
Core SDT formulas:
d′ = z(H) − z(FA)     [sensitivity; 0 = chance; 4.65 ≈ ceiling (H = .99, FA = .01)]
c = −½[z(H) + z(FA)]     [criterion; 0 = unbiased]
β = exp(−½[z(H)² − z(FA)²])     [likelihood ratio]
A′ = ½ + sign(H−FA) · |H−FA|² / [4 · max(H,FA) · (1 − min(H,FA))]

Hautus (1995) corrections for extreme rates:
H = 1 ⇒ H′ = 1 − 1/(2N_S)
FA = 0 ⇒ FA′ = 1/(2N_N)
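A sketch of these metrics with the extreme-rate clamp applied. The probit here uses Winitzki's erfinv approximation (absolute error ~1e-3) — adequate for a demonstration, not for publication tables; all names are illustrative:

```javascript
// Inverse standard-normal CDF via Winitzki's erfinv approximation.
function erfinv(x) {
  const a = 0.147; // Winitzki's constant
  const ln1x2 = Math.log(1 - x * x);
  const t = 2 / (Math.PI * a) + ln1x2 / 2;
  return Math.sign(x) * Math.sqrt(Math.sqrt(t * t - ln1x2 / a) - t);
}
const probit = p => Math.SQRT2 * erfinv(2 * p - 1); // z(p)

// SDT metrics from raw counts, with rates clamped to [1/(2N), 1 − 1/(2N)]
// so z() stays finite at perfect or zero rates.
function sdt(hits, misses, fas, crs) {
  const nS = hits + misses, nN = fas + crs;
  const clamp = (r, n) => Math.min(1 - 1 / (2 * n), Math.max(1 / (2 * n), r));
  const H = clamp(hits / nS, nS), FA = clamp(fas / nN, nN);
  const zH = probit(H), zFA = probit(FA);
  return {
    dPrime: zH - zFA,                             // sensitivity
    criterion: -0.5 * (zH + zFA),                 // bias c
    beta: Math.exp(-0.5 * (zH * zH - zFA * zFA)), // likelihood ratio
  };
}
```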

m-AFC correction (Hacker & Ratcliff, 1979):
d′_m = d′_2AFC × √2 × f(m)
f(m): integration of multivariate normal CDF

Approximate d′ ↔ percent correct (2AFC):
PC = Φ(d′ / √2)
d′ = 0 → 50%; d′ = 1 → 76%; d′ = 2 → 92%; d′ = 3 → 98.3%

ROC area (AUROC):
A = Φ(d′ / √2)  (equal-variance Gaussian model)
Trapezoidal rule for empirical ROC from rated/multiple-criterion data.

Contrast, Sensitivity & Stimulus Metrics
Michelson Contrast (for sinusoidal gratings & Gabors):
C_M = (L_max − L_min) / (L_max + L_min)
Range [0, 1]; C = amplitude / mean luminance for sinusoidal gratings

Weber Contrast (for uniform patches or spots on backgrounds):
C_W = ΔL / L_b = (L_target − L_background) / L_background

RMS Contrast:
C_RMS = √( mean[ (L(x,y)/⟨L⟩ − 1)² ] )
Used for broadband stimuli (noise, natural images)
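The three definitions as small JavaScript helpers (luminances as plain numbers; names are illustrative):

```javascript
// Michelson contrast: for periodic stimuli (gratings, Gabors).
const michelson = (lMax, lMin) => (lMax - lMin) / (lMax + lMin);

// Weber contrast: for increments/decrements on a uniform background.
const weber = (lTarget, lBackground) => (lTarget - lBackground) / lBackground;

// RMS contrast: root-mean-square deviation of normalised luminance from 1.
function rmsContrast(L) {
  const mean = L.reduce((a, b) => a + b, 0) / L.length;
  return Math.sqrt(L.reduce((a, b) => a + (b / mean - 1) ** 2, 0) / L.length);
}
```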

Gabor stimulus (this engine):
G(x,y) = ½ + ½·C · cos(2πf(x·cosθ + y·sinθ)) · exp(−(x² + y²)/(2σ²))
f = spatial frequency (cpd converted to cycles/px at runtime)
σ = Gaussian envelope width, set to stimSize/4
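Evaluating the Gabor equation per pixel (a sketch, not the engine's canvas loop; units as noted in the comments):

```javascript
// One pixel of the Gabor equation above. x, y in px relative to the patch
// centre, C contrast in [0,1], f in cycles/px, theta in radians, sigma in px.
function gaborLuminance(x, y, C, f, theta, sigma) {
  const carrier = Math.cos(2 * Math.PI * f * (x * Math.cos(theta) + y * Math.sin(theta)));
  const envelope = Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
  return 0.5 + 0.5 * C * carrier * envelope; // mean luminance 0.5
}
```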

Contrast Sensitivity Function (Mannos & Sakrison, 1974):
CSF(f) = a · f^c · exp(−b · f);  a = 2.6, b = 0.0192, c = 1.1
CS(f) = 1/C_threshold(f); peak sensitivity ≈ 3–5 cpd

Cycles per degree conversion:
cpd = cycles/px × px/deg;  px/deg ≈ 2 · viewing_distance_mm · tan(0.5°) / pixel_pitch_mm ≈ 0.0175 × distance / pitch
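The conversion as code (the geometry is standard; function names are illustrative):

```javascript
// Pixels per degree from viewing geometry: one degree of visual angle spans
// 2·d·tan(0.5°) at viewing distance d; divide by the pixel pitch.
function pxPerDeg(viewDistMm, pixelPitchMm) {
  return (2 * viewDistMm * Math.tan((0.5 * Math.PI) / 180)) / pixelPitchMm;
}

const cpd = (cyclesPerPx, viewDistMm, pixelPitchMm) =>
  cyclesPerPx * pxPerDeg(viewDistMm, pixelPitchMm);
```

For a 24-inch 1080p panel (pitch ≈ 0.277 mm) at 57 cm this gives ≈ 36 px/deg, matching the ballpark figure quoted above.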

Bootstrap Confidence Intervals
Parametric bootstrap procedure (Wichmann & Hill, 2001):
1. Fit psychometric function to observed data → obtain θ̂, ŝ
2. For b = 1…B:
   a. Resample data with replacement (non-parametric) or simulate from ψ(x; θ̂)
   b. Re-fit the logistic MLE → θ̂_b
3. 95% CI: [percentile_2.5(θ̂_b), percentile_97.5(θ̂_b)]

This engine uses non-parametric resampling (B = 1000 by default) with MLE fit per resample.
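The resample-and-refit loop can be sketched with a pluggable `fit` callback standing in for the MLE refit (names and the rng hook are illustrative):

```javascript
// Non-parametric bootstrap percentile CI: resample trials with replacement,
// re-estimate with `fit`, take the [2.5%, 97.5%] percentiles.
function bootstrapCI(trials, fit, B = 1000, rng = Math.random) {
  const estimates = [];
  for (let b = 0; b < B; b++) {
    const sample = Array.from(trials, () => trials[Math.floor(rng() * trials.length)]);
    estimates.push(fit(sample));
  }
  estimates.sort((a, z) => a - z);
  return {
    lo: estimates[Math.floor(0.025 * B)],
    hi: estimates[Math.ceil(0.975 * B) - 1],
  };
}
```

In the engine's case `fit` would be the per-resample MLE; here any scalar estimator (even a mean) exercises the same loop.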

BCa (bias-corrected accelerated) CI (Efron, 1987):
More accurate than percentile CI when distribution is skewed.
z₀ = Φ⁻¹( #{θ̂_b < θ̂} / B )
â = (1/6) · ∑(U_i − Ū)³ / [∑(U_i − Ū)²]^(3/2)
Recommended B ≥ 2000 for BCa; B = 1000 sufficient for percentile CI.

Reaction Time Analysis
RT definition (this engine): time from response-window open (after stimulus offset) to keypress. Does not include stimulus duration.

Ex-Gaussian fit (Luce, 1986): RT distribution often modelled as convolution of Gaussian and exponential:
f(t) = (λ/2) · exp[(λ/2)(2μ + λσ² − 2t)] · erfc[(μ + λσ² − t)/(√2·σ)]
μ = Gaussian mean, σ = Gaussian SD, λ = exponential rate

Race model / Poffenberger:
Mean RT = sensory latency + motor latency + decision time
Speed-accuracy trade-off (SAT): d′ ∝ √T_observation under diffusion models

Outlier trimming: standard practice is to exclude RT < 150 ms (anticipations) and RT > 3 SD above mean or > 3000 ms (lapses). This engine filters RT < 0 or > 5000 ms.
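The trimming rule as a filter (thresholds are the conventional values quoted above, passed as defaults; the function name is illustrative):

```javascript
// RT outlier trimming: drop anticipations (< minMs) and hard lapses (> maxMs),
// then drop responses beyond sdCut SDs of the remaining mean.
function trimRTs(rts, minMs = 150, maxMs = 3000, sdCut = 3) {
  const kept = rts.filter(rt => rt >= minMs && rt <= maxMs);
  const mean = kept.reduce((a, b) => a + b, 0) / kept.length;
  const sd = Math.sqrt(kept.reduce((a, b) => a + (b - mean) ** 2, 0) / (kept.length - 1));
  return kept.filter(rt => Math.abs(rt - mean) <= sdCut * sd);
}
```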

References & Citations

Canonical literature underlying the methods implemented in this engine. Sorted by topic; numbered for cross-reference with the Formulas and Standards tabs.

Foundational Texts

[1] Fechner, G.T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf & Härtel. Founding treatise; defines absolute and difference thresholds, method of limits, method of adjustment, and method of constant stimuli.

[2] Green, D.M. & Swets, J.A. (1966). Signal Detection Theory and Psychophysics. New York: Wiley. Definitive treatment of SDT, d′, ROC, and criteria bias; republished 1974 by Krieger.

[3] Luce, R.D. (1986). Response Times: Their Role in Inferring Elementary Mental Organization. Oxford University Press. RT models, ex-Gaussian distributions, diffusion models.

[4] Kingdom, F.A.A. & Prins, N. (2010). Psychophysics: A Practical Introduction. London: Academic Press. Most accessible modern textbook; covers adaptive methods, fitting, and SDT for practitioners.

Adaptive Staircase Methods

[5] Levitt, H. (1971). Transformed up-down methods in psychoacoustics. Journal of the Acoustical Society of America, 49(2B), 467–477. doi:10.1121/1.1912375
→ Derivation of convergence points for m-down/1-up rules; reversal averaging; discard-first-N protocol implemented in this engine.

[6] Watson, A.B. & Pelli, D.G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33(2), 113–120. doi:10.3758/BF03202828
→ Full QUEST algorithm; prior/posterior update; MAP/mean selection; efficiency analysis vs. constant stimuli.

[7] Taylor, M.M. & Creelman, C.D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41(4A), 782–787. doi:10.1121/1.1910407
→ PEST step-doubling and halving rules; Wald SPRT termination criterion; implemented with geometric step bounds in this engine.

[8] Watson, A.B. (2017). QUEST+: A general multidimensional Bayesian adaptive psychometric method. Journal of Vision, 17(3), 10. doi:10.1167/17.3.10
→ Extension of QUEST to estimate α, β, δ jointly; entropy-minimisation stimulus selection. Basis for future QUEST+ integration.

[9] García-Pérez, M.A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38(12), 1861–1881. doi:10.1016/S0042-6989(97)00340-4
→ Bias in fixed-step staircases; argues for variable-step methods and large-reversal discard; supports ≥40 trials recommendation.

Psychometric Function Fitting

[10] Wichmann, F.A. & Hill, N.J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293–1313. doi:10.3758/BF03194544
→ MLE fitting of Weibull/logistic; bootstrap CI (parametric and non-parametric); goodness-of-fit G²; lapse-rate handling. This engine implements MLE grid search and non-parametric bootstrap.

[11] Wichmann, F.A. & Hill, N.J. (2001). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63(8), 1314–1329. doi:10.3758/BF03194545
→ Systematic evaluation of bootstrap CI validity; minimum sample sizes; bias-corrected intervals.

[12] Prins, N. & Kingdom, F.A.A. (2018). Applying the model-comparison approach to test specific research hypotheses in psychophysical research using the Palamedes toolbox. Frontiers in Psychology, 9, 1250. doi:10.3389/fpsyg.2018.01250
→ Palamedes MATLAB/Python toolbox; model comparison with AIC/BIC; recommended for formal analysis after data collection with this engine.

[13] King-Smith, P.E. & Rose, D. (1997). Principles of an adaptive method for measuring the slope of the psychometric function. Vision Research, 37(12), 1595–1604. doi:10.1016/S0042-6989(96)00310-0
→ Adaptive slope estimation; importance of β for characterising threshold steepness beyond the 50%/75% point estimation.

Signal Detection Theory

[14] Macmillan, N.A. & Creelman, C.D. (2005). Detection Theory: A User’s Guide (2nd ed.). London: Lawrence Erlbaum Associates.
→ Comprehensive SDT reference covering all paradigms; unequal variance, A′, rating ROC, m-AFC corrections. Chapter 4 covers 2AFC matched to this engine.

[15] Hautus, M.J. (1995). Corrections for extreme proportions and their effect on estimated values of d′. Behavior Research Methods, Instruments & Computers, 27(1), 46–51. doi:10.3758/BF03203619
→ 0.5/N correction for H = 1 and FA = 0; implemented in calcSDT() in this engine.

[16] Hacker, M.J. & Ratcliff, R. (1979). A revised table of d′ for M-alternative forced choice. Perception & Psychophysics, 26(2), 168–170. doi:10.3758/BF03208311
→ Tabulated d′ values for m-AFC paradigms; correction factors for mAFC experiments.

Contrast Sensitivity & Spatial Vision

[17] Campbell, F.W. & Robson, J.G. (1968). Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197(3), 551–566. doi:10.1113/jphysiol.1968.sp008574
→ Original CSF paper; demonstrates the visual system as a multi-channel spatial frequency filter; basis for Gabor stimulus design and CSF sweep preset.

[18] Mannos, J.L. & Sakrison, D.J. (1974). The effects of a visual fidelity criterion on the encoding of images. IEEE Transactions on Information Theory, 20(4), 525–536. doi:10.1109/TIT.1974.1055250
→ Analytical CSF formula CS(f) = a·f^c·exp(−b·f); parameter values: a=2.6, b=0.0192, c=1.1 used in CSF sweep preset.

[19] Watson, A.B. & Ahumada, A.J. (2005). A standard model for foveal detection of spatial contrast. Journal of Vision, 5(9), 717–740. doi:10.1167/5.9.6
→ Unified foveal detection model incorporating optics, neural noise, uncertainty, and CSF; provides normative reference for Gabor detection thresholds.

[20] De Valois, R.L. & De Valois, K.K. (1988). Spatial Vision. Oxford University Press.
→ Comprehensive review of spatial frequency selectivity in V1 complex/simple cells; theoretical basis for grating orientation experiments.

Temporal Sensitivity & Flicker

[21] Kelly, D.H. (1961). Visual responses to time-dependent stimuli. I. Amplitude sensitivity measurements. Journal of the Optical Society of America, 51(4), 422–429. doi:10.1364/JOSA.51.000422
→ Temporal modulation transfer function and the temporal CSF; sensitivity peaks at 8–16 Hz; basis for the 8 Hz flicker parameter in this engine’s flicker stimulus.

[22] de Lange, H. (1958). Research into the dynamic nature of the human fovea-cortex systems with intermittent and modulated light. I. Attenuation characteristics with white and coloured light. Journal of the Optical Society of America, 48(11), 777–784. doi:10.1364/JOSA.48.000777
→ de Lange curves; critical fusion frequency (CFF) ≈ 50–60 Hz at high luminance; temporal contrast sensitivity function model.

Software & Toolboxes

[23] Brainard, D.H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436. doi:10.1163/156856897X00357
→ Psychtoolbox (MATLAB) — hardware-synchronised timing; recommended for publication-grade data. Used as gold standard to validate timing in this engine.

[24] Pelli, D.G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442. doi:10.1163/156856897X00366
→ VideoToolbox; CLUT manipulations for 10-bit contrast; historical reference for canvas-based contrast generation in this engine.

[25] Peirce, J.W. (2007). PsychoPy — Psychophysics software in Python. Journal of Neuroscience Methods, 162(1–2), 8–13. doi:10.1016/j.jneumeth.2006.11.017
→ PsychoPy — open-source Python alternative to Psychtoolbox; validated timing, hardware integration; recommended for follow-up laboratory replication of experiments prototyped in this engine.

[26] Prins, N. (2012). The psychometric function: The lapse rate revisited. Journal of Vision, 12(6), 25. doi:10.1167/12.6.25
→ Argues δ should be estimated jointly with α and β via MLE; uninformed δ fixing leads to threshold bias at low PCs.

About this engine

This tool implements methods from references [5–16] in a zero-network, client-side JavaScript engine. It is designed for rapid prototyping, educational demonstrations, and pilot data collection. It is not a substitute for calibrated laboratory equipment or validated software (PsychoPy [25], Psychtoolbox [23–24], Palamedes [12]). When reporting results from this tool, cite the underlying methods ([5], [6], [10], [14]) and disclose participant viewing conditions and display specifications.

Research & Visualisation

Advanced analysis tools, batch processing, threshold distributions, and research guidance for interpreting psychophysical data. Run an experiment in the Lab tab first.

Bootstrap Threshold Distribution

Non-parametric bootstrap (Wichmann & Hill, 2001): resamples trial data B times, fits MLE logistic per resample, plots the distribution of θ̂ estimates. 95% CI = [2.5th, 97.5th] percentiles of bootstrap distribution.

Run an experiment, then click Run Bootstrap.
Gold bars = bootstrap θ̂ histogram; red dashed lines = 95% CI bounds; requires ≥10 trials.

Interpreting Your Results

Staircase trace (Lab tab): Gold line = trial-by-trial intensity. Hollow red circles = warmup reversals (first 4, discarded). Filled red circles = stable reversals (used for threshold). Blue dashed = threshold estimate. Expect an oscillating pattern converging on the threshold region. If the trace hits min or max intensity repeatedly, adjust initial intensity or step size.

Psychometric function (Lab tab): Gold dots = binned proportion correct (size ∝ √n trials in bin). Blue curve = MLE logistic fit. Threshold θ is where the curve crosses ~75% for 2AFC. A steep slope (high s) indicates sharp threshold; a shallow slope suggests variability or criterion inconsistency.

RT histogram (Lab tab): Green bars = correct trials; red bars = error trials. Gold dashed = mean RT. Correct-trial RT should be shorter than error RT for difficult stimuli (speed-accuracy consistency check). Very fast responses (<150 ms) may be anticipations; very slow (>2000 ms) may be lapses.

d′ interpretation: d′ = 0 is chance; d′ = 1 is ~76% 2AFC accuracy; d′ = 2 is ~92%. For yes/no, c = 0 means unbiased responding; positive c = conservative, negative c = liberal criterion placement.

Bootstrap CI width: Wide CI (>0.2 in normalised intensity) indicates insufficient trials or high within-session variability. Aim for CI width <0.1. Run more trials or increase B to 2000–5000 for publication.

Normative Reference Values
Luminance contrast (Gabor, 4 cpd, 200 ms): Michelson threshold ≈ 0.5–2% (CS ≈ 50–200) — young adults
At 60+ years: contrast sensitivity reduced ≈ 0.5–1 log unit [Watson & Ahumada 2005]

Critical fusion frequency (flicker): CFF at 500 cd/m² ≈ 55–60 Hz (Ferry–Porter law: CFF rises with log luminance)
CFF at 20 cd/m² ≈ 35–40 Hz

Orientation discrimination: Just-noticeable difference ≈ 1–3° at 90° (cardinal); 5–7° at 45° (oblique effect)

Weber fraction (luminance increment): ΔI/I ≈ 0.01–0.02 in photopic range (Weber’s law)

Typical 2AFC threshold accuracy: At 70.7% (1-up/2-down): ~30–50 trials to stabilise within ±0.5 SD
QUEST posterior SD < 0.1 log units after ∼30 trials

Reaction times (detection tasks, young adults): Simple RT: 180–250 ms | Choice RT: 250–400 ms | Discrimination: 300–600 ms

Batch Session Analysis

Paste JSON trial data from the Export JSON button (Actions tab) — the trials array. Computes accuracy, mean RT, d′, criterion, β, and MLE threshold/slope from the pasted data.

Click Run Analysis to process session data.

Method Comparison: Staircase vs QUEST vs PEST

Property          | 1-up/2-down        | 3-down/1-up   | QUEST           | PEST       | Constant
Convergence       | 70.7%              | 79.4%         | Any (via prior) | 50% / 75%  | All levels
Min trials        | ≥60                | ≥60           | ≥30–40          | ≥20–40     | ≥100/level
Estimates slope?  | No                 | No            | With QUEST+     | No         | Yes (MLE)
Robust to lapses? | Moderate           | Low           | High (δ prior)  | Moderate   | High (MLE δ)
Best for          | Threshold tracking | High accuracy | Fast, Bayesian  | Speed      | Full curve shape

Recommended Experimental Workflow
  1. Pilot: Run 20–30 practice trials (Practice button) to verify response key configuration, timing, and initial intensity range. Adjust Initial intensity so practice accuracy is 60–80%.
  2. Choose method: For rapid threshold estimate use QUEST (30–50 trials). For reproducibility and conventional reporting use 1-up/2-down with 60–80 trials and 12 reversals (discard first 4). For full psychometric curve use Constant with ≥5 levels ×≥20 trials.
  3. Data collection: Encourage the observer to respond quickly but accurately. Remind them to practise the response keys before the main run. Offer breaks every 15–20 minutes.
  4. Check staircase trace: Confirm the trace oscillates (reversals visible) and has not saturated at min/max intensity >50% of trials. If saturated, abort and adjust initial intensity & step size.
  5. Export: Actions tab → Export JSON (full metadata) and Export CSV (for R/Python/MATLAB). JSON includes MLE fit, reversal statistics, and session metadata.
  6. Bootstrap CI: Research tab → Run Bootstrap with B = 1000. Report mean threshold ± 95% CI. Aim for CI < 0.1 log units for publication.
  7. Batch analysis: For multi-session data, paste each session’s trials array into Batch Analysis; compare MLE thresholds and slopes across conditions. Use the Session Comparison (Actions tab) for quick within-session summary.
  8. Replicate in lab software: Use PsychoPy or Psychtoolbox to replicate key findings with hardware-verified timing, calibrated display output, and chin-rest-controlled viewing distance.

Engine spec: 5 adaptive methods (1u2d, 3d1u, QUEST, PEST, constant) · 6 stimulus types (Gabor, grating, blob, patch, noise, flicker) · MLE logistic fit (31×21 grid) · SDT d′ / c / β / A′ · Bootstrap CI (non-parametric, B = 1 000) · RT histograms · JSON/CSV/PNG export · Zero network — all computation on-device (WebCrypto-safe, no telemetry).