Title: Auditory Stimulus Reconstruction from High-Resolution fMRI of Music Perception and Imagery
Speaker: Michael A. Casey, James Wright Professor of Music and Professor of Computer Science, Dartmouth College
Time/Date: 11am on Wednesday, 11 March 2020
Location: Salle Stravinsky, IRCAM, 1 place Igor-Stravinsky, 75004 Paris
Abstract: Recent research shows that visual stimulus features corresponding to subjects’ perception of images and movies can be predicted and reconstructed from fMRI via stimulus-encoding models. We present the first evidence in the auditory domain that listeners could reliably discriminate between stimulus-model reconstructions and null-model reconstructions of target audio stimuli from fMRI images, cross-validated by stimulus. We collected high-resolution (7 Tesla) fMRI data from 20 subjects listening to 25 music clips in different genres. We trained non-linear stimulus-encoding models to predict fMRI voxel activations from the acoustic features used in the music-listening task, and ‘inverted’ the encoding models to synthesize audio reconstructions from fMRI responses to novel stimuli. We conducted forced-choice listening tests comparing 250 stimulus-encoding-model and null-encoding-model reconstructions against the cross-validation stimuli (baseline = 50%, accuracy = 60.8%, p = 0.00038). The listening tests provide a robust framework for testing hypotheses about cognitive representations of auditory perception.
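The reported listening-test significance can be sanity-checked with an exact one-sided binomial test against the 50% chance baseline (a sketch, not the authors’ analysis code; the count of 152 correct responses is inferred from the reported 60.8% accuracy over 250 trials):

```python
from math import comb

def binomial_tail(successes, trials, p=0.5):
    """Exact one-sided binomial tail probability: P(X >= successes)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

n_trials = 250                        # forced-choice comparisons
n_correct = round(0.608 * n_trials)   # 152, inferred from 60.8% accuracy
p_value = binomial_tail(n_correct, n_trials)
print(f"{n_correct}/{n_trials} correct, p = {p_value:.5f}")
```

This tail probability comes out close to the reported p = 0.00038, consistent with a one-sided test of accuracy above chance.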
We model fMRI responses to auditory stimulus features using a multivariate pattern analysis (MVPA) representation, with dimensions corresponding to voxel locations and values corresponding to voxel activations in cortical regions of interest. Auditory stimulus features representing harmony and timbre served as predictor variables and fMRI activations as responses, so that the models predict the voxel activation patterns evoked by the stimulus features. We then used the trained stimulus-encoding models to predict response patterns for a large corpus of novel audio clips, creating a dataset of predicted fMRI priors and their corresponding audio clips. Given a target fMRI image corresponding to a novel stimulus, we retrieved the nearest-neighbour images from this corpus of priors (predicted brain images) and selected the corresponding prior audio waveforms for the stimulus reconstruction. Stimuli were reconstructed by blending and concatenating these short (2 s) prior audio clips in a form of concatenative synthesis, and these reconstructions were used in the listening tests. The code, stimuli, and high-resolution fMRI data have been publicly released via the OpenfMRI initiative, to encourage further development of methods for probing sensory perception and cognition.
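The retrieval-and-concatenation step described above can be sketched as follows. This is a minimal illustration with random stand-in data: the array shapes, sampling rate, and plain Euclidean nearest-neighbour distance are assumptions, and the blending/cross-fading between adjacent clips is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a corpus of 500 clips, 1000-voxel activation
# patterns, and 2 s of audio per clip at an assumed 8 kHz sample rate.
n_clips, n_voxels, clip_len = 500, 1000, 2 * 8000
prior_fmri = rng.standard_normal((n_clips, n_voxels))   # encoding-model predictions
prior_audio = rng.standard_normal((n_clips, clip_len))  # corresponding waveforms

def retrieve_clip(target, priors, audio):
    """Return the audio clip whose predicted fMRI pattern is nearest
    (Euclidean distance) to the target fMRI image."""
    dists = np.linalg.norm(priors - target, axis=1)
    return audio[np.argmin(dists)]

# Reconstruct a longer novel stimulus by retrieving one prior clip per
# 2 s segment and concatenating the retrieved waveforms.
targets = rng.standard_normal((5, n_voxels))  # five target fMRI segments
reconstruction = np.concatenate([retrieve_clip(t, prior_fmri, prior_audio)
                                 for t in targets])
print(reconstruction.shape)  # (80000,)
```

In practice the targets would be measured fMRI responses to held-out stimuli and the priors would come from the trained encoding models; only the retrieval logic is shown here.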
Biography: Michael Casey is Professor of Music and Computer Science at Dartmouth College. He received his Ph.D. from the MIT Media Lab “Machine Listening Group” in 1998, subsequently taking up posts as Research Scientist at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, USA, and as Professor of Computer Science at Goldsmiths College, University of London, before joining Dartmouth’s faculty in 2008. He was a co-developer and co-editor of the MPEG-4 and MPEG-7 international standards for audio encoding and content description, and made substantial contributions to approximate nearest-neighbour methods for Internet-scale audio search, in collaboration with Yahoo! Inc. and Google Inc. For the last 10 years his research has explored the use of machine learning with audio and neuroimaging to map cognitive neural representations of sensory experiences. He has received support for his work from the National Science Foundation (NSF), the Mellon Foundation, the Neukom Institute for Computational Science, Yahoo! Inc., Google Inc., the Engineering and Physical Sciences Research Council (EPSRC) UK, and the National Endowment for the Humanities (NEH).
The following publications describe a publicly available dataset of fMRI and physiological data (cardiac and respiratory traces) for 20 subjects in an attentive music-listening task:
Michael Casey, “Music of the 7Ts: Predicting and Decoding Multivoxel fMRI Responses with Acoustic, Schematic, and Categorical Music Features,” Frontiers in Psychology: Cognition (Research Topic: Bridging Music Informatics with Music Cognition), 28 June 2017.
Michael Hanke, Richard Dinga, Christian Häusler, J. Swaroop Guntupalli, Michael Casey, Falko R. Kaule, Jörg Stadler, “High-resolution 7-Tesla fMRI data on the perception of musical genres,” F1000Research, June 2015.
Beau Sievers, Larry Polansky, Michael Casey, Thalia Wheatley, “Music and movement share a dynamic structure that supports universal expressions of emotion,” Proceedings of the National Academy of Sciences, January 2013, 110(1):70–75.
Michael Casey, J. Thompson, O. Kang, R. Raizada, Thalia Wheatley, “Population Codes Representing Musical Timbre for High-Level fMRI Categorization of Music Genres,” in G. Langs, I. Rish, M. Grosse-Wentrup, B. Murphy (eds), Machine Learning and Interpretation in Neuroimaging, Lecture Notes in Computer Science, vol. 7263, Springer, Berlin, Heidelberg, 2012.