“Digging into Data Challenge” 2009 funds two web-based speech-oriented projects:
Harvesting Speech Datasets for Linguistic Research on the Web
Awardees: Mats Rooth, Cornell University, NSF; Michael Wagner, McGill University, SSHRC. Description: This project will harvest audio and transcribed data from podcasts, news broadcasts, public and educational lectures and other sources to create a massive corpus of speech. Tools will then be developed to analyze the different uses of prosody (rhythm, stress and intonation) within spoken communication.
Mining a Year of Speech
Awardees: Mark Liberman, University of Pennsylvania, NSF; John Coleman, University of Oxford, JISC. Additional Key Participants: The British Library. Description: This project focuses on large-scale data analysis of audio, specifically the spoken word. It will create tools to enable rapid and flexible access to over 9,000 hours of spoken audio files, containing a wide variety of speech, drawn from some of the leading British and American spoken word corpora, allowing for new kinds of linguistic analysis.