Web as Corpus Links

A collaborative repository for data, software, and links to Web as Corpus sites, set up by Stefan Evert and co-administered by various members of the WaC community.

Annual Web as Corpus Workshops

  • WaC 6: 6th Web as Corpus Workshop, in association with NAACL-HLT in Los Angeles, 5-6 June 2010
    Workshop site
  • WaC 5: 5th Web as Corpus Workshop, as part of SEPLN 09, Donostia / San Sebastián, Spain, 7 September 2009
    Workshop site | Proceedings
  • WaC 4: 4th Web as Corpus Workshop – Can we beat Google?, as part of LREC 2008, Marrakech, Morocco, 1 June 2008
    Conference site | Proceedings
  • WaC 1, Corpus Linguistics conference, Birmingham, UK, July 2005
    Conference site

Web as Corpus sites

Groups

  • ACL SIGWAC: Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus; organizer of the Web as Corpus workshop series

Other Wikis

Web as Corpus concordancers

Web Corpora Online (direct query)

ESL Sites based on Google's Web 1T Corpus

  • Linggle: wildcard search for collocates and examples, based on Google 1T 2-grams

  • FLAX Web Phrases: described in Wu, S., Witten, I. H. & Franken, M. (2010). Utilizing lexical data from a web-derived corpus to expand productive collocation knowledge. ReCALL, 22(1), 83–102.
    Links to other modules

Other WaC Projects

WaC-related Tools / Software

  • Jaguar: extracts specialized corpora from the web and analyzes various lexical statistics; runs and saves corpora on the developer's server
  • GrosMoteur: Web concordancer; supports either querying Yahoo! or crawling the Web; cross-platform (Python)

Publications

  • Gatto, Maristella 2009. From Body to Web. An Introduction to the Web as Corpus. Roma - Bari: Laterza University Press Online.
    pre-publication light version (6.5 MB, no registration required) | definitive version (51 MB, requires registration)

Search Engine Links

