Differences

This shows you the differences between two versions of the page.

wiki:wac_links [2009/11/17 08:10]
whf
wiki:wac_links [2010/03/30 08:03] (current)
whf
Line 4: Line 4:
collaborative repository for data, software and links to Web as Corpus sites set up by Stefan Evert and co-administered by various members of the WaC community
-===== Web as Corpus Workshops =====
+===== Annual Web as Corpus Workshops =====
 + 
 +  * WaC 6:  6th Web as Corpus Workshop, in association with  NAACL-HLT in Los Angeles, 5-6 June 2010\\ [[http://www.sigwac.org.uk/wiki/WAC6|Workshop site]] 
  * WaC 5: 5th Web as Corpus Workshop, as part of SEPLN 09, Donostia / San Sebastián, Spain, 7 September 2009\\ [[http://www.sigwac.org.uk/wiki/WAC5|Workshop site]] |  [[http://www.sigwac.org.uk/attachment/wiki/WAC5/WAC5_proceedings.pdf?format=raw|Proceedings]]
Line 20: Line 23:
==== Groups ====
-  * [[http://sigwac.org.uk/|ACL SIGWAC]] Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus
+  * [[http://sigwac.org.uk/|ACL SIGWAC]] Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus, organizer of the Web as Corpus workshop series
  * [[http://devel.sslmit.unibo.it/mailman/listinfo/sigwac|ACL SIGWAC mailing list]] | [[http://liste.sslmit.unibo.it/pipermail/sigwac/| archives]]
Line 37: Line 41:
  *[[http://www.kwicfinder.com|KWiCFinder]]  desktop Web concordancer
-  *[[http://lse.umiacs.umd.edu/|Linguist's Search Engine]] search with parser //(temporarily offline)//
+  *[[http://lse.umiacs.umd.edu/|Linguist's Search Engine]] search with parser //(temporarily? offline)//
 +  *[[http://sealang.net/webcorpus/|SouthEast Asian Language Web Corpus]]
  *[[http://webascorpus.org/searchwac.html|WebAsCorpus.org Web Concordancer]] (34 languages)
  *[[http://www.niederlandistik.fu-berlin.de/cgi-bin/web-conc.cgi?sprache=en&art=google|WebCONC]]
Line 43: Line 48:
-==== Online Web Corpora ====
+
+==== Web Corpora Online (direct query) ====
  *[[http://corpus.leeds.ac.uk/internet.html|Leeds collection of Internet corpora]]\\ (English, Chinese, Finnish, French, German, Italian, Japanese, Polish, Portuguese, Russian, Spanish)
Line 49: Line 55:
   
 +==== ESL Sites based on Google's Web 1T Corpus ====
- +  *[[140.114.75.12/linggle|Linggle]] wildcard search for collocates and examples based on Google 1T 2-grams 
 +   
 +  *[[http://flax.nzdl.org/greenstone3/flax?a=p&sa=about&c=phrases|FLAX Web Phrases]]  Described in\\ Wu, S., Witten, I. H. & Franken, M. (2010).  Utilizing lexical data from a web-derived corpus to expand productive collocation knowledge. //ReCALL, 22//(1), 83–102.\\  [[http://flax.nzdl.org/greenstone3/flax?a=p&sa=home&module=|Links to other modules]]
==== Other WaC Projects ====
Line 59: Line 67:
===== WaC-related Tools / Software =====
- +  *[[http://bootcat.sslmit.unibo.it/|BootCat]]  
 +  *[[http://crawler.archive.org/|Heritrix Crawler (Web Archive)]] 
 +  *[[http://melot.upf.edu/jaguar|Jaguar]] extracts specialized corpora from the web and analyzes various lexical statistics; runs and saves corpora on developer's server 
 +  *[[http://grosmoteur.elizia.net/GrosMoteur|GrosMoteur]] Web concordancer; supports either querying Yahoo! or crawling the Web; cross-platform (Python)
===== Publications =====
Line 68: Line 78:
 +===== Search Engine Links =====
 +
 +  *[[http://altsearchengines.com|AltSearchEngines.com]] reviews specialized and non-English SEs
 +  *[[http://www.Multilingual-Search.com/|multilingual-search.com]] discusses issues and developments in non-English search
 +  *[[http://Abondance.com|abondance.com]] tracks the European search market from a French perspective
