Differences
This shows you the differences between two versions of the page.
| | wiki:wac_links [2009/11/17 08:10] whf | | wiki:wac_links [2010/03/30 08:03] (current) whf |
|---|---|---|---|
| Line 4: | Line 4: | ||
| collaborative repository for data, software and links to Web as Corpus sites set up by Stefan Evert and co-administered by various members of the WaC community | collaborative repository for data, software and links to Web as Corpus sites set up by Stefan Evert and co-administered by various members of the WaC community | ||
| - | ===== Web as Corpus Workshops ===== | + | ===== Annual Web as Corpus Workshops ===== |
| + | |||
| + | * WaC 6: 6th Web as Corpus Workshop, in association with NAACL-HLT in Los Angeles, 5-6 June 2010\\ [[http://www.sigwac.org.uk/wiki/WAC6|Workshop site]] | ||
| * WaC 5: 5th Web as Corpus Workshop, as part of SEPLN 09, Donostia / San Sebastián, Spain, 7 September 2009\\ [[http://www.sigwac.org.uk/wiki/WAC5|Workshop site]] | [[http://www.sigwac.org.uk/attachment/wiki/WAC5/WAC5_proceedings.pdf?format=raw|Proceedings]] | * WaC 5: 5th Web as Corpus Workshop, as part of SEPLN 09, Donostia / San Sebastián, Spain, 7 September 2009\\ [[http://www.sigwac.org.uk/wiki/WAC5|Workshop site]] | [[http://www.sigwac.org.uk/attachment/wiki/WAC5/WAC5_proceedings.pdf?format=raw|Proceedings]] | ||
| Line 20: | Line 23: | ||
| ==== Groups ==== | ==== Groups ==== | ||
| - | * [[http://sigwac.org.uk/|ACL SIGWAC]] Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus | + | * [[http://sigwac.org.uk/|ACL SIGWAC]] Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus, organizer of the Web as Corpus workshop series |
| * [[http://devel.sslmit.unibo.it/mailman/listinfo/sigwac | * [[http://devel.sslmit.unibo.it/mailman/listinfo/sigwac | ||
| |ACL SIGWAC mailing list]] | [[http://liste.sslmit.unibo.it/pipermail/sigwac/| archives]] | |ACL SIGWAC mailing list]] | [[http://liste.sslmit.unibo.it/pipermail/sigwac/| archives]] | ||
| Line 37: | Line 41: | ||
| *[[http://www.kwicfinder.com|KWiCFinder]] desktop Web concordancer | *[[http://www.kwicfinder.com|KWiCFinder]] desktop Web concordancer | ||
| - | *[[http://lse.umiacs.umd.edu/|Linguist's Search Engine]] search with parser //(temporarily offline)// | + | *[[http://lse.umiacs.umd.edu/|Linguist's Search Engine]] search with parser //(temporarily? offline)// |
| + | *[[http://sealang.net/webcorpus/|SouthEast Asian Language Web Corpus]] | ||
| *[[http://webascorpus.org/searchwac.html|WebAsCorpus.org Web Concordancer]] (34 languages) | *[[http://webascorpus.org/searchwac.html|WebAsCorpus.org Web Concordancer]] (34 languages) | ||
| *[[http://www.niederlandistik.fu-berlin.de/cgi-bin/web-conc.cgi?sprache=en&art=google|WebCONC]] | *[[http://www.niederlandistik.fu-berlin.de/cgi-bin/web-conc.cgi?sprache=en&art=google|WebCONC]] | ||
| Line 43: | Line 48: | ||
| - | ==== Online Web Corpora ==== | + | |
| + | ==== Web Corpora Online (direct query) ==== | ||
| *[[http://corpus.leeds.ac.uk/internet.html|Leeds collection of Internet corpora]]\\ (English, Chinese, Finnish, French, German, Italian, Japanese, Polish, Portuguese, Russian, Spanish) | *[[http://corpus.leeds.ac.uk/internet.html|Leeds collection of Internet corpora]]\\ (English, Chinese, Finnish, French, German, Italian, Japanese, Polish, Portuguese, Russian, Spanish) | ||
| Line 49: | Line 55: | ||
| + | ==== ESL Sites based on Google's Web 1T Corpus ==== | ||
| - | + | *[[140.114.75.12/linggle|Linggle]] wildcard search for collocates and examples based on Google 1T 2-grams | |
| + | |||
| + | *[[http://flax.nzdl.org/greenstone3/flax?a=p&sa=about&c=phrases|FLAX Web Phrases]] Described in\\ Wu, S., Witten, I. H. & Franken, M. (2010). Utilizing lexical data from a web-derived corpus to expand productive collocation knowledge. //ReCALL, 22//(1), 83–102.\\ [[http://flax.nzdl.org/greenstone3/flax?a=p&sa=home&module=|Links to other modules]] | ||
| ==== Other WaC Projects ==== | ==== Other WaC Projects ==== | ||
| Line 59: | Line 67: | ||
| ===== WaC-related Tools / Software ===== | ===== WaC-related Tools / Software ===== | ||
| - | + | *[[http://bootcat.sslmit.unibo.it/|BootCat]] | |
| + | *[[http://crawler.archive.org/|Heritrix Crawler (Web Archive)]] | ||
| + | *[[http://melot.upf.edu/jaguar|Jaguar]] extracts specialized corpora from the web and analyzes various lexical statistics; runs and saves corpora on developer's server | ||
| + | *[[http://grosmoteur.elizia.net/GrosMoteur|GrosMoteur]] Web concordancer; supports either querying Yahoo! or crawling the Web; cross-platform (Python) | ||
| ===== Publications ===== | ===== Publications ===== | ||
| Line 68: | Line 78: | ||
| + | ===== Search Engine Links ===== | ||
| + | |||
| + | *[[http://altsearchengines.com|AltSearchEngines.com]] reviews specialized and non-English SEs | ||
| + | *[[http://www.Multilingual-Search.com/|multilingual-search.com]] discusses issues and developments in non-English search | ||
| + | *[[http://Abondance.com|abondance.com]] tracks the European search market from a French perspective | ||