Differences
This shows you the differences between two versions of the page.
|
wiki:resources [2009/12/22 16:43] whf |
wiki:resources [2010/06/28 16:27] (current) whf |
||
|---|---|---|---|
| Line 6: | Line 6: | ||
| ; [[http://www.80legs.com|80legs]] : Webcrawler and text processor distributed over 50,000 PCs available during idle moments (@SETI model). Some built-in text processing capability (e.g. strip HTML, match regular expressions to return only matching pages or text) with support for Java and .NET custom code. Fee based, but very low cost: $2.00 per million pages crawled and $0.03 per CPU hour. Claims to crawl 2 billion pages in a day. Still in beta. (Nov 09) | ; [[http://www.80legs.com|80legs]] : Webcrawler and text processor distributed over 50,000 PCs available during idle moments (@SETI model). Some built-in text processing capability (e.g. strip HTML, match regular expressions to return only matching pages or text) with support for Java and .NET custom code. Fee based, but very low cost: $2.00 per million pages crawled and $0.03 per CPU hour. Claims to crawl 2 billion pages in a day. Still in beta. (Nov 09) | ||
| + | |||
| + | |||
| + | ; [[http://www.alchemyapi.com/|AlchemyAPI]] : Offers a number of useful services (list copied from their webpage): named entity extraction, text categorization (very basic), language detection (claims about 90 languages recognized), keyword / term extraction, web page cleaning (= boilerplate removal; works fine for European languages, less consistent results with e.g. Chinese), and structured data / content scraping. Straightforward API with examples in various programming languages: your program sends a URL to the REST endpoint of one of their services, and the service returns what you ask for. Alternatively, you can post the data directly. While support for non-European languages is weak, full support for Russian is a pleasant surprise.\\ | ||
| + | //"Use the full range of AlchemyAPI services completely free of cost! This includes both commercial and non-commercial use! Make up to 30,000 API calls a day. Higher limits available to approved educational institutions and non-profit groups."// | ||
| + | |||
| + | |||
| + | ; [[http://ontology.csse.uwa.edu.au/research/api.pl|University of Western Australia]] : Announced on the Corpora List by Wilson Wong on 23 June 2010: \\ | ||
| + | //"We have made available a list of web services for accessing text mining and NLP tools implemented at our research group (http://ontology.csse.uwa.edu.au) such as boilerplate removal (known as HERCULES), semantic similarity/relatedness measures (i.e. Normalised Web Distance, n-Degree of Wikipedia), noun phrase chunking, triple extraction, noisy text cleaning (known as ISSAC), simple term extraction, and access to our multi-domain, 300 million token text corpora (which are continuously growing). Please write to wilson@csse.uwa.edu.au to obtain a free developer key."// | ||
| + | |||
| + | |||
| + | |||
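
The AlchemyAPI entry above describes two calling styles: send a URL to a service's REST endpoint, or post the data directly. A minimal Python sketch of both follows. The base URL, path layout, service name and parameter names (''apikey'', ''url'', ''text'', ''outputMode'') are placeholders assumed for illustration, not taken from the AlchemyAPI documentation, so check the provider's own examples before relying on them. The functions only build the requests; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder base URL: the real endpoint is NOT given in the wiki entry.
BASE_URL = "https://api.example.invalid/calls"


def build_url_request(service, api_key, page_url, output="json"):
    """Style 1: pass the address of a web page and let the service fetch it.

    All parameter names here are hypothetical.
    """
    query = urlencode({"apikey": api_key, "url": page_url, "outputMode": output})
    return Request(f"{BASE_URL}/url/{service}?{query}", method="GET")


def build_text_request(service, api_key, text, output="json"):
    """Style 2: post the data directly as a form-encoded request body."""
    body = urlencode({"apikey": api_key, "text": text, "outputMode": output})
    return Request(
        f"{BASE_URL}/text/{service}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # "GetLanguage" is an invented service name used only for this demo.
    req = build_url_request("GetLanguage", "MY_KEY", "http://example.com/")
    print(req.get_method(), req.full_url)
```

To actually send a request, pass the returned object to ''urllib.request.urlopen''. With a daily quota such as the 30,000 calls quoted above, it is worth counting calls on the client side as well.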