wiki:resources, current revision 2010/06/28 16:27 by whf (previous revision 2009/12/22 16:43 by whf)
  ; [[http://www.80legs.com|80legs]] : Web crawler and text processor distributed over 50,000 PCs that contribute their idle time (the SETI@home model).  Some built-in text processing capability (e.g. stripping HTML, or matching regular expressions so that only matching pages or text are returned), with support for custom Java and .NET code.  Fee-based, but very low cost: $2.00 per million pages crawled and $0.03 per CPU hour.  Claims to crawl 2 billion pages in a day.  Still in beta. (Nov 09)
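As a rough illustration of the pricing quoted above, here is a minimal Python sketch of a cost estimate. It assumes the Nov 2009 beta rates still apply and that page fees and CPU time are billed independently; the function name `estimate_cost` is hypothetical and not part of any 80legs tool.

```python
from decimal import Decimal

# Rates quoted in the 80legs entry (beta, Nov 2009); assumed, may have changed since.
PRICE_PER_MILLION_PAGES = Decimal("2.00")  # USD per million pages crawled
PRICE_PER_CPU_HOUR = Decimal("0.03")       # USD per CPU hour of processing


def estimate_cost(pages: int, cpu_hours: float = 0) -> Decimal:
    """Hypothetical helper: rough crawl cost in USD, pages billed per million, CPU per hour."""
    page_cost = PRICE_PER_MILLION_PAGES * Decimal(pages) / Decimal(1_000_000)
    # str() keeps float inputs such as 0.5 from picking up binary rounding noise.
    cpu_cost = PRICE_PER_CPU_HOUR * Decimal(str(cpu_hours))
    # Round the total to whole cents.
    return (page_cost + cpu_cost).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 10 million pages plus 100 CPU hours: 10 * $2.00 + 100 * $0.03
    print(estimate_cost(10_000_000, 100))
```

At these rates, the claimed 2 billion pages a day would come to about $4,000 in page fees alone. `Decimal` is used so that the cents are computed exactly rather than with floating-point rounding.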
  ; [[http://www.alchemyapi.com/|AlchemyAPI]] : offers a number of useful services (copied from their webpage): named entity extraction, text categorization (very basic), language detection (claims to recognize about 90 languages), keyword/term extraction, web page cleaning (= boilerplate removal; works fine for European languages, less consistent results with e.g. Chinese), and structured data/content scraping.  Straightforward API with examples in various programming languages.  Your program sends a URL to the REST endpoint of one of their services, and the service returns what you ask for.  Alternatively, you can post the data directly.  Weak for non-European languages, but the full support for Russian is a pleasant surprise.\\
//"Use the full range of AlchemyAPI services completely free of cost! This includes both commercial and non-commercial use! Make up to 30,000 API calls a day. Higher limits available to approved educational institutions and non-profit groups."//
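A minimal sketch of the two ways of calling the service described above (send a URL, or post the data itself), using only the Python standard library. The endpoint URL, the parameter names `apikey`, `url`, `text` and `outputMode`, and the key are assumptions for illustration only; check AlchemyAPI's own documentation for the real names.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed values for illustration: not confirmed against AlchemyAPI's documentation.
ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetText"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"  # placeholder key


def build_url_request(page_url: str, api_key: str = API_KEY) -> Request:
    """GET request asking the service to fetch and process the page at page_url."""
    query = urlencode({"apikey": api_key, "url": page_url, "outputMode": "json"})
    return Request(f"{ENDPOINT}?{query}", method="GET")


def build_text_post(text: str, endpoint: str, api_key: str = API_KEY) -> Request:
    """POST request sending the text itself to endpoint instead of a URL."""
    body = urlencode({"apikey": api_key, "text": text, "outputMode": "json"})
    return Request(
        endpoint,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing either request to `urllib.request.urlopen` would actually send it; each such call counts against the 30,000-calls-a-day limit quoted above.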
  ; [[http://ontology.csse.uwa.edu.au/research/api.pl|University of Western Australia]] : Announced by Wilson Wong on the Corpora List, 23 June 2010: \\
//"We have made available a list of web services for accessing text mining and NLP tools implemented at our research group (http://ontology.csse.uwa.edu.au) such as boilerplate removal (known as HERCULES), semantic similarity/relatedness measures (i.e. Normalised Web Distance, n-Degree of Wikipedia), noun phrase chunking, triple extraction, noisy text cleaning (known as ISSAC), simple term extraction, and access to our multi-domain, 300 million token text corpora (which are continuously growing). Please write to wilson@csse.uwa.edu.au to obtain a free developer key."//
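Normalised Web Distance, one of the measures listed in the announcement, is related to the Normalized Google Distance of Cilibrasi and Vitányi, which scores how closely two terms are related from search-engine page counts. A minimal sketch, assuming the NGD formula carries over unchanged (the UWA service's exact variant may differ); the function name is hypothetical:

```python
import math


def normalised_web_distance(f_x: int, f_y: int, f_xy: int, n: int) -> float:
    """Hypothetical helper computing NGD-style distance from page counts.

    f_x, f_y: pages containing each term; f_xy: pages containing both;
    n: total pages indexed (assumed larger than f_x and f_y).
    NWD = (max(log f_x, log f_y) - log f_xy) / (log n - min(log f_x, log f_y))
    """
    if f_xy == 0:
        return math.inf  # terms never co-occur: treated as maximally distant
    log_x, log_y, log_xy, log_n = (math.log(v) for v in (f_x, f_y, f_xy, n))
    return (max(log_x, log_y) - log_xy) / (log_n - min(log_x, log_y))
```

Terms that always appear together score 0, and larger values mean weaker relatedness; returning infinity when the terms never co-occur is a choice made for this sketch, not necessarily what the UWA service does.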
