Purifyr
Content extraction solution for semantic information mining

What is Purifyr

Purifyr can remove 95% of the noise from web pages, leaving the content ready for further semantic processing and information retrieval tasks.

Give it a try

Some demos: WSJ | Reuters | Guardian | USA Today | BBC | Bloomberg | ReadWriteWeb | VentureBeat | Mashable | Ars Technica | Inc. | ZDNet | CNN | New Yorker

Performance benchmark

- Processing speed: The average time to process a headline link from Google News is about 0.086 sec per CPU core. On a 16-core server, that comes to about 0.0065 sec per link.
- Precision ratio: The cleaning ratio and the retain ratio are both 95% for most websites. The cleaning ratio is the share of 'noise' on the web page that has been removed, while the retain ratio is the share of 'content' that has been kept in the final result.
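
The two ratios can be illustrated with a minimal sketch. It assumes that a page has been hand-labelled into 'content' and 'noise' text blocks and that the ratios are counted per block; the function name, the block-level granularity, and the sample data are all hypothetical and are not taken from Purifyr itself.

```python
def cleaning_and_retain_ratio(content_blocks, noise_blocks, extracted_blocks):
    """Return (cleaning_ratio, retain_ratio) for one page.

    Assumption: ratios are counted per labelled text block. The actual
    unit Purifyr measures (blocks, characters, DOM nodes) is not stated.
    """
    content = set(content_blocks)
    noise = set(noise_blocks)
    extracted = set(extracted_blocks)

    # Noise that no longer appears in the extracted result.
    noise_removed = noise - extracted
    # Content that survived extraction.
    content_kept = content & extracted

    cleaning_ratio = len(noise_removed) / len(noise) if noise else 1.0
    retain_ratio = len(content_kept) / len(content) if content else 1.0
    return cleaning_ratio, retain_ratio


# Hypothetical labelled page and extractor output.
page_content = ["headline", "paragraph 1", "paragraph 2", "paragraph 3"]
page_noise = ["nav menu", "ad banner", "footer links", "share widget"]
extracted = ["headline", "paragraph 1", "paragraph 2", "ad banner"]

cleaning, retain = cleaning_and_retain_ratio(page_content, page_noise, extracted)
print(f"cleaning ratio: {cleaning:.2f}, retain ratio: {retain:.2f}")
```

In this made-up example one noise block slips through and one content block is lost, so both ratios are 0.75. A result matching the figures above would score 0.95 on each.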

Check out the API documentation
License Purifyr binary or source code

Copyright © 2008-09 2Zelex Software. All rights reserved. Contact us: sales@purifyr.com