Purifyr
Content extraction solution for semantic information mining

What is Purifyr

Purifyr can remove 95% of the noise from web pages, leaving the content ready for further semantic processing and information retrieval tasks.

Give it a try

Some demos: WSJ | Reuters | Guardian | USA Today | BBC | Bloomberg | ReadWriteWeb | VentureBeat | Mashable | Ars Technica | Inc. | ZDNet | CNN | New Yorker

Enter the URL of the web page you'd like to Purifyr:




Performance benchmark

- Processing speed: Processing headline links from Google News takes about 0.086 sec per link on a single CPU core, on average. On a 16-core server, the effective time drops to about 0.0065 sec per link.
- Precision ratio: Both the cleaning ratio and the retain ratio are 95% for most websites. The cleaning ratio is the share of 'noise' on the web page that has been removed, while the retain ratio is the share of 'content' that has been kept in the final result.
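As a rough illustration of the two ratios and of what the timing figures mean as throughput, here is a minimal Python sketch. It assumes a page has already been split into labeled text blocks; the block names, the function names, and the block-level unit of measurement are all hypothetical and are not taken from Purifyr itself.

```python
def cleaning_ratio(noise_blocks: set, kept_blocks: set) -> float:
    """Share of noise blocks that were removed from the page."""
    if not noise_blocks:
        return 1.0  # nothing to clean, so nothing was missed
    removed = noise_blocks - kept_blocks
    return len(removed) / len(noise_blocks)


def retain_ratio(content_blocks: set, kept_blocks: set) -> float:
    """Share of content blocks that survived into the final result."""
    if not content_blocks:
        return 1.0  # no content to lose
    return len(content_blocks & kept_blocks) / len(content_blocks)


def links_per_second(seconds_per_link: float) -> float:
    """Convert an average per-link processing time into throughput."""
    return 1.0 / seconds_per_link


# Hypothetical page: four noise blocks and four content blocks.
noise = {"nav", "ads", "footer", "sidebar"}
content = {"headline", "p1", "p2", "p3"}
# Hypothetical extractor output: one ad slipped through, one paragraph was lost.
kept = {"headline", "p1", "p2", "ads"}

print(cleaning_ratio(noise, kept))      # 3 of 4 noise blocks removed
print(retain_ratio(content, kept))      # 3 of 4 content blocks kept
print(round(links_per_second(0.0065)))  # throughput implied by the 16-core figure
```

Measured this way, a 95% figure would mean that 19 of every 20 noise blocks are removed and 19 of every 20 content blocks are kept.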

Purifyr Bookmarklet for your browser

Drag Purifyr! to your browser's Bookmarks Toolbar. Once this is on your toolbar, you'll be able to Purifyr any webpage with only one click.
The bookmarklet is compatible with most web browsers and platforms, as long as your bookmarks or favorites allow JavaScript. The links toolbar may not be visible in all setups; in most browsers, you can enable it in the View->Toolbars menu. You can also put the bookmarklet in your bookmarks instead of the links toolbar.

Check out the API documentation
License Purifyr binary or source code

Copyright © 2008-09 2Zelex Software. All rights reserved. Contact us: sales@purifyr.com