What is Purifyr?
Purifyr can remove about 95% of the noise from a web page, leaving the content ready for further semantic processing and information retrieval tasks.
Give it a try
Some demos: WSJ | Reuters | Guardian | USA Today | BBC | Bloomberg | ReadWriteWeb | VentureBeat | Mashable | Ars Technica | Inc. | ZDNet | CNN | NewYorker
Enter the URL of the web page you'd like to purifyr:
Performance benchmark
- Processing speed: Processing a headline link from Google News takes about 0.086 sec on average on a single CPU core. On a 16-core server, that works out to about 0.0065 sec per link.
- Precision ratio: Both the cleaning ratio and the retain ratio are 95% for most websites. The cleaning ratio is the share of 'noise' on the web page that has been removed, while the retain ratio is the share of 'content' that has been kept in the final result.
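To make the two ratios concrete, here is a minimal sketch of how they could be computed for one page. It is not Purifyr's own evaluation code: the function name, the per-block 'content' / 'noise' labels, and the idea of scoring by text block are all assumptions made for illustration.

```python
def cleaning_and_retain_ratios(block_labels, kept_indices):
    """Return (cleaning_ratio, retain_ratio) for a single page.

    block_labels: hand-assigned label per text block, "content" or "noise"
                  (hypothetical ground truth, not something Purifyr exposes).
    kept_indices: indices of the blocks that survived cleaning.
    """
    kept = set(kept_indices)
    noise = {i for i, label in enumerate(block_labels) if label == "noise"}
    content = {i for i, label in enumerate(block_labels) if label == "content"}

    # Cleaning ratio: share of noise blocks that were removed.
    cleaning_ratio = len(noise - kept) / len(noise) if noise else 1.0
    # Retain ratio: share of content blocks that were kept.
    retain_ratio = len(content & kept) / len(content) if content else 1.0
    return cleaning_ratio, retain_ratio


# Example page: 20 noise blocks followed by 20 content blocks.
labels = ["noise"] * 20 + ["content"] * 20
# The cleaner keeps 19 of the content blocks and lets 1 noise block through.
kept = [0] + list(range(20, 39))
cleaning, retain = cleaning_and_retain_ratios(labels, kept)
```

In this made-up example, 19 of 20 noise blocks are removed and 19 of 20 content blocks are kept, so both ratios come out at 0.95.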
Purifyr Bookmarklet for your browser
Drag Purifyr! to your browser's Bookmarks Toolbar. Once it is on your toolbar, you'll be able to Purifyr any web page with a single click.
The bookmarklet is compatible with most web browsers and platforms, as long as your bookmarks or favorites allow JavaScript. The links toolbar may not be visible in all setups; in most browsers, you can enable it in the View->Toolbars menu. You can also put the bookmarklet in your bookmarks instead of the links toolbar.
