Service API Documentation
Service highlights
- Simple interface: A single API call lets you start processing content with Purifyr. And since there's no need to negotiate with a salesperson, anyone can get started in just minutes.
- Fast response: Using our proprietary algorithm, Purifyr extracts content from any web page at lightning speed.
- Adaptive algorithm: Our algorithm is carefully designed to adapt itself to each web page. We don't use any hard-coded parameters, so Purifyr gives you flexible, reliable results for any web page you feed it.
- 100% Unicode support: The Purifyr engine has had built-in automatic encoding detection and Unicode support from the very beginning.
- HTML tolerant: We understand that most web pages have small "defects". Our engineering team has tested the engine on over 5,000 pages to make sure it can handle "real-world" pages.
Parameters
Send a POST request to http://purifyr.com/api/ with the following parameters:
- url: the URL of the web page you'd like to process.
- key: the public key you receive when you sign up for the service.
- hash: the HMAC-SHA256 hex digest of the url, computed with your private key.
- html: (optional) If provided, Purifyr will not fetch the content from the specified url and will process the supplied HTML instead. This is useful when you need to process private content. Please make sure url is always present, since it's required by the authentication process.
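The parameter list above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `sign_url` and `build_params` are hypothetical, and the code assumes that both the private key and the url are encoded as UTF-8 before hashing, which the documentation does not state.

```python
import hashlib
import hmac


def sign_url(url: str, private_key: str) -> str:
    """Return the HMAC-SHA256 hex digest of `url`, keyed with `private_key`."""
    # Assumption: key and message are both UTF-8 encoded before hashing.
    return hmac.new(
        private_key.encode("utf-8"),
        url.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def build_params(url, public_key, private_key, html=None):
    """Assemble the POST parameters: url, key, hash, and optionally html."""
    params = {
        "url": url,
        "key": public_key,
        "hash": sign_url(url, private_key),
    }
    # html is optional; url is still sent because the hash is computed from it.
    if html is not None:
        params["html"] = html
    return params
```

The hash is always computed from the exact url string that is sent, so any change to the url (for example, a trailing slash) requires a new digest.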
Error messages
The extracted content will be returned if there are no problems. Otherwise, you'll receive one of the following error messages:
- missing url: you forgot to provide the url of the web page you'd like to process.
- missing key: you need to provide the application key.
- invalid key: the key you provided couldn't be found in our database.
- missing hash: you forgot to provide the HMAC-SHA256 hash of the url.
- authentication failed: the hash digest didn't pass our authentication test. Please make sure you've used HMAC-SHA256 correctly.
- unable to access url: our crawler had a problem accessing the url you specified. The most common cause is that the url requires authentication. Try passing the web content in the html parameter instead.
- trial limit exceeded: you have used all 100 trial calls. Please click the PayNow button on your account page to start the payment process. If you need more test calls, please don't hesitate to contact us.
- unable to parse content: an error occurred while our engine was processing your request. Most likely, this was caused by passing a url that points to binary content such as a PDF or an image. If you believe it's our fault, please contact us.
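A client might separate these error messages from successful results along the following lines. This is only a sketch under a strong assumption: that the response body consists of exactly one of the messages listed above as plain text, which the documentation does not confirm. The names `classify_response` and `should_retry_with_html` are hypothetical.

```python
# Error messages as listed in the documentation.
KNOWN_ERRORS = {
    "missing url",
    "missing key",
    "invalid key",
    "missing hash",
    "authentication failed",
    "unable to access url",
    "trial limit exceeded",
    "unable to parse content",
}


def classify_response(body: str):
    """Return (content, error); exactly one of the two is None."""
    # Assumption: an error response is one of the listed messages,
    # possibly with surrounding whitespace or different capitalization.
    message = body.strip().lower()
    if message in KNOWN_ERRORS:
        return None, message
    return body, None


def should_retry_with_html(error: str) -> bool:
    """True when the documented remedy is to resend the page via `html`."""
    return error == "unable to access url"
```

For example, when `should_retry_with_html` returns True, the caller can fetch the page itself and repeat the request with the html parameter set.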
How to test
Using the curl tool, you can quickly test the service API to detect any potential problems:
curl -d "url=http://www.wired.com/thoughts-on-the-new-kindle.html" \
     -d "key=7cccc305e39f4693b6df4aabe5e1620e" \
     -d "hash=3332f4e38ae850b90b99a8621ebc9a1125a1c0a86ab471550111b" \
     http://www.purifyr.com/api/
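The same kind of request can be issued from Python using only the standard library. The sketch below is illustrative: `build_request` and `send` are hypothetical helper names, the endpoint is the one used in the curl example, and the response is assumed to be UTF-8 text, which the documentation does not state. Unlike plain `curl -d`, `urlencode` percent-encodes the parameter values.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "http://www.purifyr.com/api/"  # endpoint from the curl example


def build_request(params: dict, endpoint: str = API_ENDPOINT) -> Request:
    """Build a form-encoded POST request carrying `params`."""
    body = urlencode(params).encode("ascii")
    # Supplying `data` makes urllib send the request as a POST.
    return Request(endpoint, data=body)


def send(request: Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        # Assumption: the service replies with UTF-8 text.
        return response.read().decode("utf-8", errors="replace")
```

A call would look like `send(build_request({"url": page_url, "key": public_key, "hash": digest}))`, where the three values are your own page url, public key, and HMAC-SHA256 digest.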
