Using Google for OCR

Amit Agarwal has posted a tip on his blog about using Google to convert PDFs to text. For some reason, he suggests putting all your PDF documents on the web:

Create a folder in your website (say and upload all the PDF images to that folder. Now create a public web page that links to all the PDF files. Wait for the Google bots to spider your stuff.

Once done, type the query “ filetype:pdf” to see the PDF documents as HTML.

Why would you want your documents to be accessible by anyone? Why wait for Google to index your page?

There’s a much easier way that I’ve been using, and one of the commenters on Agarwal’s blog points it out:

You can upload the Scanned PDFs to Gmail and sent it you only. Then Open your Inbox and the mail sent from you, you have an option to View as HTML. That will solve the Hosting problem.
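If you have a pile of scans, mailing them to yourself by hand gets tedious, so here is a minimal sketch of how that step might be scripted in Python. It only covers sending the message; viewing the attachment still happens in the Gmail inbox as described above. The function names, the subject line, and the use of an app password are my own assumptions for illustration, and the server settings (smtp.gmail.com, port 465) should be checked against your own account setup.

```python
import smtplib
from email.message import EmailMessage


def build_self_addressed_message(address, pdf_bytes, filename):
    """Build an email from `address` to `address` with one PDF attached.

    Hypothetical helper: the name and subject line are illustrative only.
    """
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = address  # sent to yourself only, so nothing is made public
    msg["Subject"] = "Scanned PDF: " + filename
    msg.set_content("Scanned PDF attached.")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return msg


def send_via_gmail(msg, address, app_password):
    """Send the message over SMTP with SSL.

    Assumption: the account accepts SMTP logins with an app password.
    """
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(address, app_password)
        server.send_message(msg)
```

To use it, read the scanned file with `open("scan.pdf", "rb").read()`, pass the bytes to `build_self_addressed_message`, and hand the result to `send_via_gmail` along with your own address and password.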
