How to extract one page of a PDF as an image


When doing OCR, you often start with a PDF in which each page is a raw image of scanned text. One of the first steps is to convert that PDF into a sequence of images.


You can use ImageMagick to extract specific pages. On Ubuntu, it is available as the imagemagick package.

The command below extracts the second page and writes it to a PNG image called page2.png. Page numbers start at 0, so the first page is 0, the second page is 1, and so on.

convert document.pdf[1] page2.png

The resulting image may be fuzzy and low quality. For a clearer, higher-quality image (at the cost of a larger file), use the -density argument, which sets the resolution in DPI at which the PDF is rendered (the default is 72):

convert -density 500 document.pdf[1] page2.png
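
Since the 0-based page index is easy to get wrong, the commands above could be wrapped in a small shell helper that takes the page number as a person would count it. This is only a sketch: the function names page_index and extract_page, the pageN.png output naming, and the default density of 300 are assumptions made for illustration, not part of ImageMagick.

```shell
#!/bin/sh

# Hypothetical helper: turn a 1-based page number into
# the 0-based index that ImageMagick expects.
page_index() {
    echo $(( $1 - 1 ))
}

# Hypothetical helper: extract one page of a PDF as a PNG.
#   $1 = PDF file, $2 = page number (1-based),
#   $3 = density in DPI (optional; 300 is an assumed default)
extract_page() {
    pdf=$1
    page=$2
    density=${3:-300}
    # Quote the argument so the shell does not treat [ ] as a glob pattern.
    convert -density "$density" "${pdf}[$(page_index "$page")]" "page${page}.png"
}
```

With this loaded, extract_page document.pdf 2 would run convert -density 300 document.pdf[1] page2.png, and extract_page document.pdf 2 500 would reproduce the higher-density command shown above.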
