When dealing with OCR, you frequently start with a PDF in which each page is a raw image of scanned-in text. One of the first things you need to do is convert that PDF into a sequence of images.
You can use ImageMagick to extract individual pages. ImageMagick relies on Ghostscript to read PDFs. On Ubuntu, you can install both with:
sudo apt-get install imagemagick ghostscript
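The following command extracts the second page and saves it as a PNG image called page2.png. The page is selected by an index in square brackets after the file name, and pages are numbered from 0: the first page is [0], the second is [1], and so on. The quotes stop the shell from treating the brackets as a filename pattern.
convert 'document.pdf[1]' page2.png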
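Because ImageMagick counts pages from 0 while people usually count from 1, it is easy to grab the wrong page. Below is a minimal sketch of a wrapper that takes a human (1-based) page number and builds the 0-based document.pdf[N] spec. The function names pdf_page_spec and extract_page are hypothetical, and the sketch assumes the ImageMagick 6 convert command is on your PATH.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper names): extract one PDF page by its
# 1-based page number, translating it to ImageMagick's 0-based index.

# Print the ImageMagick input spec for page $2 of PDF $1,
# e.g. "document.pdf[1]" for page 2.
pdf_page_spec() {
  echo "${1}[$(($2 - 1))]"
}

# Extract page $2 (1-based) of PDF $1 into image file $3.
# The spec is quoted so the shell never treats the brackets as a glob.
extract_page() {
  convert "$(pdf_page_spec "$1" "$2")" "$3"
}

# Example (requires ImageMagick and Ghostscript):
# extract_page document.pdf 2 page2.png
```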
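You may find the resulting image low-resolution and fuzzy. For a sharper, higher-quality image (at the cost of a larger file), use the -density argument, which sets the resolution in dots per inch at which the PDF page is rendered. It must come before the input file, like so:
convert -density 500 'document.pdf[1]' page2.png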
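To turn the whole PDF into a sequence of images rather than one page at a time, you can leave out the page index and put %d in the output name; ImageMagick replaces %d with each page's 0-based index. The sketch below wraps that in two hypothetical helper functions (png_pattern and pdf_to_pngs) and assumes a default of 300 DPI, which is an illustrative choice rather than a recommendation from this post.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper names): rasterize every page of a PDF
# into a numbered PNG sequence, as an OCR pre-processing step.

# Output filename pattern in the current directory:
# "scans/document.pdf" -> "document-%d.png".
# ImageMagick replaces %d with each page's 0-based index.
png_pattern() {
  echo "$(basename "$1" .pdf)-%d.png"
}

# Convert all pages of PDF $1 at DPI $2 (default 300, an assumed value).
# -density comes before the input file because it controls how the PDF is read.
pdf_to_pngs() {
  convert -density "${2:-300}" "$1" "$(png_pattern "$1")"
}

# Example (requires ImageMagick and Ghostscript):
# pdf_to_pngs document.pdf 500   # writes document-0.png, document-1.png, ...
```

Higher densities give the OCR engine more pixels per character, but rendering time and file size grow roughly with the square of the DPI, so it is worth trying a few values on a sample page first.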