Convert PDF to image in Java

Problem

You have a PDF file and want to convert each of its pages to an image. Going page by page, taking a screenshot and cleaning it up in an image editor, takes a long time. In this post we will see how to automate the task.

Solution

PDF (Portable Document Format) is one of the most popular file formats out there, so most problems related to it have likely been solved already. Here we will use a library designed specifically for manipulating PDF files: PDFBox, from the Apache Software Foundation.

Maven is used to add the dependency to the project, as explained in the PDFBox getting started guide. Alternatively, the .jar file can be downloaded and added to your IDE as an external library.

Algorithm
  1. Open a given PDF document
  2. Iterate through the page(s)
  3. For each one, create an image
  4. Close document

package com.programmerabroad;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

class PdfToJpeg {

    public static void main(String[] args) throws IOException {

        String pdfPath = "ebrochure.pdf";

        // PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF instead.
        // try-with-resources closes the document even if rendering a page fails.
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFRenderer renderer = new PDFRenderer(document);
            int pages = document.getNumberOfPages();

            System.out.println("Converting...");

            for (int i = 1; i <= pages; i++) {
                System.out.println("\tPage #" + i + " of " + pages);
                // Page indexes are zero-based, file names are one-based; render at 300 DPI.
                BufferedImage bufferedImage = renderer.renderImageWithDPI(i - 1, 300);
                ImageIO.write(bufferedImage, "JPEG", new File("pdf-page-" + i + ".jpeg"));
            }
        }

        System.out.println("Done");
    }
}

As you can see, the task is simple thanks to the PDFBox library. The logic would be cleaner, though, if it were extracted into its own method instead of sitting entirely in the main method.
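
One way such an extraction could look, as a minimal sketch rather than a definitive implementation. The names PageExporter, PageSource and exportPages are hypothetical and not part of PDFBox. PageSource stands in for the renderer so that the sketch runs with only the JDK, and blank images take the place of real rendered pages.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class PageExporter {

    /** Hypothetical stand-in for the renderer: returns the image for a zero-based page index. */
    interface PageSource {
        BufferedImage render(int pageIndex) throws IOException;
    }

    /** Writes each page as pdf-page-N.jpeg (N starts at 1) into outputDir and returns the files. */
    static List<File> exportPages(PageSource source, int pageCount, File outputDir) throws IOException {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            BufferedImage image = source.render(i);
            File target = new File(outputDir, "pdf-page-" + (i + 1) + ".jpeg");
            // ImageIO.write returns false when no registered writer can handle the image
            if (!ImageIO.write(image, "JPEG", target)) {
                throw new IOException("No JPEG writer available for page " + (i + 1));
            }
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        File outputDir = Files.createTempDirectory("pdf-pages").toFile();
        // Assumption: blank RGB images replace rendered PDF pages in this demo
        List<File> files = exportPages(
                i -> new BufferedImage(100, 140, BufferedImage.TYPE_INT_RGB), 3, outputDir);
        for (File file : files) {
            System.out.println(file.getAbsolutePath());
        }
    }
}
```

With PDFBox 2.x on the classpath, the call inside the program above would presumably become exportPages(i -> renderer.renderImageWithDPI(i, 300), document.getNumberOfPages(), outputDir). Keeping the renderer behind a small interface also lets the loop be tested without a real PDF.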

Conclusion

In this post we saw how simple it is to convert a PDF to JPEG files using PDFBox. It is a powerful tool capable of much more, such as extracting text from a PDF and splitting or merging documents. You can check out its documentation and FAQ.

See more coding tutorials.

