Extract plots from PDFs

Question

We have a PDF page that contains one or more figures, each a two-dimensional plot of experimental results. The figures may or may not be embedded in text. Each plot has x and y axes, with their labels and units marked on the plot. Each figure contains one or more curves, each in a different color.

How can we convert each plot into a table of corresponding x and y values (say, 100 points)?

I have already tried WebPlotDigitizer but it works only when the input is a standalone picture of a plot.

I think I will have to extract the plots from the PDF and then process them further, but I have not been able to find a tool for that. I have attached a sample PDF from which the plots have to be extracted.

Note that the two plots on the last page of the PDF are images and can be extracted readily (I have found a couple of programs for those). The other plots are not images, and those programs cannot extract them.

Is there any open source software that can achieve that?

You can convert each page to a PNG or TIFF file with `ImageMagick` and then cut out the plots and send them to WebPlotDigitiser obviously, but I suspect you don't mean that, do you? — Mark Setchell, Feb 27 '16 at 22:23

score 1 · Accepted Answer · edited May 23 '17 at 11:59


The plots in the PDF you provided are vector drawings rather than embedded images, so the only way to extract them as pictures is to render the pages to images. Try ImageMagick's `convert` command-line tool; see this answer.
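As a rough illustration of the rendering step, here is a minimal Python sketch that builds and runs an ImageMagick command for a single page. The helper names `build_render_command` and `render_page`, the file names, and the 300 DPI default are my own assumptions, not part of the answer. It also assumes the `convert` executable is on the PATH (ImageMagick 7 installs call it `magick`) and that Ghostscript is available, since ImageMagick relies on it to read PDFs.

```python
import shutil
import subprocess


def build_render_command(pdf_path, page_index, out_path, dpi=300):
    """Build an ImageMagick command that rasterises one PDF page.

    `page_index` is zero-based: ImageMagick selects a page with the
    `file.pdf[N]` suffix, and `-density` sets the rendering resolution.
    """
    return [
        "convert",
        "-density", str(dpi),
        f"{pdf_path}[{page_index}]",
        out_path,
    ]


def render_page(pdf_path, page_index, out_path, dpi=300):
    """Run the command if ImageMagick is installed; return True on success."""
    cmd = build_render_command(pdf_path, page_index, out_path, dpi)
    if shutil.which(cmd[0]) is None:
        return False  # ImageMagick not found on PATH
    return subprocess.run(cmd, check=False).returncode == 0
```

For example, `render_page("sample.pdf", 0, "page1.png")` would try to write the first page as a PNG. A higher density gives more pixels per curve, which should make later digitizing easier at the cost of larger files.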

answered Mar 02 '16 at 17:34 by Eugene · edited May 23 '17 at 11:59 by Community

score 0 · Answer 2 · answered Feb 27 '16 at 22:14

Because Photoshop is highly scriptable, it is possible to extract images (as opposed to whole pages) from a PDF programmatically; see the Photoshop JavaScript documentation.

You then have Photoshop's whole set of tools for adjusting the images, so that further processing (interpretation) is easier to accomplish.
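To make the "further processing" step concrete, here is a hedged, pure-Python sketch of one way to turn a cleaned-up plot image into an x/y table. It is not something the answers describe: it assumes the image is already loaded as a list of rows of `(r, g, b)` tuples, that each curve has a distinct color, that both axes are linear, and that you know the pixel position and data value of two ticks per axis. The function names `curve_pixels`, `make_axis_map`, and `resample` are hypothetical.

```python
def curve_pixels(image, target_rgb, tol=40):
    """Return one (col, mean_row) pair per column where the curve colour appears.

    A pixel matches when every channel is within `tol` of `target_rgb`.
    """
    points = []
    for col in range(len(image[0])):
        rows = [
            r for r, line in enumerate(image)
            if all(abs(c - t) <= tol for c, t in zip(line[col], target_rgb))
        ]
        if rows:
            points.append((col, sum(rows) / len(rows)))
    return points


def make_axis_map(pix_a, val_a, pix_b, val_b):
    """Linear map from a pixel coordinate to a data value, using two known ticks."""
    scale = (val_b - val_a) / (pix_b - pix_a)
    return lambda pix: val_a + (pix - pix_a) * scale


def resample(points, n=100):
    """Linearly interpolate (x, y) points onto n evenly spaced x values."""
    points = sorted(points)
    x0, x1 = points[0][0], points[-1][0]
    out, j = [], 0
    for i in range(n):
        x = x0 + (x1 - x0) * i / (n - 1)
        # Advance to the segment that contains x.
        while j < len(points) - 2 and points[j + 1][0] < x:
            j += 1
        (xa, ya), (xb, yb) = points[j], points[j + 1]
        out.append((x, ya + (x - xa) / (xb - xa) * (yb - ya)))
    return out


# Tiny synthetic 5x5 "image": white background with one red diagonal line.
WHITE, RED = (255, 255, 255), (255, 0, 0)
image = [[RED if r == 4 - c else WHITE for c in range(5)] for r in range(5)]

# Assumed calibration: column 0 is x=0 and column 4 is x=10;
# row 4 is y=0 and row 0 is y=2 (image rows grow downwards).
to_x = make_axis_map(0, 0.0, 4, 10.0)
to_y = make_axis_map(4, 0.0, 0, 2.0)

table = resample([(to_x(c), to_y(r)) for c, r in curve_pixels(image, RED)], n=100)
```

Averaging the matching rows in each column is a crude way to thin a thick line, and it will break where curves cross or where a curve is not single-valued in x, so treat this as a starting point rather than a full digitizer.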
