0

Our office scans data entry forms, and we lack any proprietary software that can do automated double entry (primary entry is done by hand, of course). We are hoping to provide a tool that lets researchers highlight regions on a form and then uses the scanned versions to determine what the participant entered.

For a very rough first attempt, all I need is a way to read PDFs in as raster data, with pixel coordinates as the X and Y components and the black-and-white "intensities" as the Z-axis.

We use R mainly for statistical analysis and data management, so options in R would be great.

maRtin
AdamO

1 Answer

5

You could use the raster package in R. It doesn't support .pdf files, but it does read .tif, .jpg, .png (among many others). Converting your PDFs into PNGs shouldn't be a big problem: Look here for more information.
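If you would rather do the conversion from within R, one option is the pdftools package. It is not mentioned above, so treat its use here as an assumption about what you have installed. A minimal sketch, where `pdf_to_png()` is a hypothetical helper name and the 150 dpi default is an arbitrary choice:

```r
library(pdftools)

# Hypothetical helper: render every page of a PDF to its own PNG file
# and return the paths of the files that were written.
pdf_to_png <- function(pdf_path, dpi = 150, out_dir = tempdir()) {
  n_pages <- pdf_info(pdf_path)$pages
  out_files <- file.path(
    out_dir,
    sprintf("%s_page%03d.png",
            tools::file_path_sans_ext(basename(pdf_path)),
            seq_len(n_pages))
  )
  # pdf_convert() writes one image per page to the given file names
  pdf_convert(pdf_path, format = "png", dpi = dpi,
              filenames = out_files, verbose = FALSE)
  out_files
}
```

Calling `pdf_to_png("your/scan.pdf")` (a placeholder path) should then give you one PNG per page that can be read as described next.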

Once you have your png files ready, you can do the following:

library(raster)
png <- raster("your/png/file.png")

and then use the extract() function to get your brightness value from the picture. For example, say your png is 200x200px and you want to extract the pixel value at row 100, column 150:

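# extract() reads a plain numeric vector as cell numbers, so convert row/column to a cell number first
value <- extract(png, cellFromRowCol(png, 100, 150))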
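Since the goal is to read whole regions of a form rather than single pixels, here is a rough sketch of flattening a rectangular region into X, Y and intensity columns. The helper name `region_to_xyz()` and the row/column ranges are made up for illustration. It assumes the image was read with `raster()`, which only reads the first band, and it pulls the whole image into memory, which should be fine for a scanned page.

```r
library(raster)

# Hypothetical helper: return the pixels in the given rows and columns
# as a data frame with x (column index), y (row index, counted from
# the top of the image) and z (pixel value).
region_to_xyz <- function(r, rows, cols) {
  # as.matrix() on a RasterLayer gives an nrow x ncol matrix of values
  m <- as.matrix(r)[rows, cols, drop = FALSE]
  # as.vector() walks the matrix column by column, so x repeats in
  # blocks while y cycles through the rows
  data.frame(
    x = rep(cols, each = length(rows)),
    y = rep(rows, times = length(cols)),
    z = as.vector(m)
  )
}
```

For example, `box <- region_to_xyz(png, rows = 90:110, cols = 140:160)` followed by `mean(box$z)` gives the average brightness of that box, which could serve as a crude check of whether a field was filled in, assuming darker ink gives lower values in your scans.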
Community
maRtin
  • This answer seems reasonable to me - not sure why it got downvoted. The important point that the OP needs to realise is that PDF is NOT a raster format. The conversion can be done either within R, e.g. using im.convert(), or, as maRtin suggests, using an external application. Both have merits. – dww Mar 25 '16 at 00:04
  • @dww That is exactly the connection I was failing to make. These particular PDFs are from scanned documents, and I'm not sure how they're encoded, but it's relatively inefficient as the files are large. Therefore, I'm confident that this should work fairly well. – AdamO Mar 25 '16 at 16:26
  • It would be great to have a discussion of `tools::compactPDF` here, because the topic is PDF and its size reduction. Your answer feels like a stub to me as it stands. – Léo Léopold Hertz 준영 Nov 15 '16 at 20:49