
I am running pdfinfo through Ruby's `system` function and I only want to get the page size, 612 x 1008, but I am not sure how to parse it out.

My code:

output = system("pdfinfo example.docx_ms.pdf")
print "#{output} \n"
x = "612x1008"
puts x

if x == output
  puts "yes"
else
  puts "no"
end

Output:

612 x 1008
true
tintin
  • I'd avoid the external dependency and use a Ruby PDF library like [pdf-reader](https://github.com/yob/pdf-reader) instead. – Stefan May 07 '20 at 13:23

2 Answers


Using the answer provided here, you can do this:

output = `pdfinfo example.docx_ms.pdf | grep 'Page size:' | awk '{ print $3 $4 $5 }'`.chomp
print "#{output} \n"

What this does:

  1. pdfinfo returns the document details.
  2. grep keeps only the wanted line, i.e. `Page size:`.
  3. awk prints fields 3 to 5 of that line, which are the page size dimensions.

You can read more about grep and awk.
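The grep and awk steps can also be done inside Ruby, which avoids starting two extra processes. Below is a minimal sketch, assuming pdfinfo prints a line of the form `Page size:      612 x 1008 pts`; the method name `page_size_from` is hypothetical, and the sample string only stands in for real pdfinfo output.

```ruby
# Hypothetical helper: extract the page size from text printed by pdfinfo.
# Assumes a line such as "Page size:      612 x 1008 pts".
def page_size_from(pdfinfo_output)
  # Equivalent of `grep 'Page size:'`: keep only the matching line.
  line = pdfinfo_output.lines.grep(/\APage size:/).first
  return nil unless line

  # Equivalent of `awk '{ print $3 $4 $5 }'`: split on whitespace and join
  # fields 3 to 5 (awk counts from 1, so these are indexes 2..4 here).
  line.split[2..4].join
end

# Illustrative text standing in for real pdfinfo output.
sample = <<~OUT
  Pages:          1
  Page size:      612 x 1008 pts
  PDF version:    1.5
OUT

puts page_size_from(sample) # prints 612x1008
```

Splitting on whitespace mirrors awk's default field splitting, so the result is the same `612x1008` string the shell pipeline produces.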


Marvin Kang'
  • Awesome, thank you! My output is `612x1008 true`. I've added an if statement to compare the outputs and it is coming back as `false`; I've updated my code in the question. – tintin May 07 '20 at 14:04
  • @tintin The if statement will not work unless you extract the `Page size` dimensions from the pdfinfo result. Right now you are comparing the `x` variable with the pdfinfo result. – Marvin Kang' May 07 '20 at 14:13
  • How would I be able to do that? Aren't we already getting the dimensions? – tintin May 07 '20 at 14:16
  • This prints the output; it doesn't capture it. You'll need [Open3](https://docs.ruby-lang.org/en/2.7.0/Open3.html) for that. It's also kind of backwards to use `grep` and `awk` when Ruby can do all of that and more internally. If the output needs parsing, `readlines.grep` is a good start. – tadman May 07 '20 at 14:17
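Following the suggestion to capture the output with Open3, here is a minimal sketch that captures pdfinfo's stdout, pulls out the two dimensions with a regular expression, and returns them as numbers. Comparing numbers instead of strings means spacing differences such as `612x1008` versus `612 x 1008` no longer matter, and fractional sizes like `595.276 x 841.89` still parse. The method name `pdf_page_size` and its `command:` parameter are hypothetical; the sketch assumes pdfinfo is on the PATH and prints a `Page size:` line.

```ruby
require "open3"

# Hypothetical helper: run pdfinfo (or another command) on a file, capture
# its stdout, and return [width, height] in points, or nil if no
# "Page size:" line is present.
def pdf_page_size(path, command: ["pdfinfo"])
  # capture2 runs the program without a shell and returns [stdout, status].
  # It raises Errno::ENOENT if the program itself cannot be found.
  stdout, status = Open3.capture2(*command, path)
  raise "#{command.first} failed for #{path}" unless status.success?

  # ^ anchors at the start of any line of the captured text in Ruby.
  match = stdout.match(/^Page size:\s+([\d.]+) x ([\d.]+)/)
  match && [match[1].to_f, match[2].to_f]
end
```

In the question's code, the comparison `x == output` would then become `pdf_page_size("example.docx_ms.pdf") == [612.0, 1008.0]`.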

Use backticks or `%x` instead of `system` to capture the output:

output = `pdfinfo example.docx_ms.pdf | grep 'Page size:'`
puts output.gsub(/Page size:\s*\b/, '').chomp

irb(main):001:0> `pdfinfo example.docx_ms.pdf | grep 'Page size:'`.gsub(/Page size:\s*\b/, '').chomp
=> "595 x 842 pts (A4)"
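The key difference is the return value: `system` lets the child process write straight to the terminal and returns only `true`, `false`, or `nil`, while backticks and `%x` return the captured stdout as a String that can be parsed. This is a minimal illustration using `echo` as a stand-in for pdfinfo so it runs without a PDF file.

```ruby
# system runs the command, lets it print directly to stdout, and returns
# true/false (nil if the command could not be run at all).
result = system("echo 612 x 1008")
p result # => true

# Backticks and %x run the command and return its stdout as a String.
captured = `echo 612 x 1008`
p captured # => "612 x 1008\n"

same = %x(echo 612 x 1008)
p same == captured # => true

# chomp removes the trailing newline before comparing.
puts captured.chomp == "612 x 1008" ? "yes" : "no" # prints yes
```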
max
  • I am getting an `undefined method 'gsub' for true:TrueClass (NoMethodError)` error. – tintin May 07 '20 at 14:09
  • Are you using backticks or `system`? I initially made a copy-paste error and used `system` (which returns true/false). – max May 07 '20 at 14:15
  • Awesome, this worked! But could you explain what `/Page size:\s*\b/` does? – tintin May 07 '20 at 17:52
  • `gsub` replaces every match of the pattern with the second argument. `\s*` matches zero or more whitespace characters. `\b` matches a word boundary, and I guess it's actually kind of redundant when I think about it. – max May 07 '20 at 18:06
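To see what each part of `/Page size:\s*\b/` does, the sketch below runs it against a sample line (illustrative text modeled on the irb output in the answer) and compares it with the same pattern minus `\b`. Because `\s*` is greedy and the next character is a digit, the position right after the spaces is already a word boundary, so both patterns remove the same prefix. Since the label occurs only once per line, `sub` (first match only) does the same job as `gsub`.

```ruby
line = "Page size:      595 x 842 pts (A4)\n"

# \s* consumes the run of spaces after the label; \b then asserts a word
# boundary, which already holds because the next character is "5".
with_boundary = line.gsub(/Page size:\s*\b/, "").chomp
without_boundary = line.gsub(/Page size:\s*/, "").chomp

p with_boundary                     # => "595 x 842 pts (A4)"
p with_boundary == without_boundary # => true

# The label appears once, so sub is enough here.
p line.sub(/Page size:\s*/, "").chomp # => "595 x 842 pts (A4)"
```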