I want to extract information from PDFs.
The following is an extract from a policy document. The PDF was converted to plain text using https://github.com/yob/pdf-reader/.
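A minimal sketch of the conversion step, assuming the pdf-reader gem is installed (`gem install pdf-reader`). `PDF::Reader.new`, `#pages` and `Page#text` come from that gem; the helper names `pdf_to_text` and `normalize_text` are hypothetical. pdf-reader tries to reproduce the visual layout of each page, so a label and its value are often separated by a run of spaces. Collapsing that whitespace first keeps any later patterns simpler.

```ruby
# Hypothetical helper: read every page of a PDF and join the text.
def pdf_to_text(path)
  # Loaded here so normalize_text can be used without the gem installed.
  require "pdf-reader"
  reader = PDF::Reader.new(path)
  reader.pages.map(&:text).join("\n")
end

# Hypothetical helper: collapse runs of spaces/tabs into a single space,
# trim each line and drop blank lines, so "Cover      USD37.45" becomes
# "Cover USD37.45".
def normalize_text(text)
  text.lines
      .map { |line| line.gsub(/[ \t]+/, " ").strip }
      .reject(&:empty?)
      .join("\n")
end
```

Usage would be something like `text = normalize_text(pdf_to_text("policy.pdf"))`, where the file name is only an example.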
Vehicle Description 2007, PORSCHE, CAYMAN 3.2
Registration Number USD-2394 Vin Number FSDFKJL23123KFAS
MY COVER DETAILS
Cover USD37.45
I would like to extract fields such as the vehicle description, registration number and cost of cover:
vehicle.description => "2007, PORSCHE, CAYMAN 3.2"
vehicle.registration => "USD-2394"
vehicle.cost_of_cover => "37.45"
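If the layout stays fixed, one regular expression per field is probably the simplest option. A minimal sketch, assuming the text looks like the extract above; `Vehicle`, `SAMPLE_TEXT` and `extract_vehicle` are hypothetical names. Each pattern anchors on the label text rather than on a position in the file, and `String#[]` with a capture index returns nil for a missing field instead of raising.

```ruby
# Hypothetical value object so fields read as vehicle.description etc.
Vehicle = Struct.new(:description, :registration, :vin, :cost_of_cover,
                     keyword_init: true)

SAMPLE_TEXT = <<~TEXT
  Vehicle Description 2007, PORSCHE, CAYMAN 3.2
  Registration Number USD-2394 Vin Number FSDFKJL23123KFAS
  MY COVER DETAILS
  Cover USD37.45
TEXT

def extract_vehicle(text)
  Vehicle.new(
    # Everything after the label up to the end of that line.
    description:   text[/^Vehicle Description\s+(.+)$/, 1],
    # A single whitespace-free token after each label.
    registration:  text[/Registration Number\s+(\S+)/, 1],
    vin:           text[/Vin Number\s+(\S+)/, 1],
    # "Cover" at the start of a line, a three-letter currency code,
    # then only the amount is captured.
    cost_of_cover: text[/^Cover\s+[A-Z]{3}\s*([\d,]+\.\d{2})/, 1]
  )
end

vehicle = extract_vehicle(SAMPLE_TEXT)
puts vehicle.description    # prints 2007, PORSCHE, CAYMAN 3.2
puts vehicle.registration   # prints USD-2394
puts vehicle.cost_of_cover  # prints 37.45
```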
Can anyone advise on an appropriate method? The difficulty is that the layout of the policy may change, although the fields themselves will mostly stay the same, just with different values.
If regular expressions are the way to go, could someone provide some example code?
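Because the layout may change, here is a sketch that is less tied to it: each field is looked up by its label anywhere in the text, the label and value may be separated by spaces, a colon or a line break, and a field can list alternative labels. The rule table `FIELD_RULES`, the helper `extract_fields` and the alternative labels (`Vehicle Details`, `Reg. No.`, `Premium`) are hypothetical; they would need to be replaced with the labels that actually occur in the policies.

```ruby
# Hypothetical rule table: label variants, a pattern for the value and an
# optional clean-up step. Fields are found by label, not by position.
FIELD_RULES = {
  description: {
    labels: [/Vehicle Description/, /Vehicle Details/],
    value:  /[^\n]+/                       # rest of the line
  },
  registration: {
    labels: [/Registration Number/, /Reg\. No\./],
    value:  /[A-Z0-9][A-Z0-9-]*/
  },
  vin: {
    labels: [/Vin Number/, /VIN/],
    value:  /[A-Z0-9]+/
  },
  cost_of_cover: {
    labels: [/Cover/, /Premium/],
    # Optional currency code or "$", then an amount such as 1,037.45.
    value:  /(?:[A-Z]{3}|\$)?\s*[\d,]+\.\d{2}/,
    # Keep only the number, without thousands separators.
    clean:  ->(v) { v[/[\d,]+\.\d{2}/].delete(",") }
  }
}.freeze

# Returns a hash of field => value, with nil for fields that were not found.
def extract_fields(text, rules = FIELD_RULES)
  rules.each_with_object({}) do |(field, rule), result|
    # Try each label variant in order; the first one that matches wins.
    # [\s:]* lets the value follow a space, a colon or a line break.
    raw = rule[:labels].lazy
                       .map { |label| text[/#{label}[\s:]*(#{rule[:value]})/, 1] }
                       .find { |match| match }
    result[field] = raw && rule.fetch(:clean, :strip.to_proc).call(raw)
  end
end
```

Missing fields come back as nil, which makes it easy to flag documents whose layout is no longer recognised. The label table still has to be maintained by hand if the wording of the labels changes.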