I get this text from a PDF invoice:
INVOCE DATE Nº ITEM CONTRACT DATA
10/10/15 EN56000004567WWG Standard Plan 3
CONCEPT AMOUNT MONTHS UNITPRIZE PRIZE
CONCEPT AAA 47,101 MB 1,0 3,394074 159,86 Dollars
CONCEPT BBB 26,122 MB 1,0 3,394074 88,66 Dollars
CONCEPT CCC 37,101 MB 1,0 3,394074 125,92 Dollars
TOTAL 374,44 Dollars
This text is actually a table with several rows, but it comes out as a single column in which the fields are separated only by a varying number of spaces on almost every line.
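Because the fields on a row are separated only by runs of whitespace, a single row can be broken into its fields with a whitespace regex. This is a minimal sketch with one row typed in by hand. The positions used (3 for the amount, 7 for the price) are an assumption that holds only for this layout and for concept names made of exactly two words.

```r
# One invoice row, copied by hand from the text above
row <- "CONCEPT AAA 47,101 MB 1,0 3,394074 159,86 Dollars"

# Split on one or more whitespace characters, however many there are
fields <- strsplit(row, "\\s+")[[1]]

# Positional access; assumes the concept name is two words long
amount <- fields[3]
price  <- fields[7]
print(fields)
```

Positional indexing breaks as soon as a concept name has a different number of words, which is one reason to prefer a regex keyed on the concept name.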
I want to extract the amounts "47,101", "26,122" and "37,101", each with its own regex keyed on the concept. For example, regex1 looks for "CONCEPT AAA" and returns "47,101", and so on.
I have managed to get "CONCEPT AAA 47,101" with this line of R:
regmatches(invoice,regexpr("\\bCONCEPT AAA\\s*([-,0-9]+)", invoice, perl=TRUE))
but I only want the number "47,101".
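One way to keep only the number is sketched below. It assumes `invoice` is the character vector from readPDF with one element per line, reproduced here by hand. The pattern uses PCRE's `\\K`, which discards everything matched up to that point from the reported match, so `regmatches()` returns only the digits after the concept name. The helper name `get_amount` is hypothetical, and the concept name is pasted into the pattern unescaped, so this assumes it contains no regex metacharacters.

```r
# Invoice lines typed in by hand for illustration (assumed one element per line)
invoice <- c(
  "CONCEPT AMOUNT MONTHS UNITPRIZE PRIZE",
  "CONCEPT AAA 47,101 MB 1,0 3,394074 159,86 Dollars",
  "CONCEPT BBB 26,122 MB 1,0 3,394074 88,66 Dollars",
  "CONCEPT CCC 37,101 MB 1,0 3,394074 125,92 Dollars",
  "TOTAL 374,44 Dollars"
)

# Hypothetical helper: return the first number after the given concept name.
# \\K resets the start of the reported match, so only the number is returned.
get_amount <- function(lines, concept) {
  pattern <- paste0("\\b", concept, "\\s*\\K[-,0-9]+")
  regmatches(lines, regexpr(pattern, lines, perl = TRUE))
}

concepts <- c("CONCEPT AAA", "CONCEPT BBB", "CONCEPT CCC")
amounts <- sapply(concepts, function(cn) get_amount(invoice, cn))

# The comma appears to be the decimal separator (47,101 * 3,394074 = 159,86),
# so swap it for a dot before converting to numeric
amounts_num <- as.numeric(chartr(",", ".", amounts))
print(amounts)
```

The result is a character vector named by concept; `amounts_num` holds the same values as numbers.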
ADDITIONAL INFO
To read the PDF I use the readPDF function from the tm package in R, which returns this table as a character vector.
Because there are many invoices with slight differences in layout, I prefer to extract the data with regexes rather than attempt a better PDF-to-table conversion.
BONUS:
I would then also like to get the price for each concept: "159,86", "88,66" and "125,92".
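The same `\\K` idea can pick out the price, under the assumption (true for the rows shown, not guaranteed for every invoice) that the price is the number immediately before "Dollars" on the concept's line. The lazy `.*?` skips forward from the concept name, and the lookahead `(?=\\s*Dollars)` only accepts the number followed by "Dollars". The helper name `get_price` is hypothetical.

```r
# Invoice lines typed in by hand for illustration (assumed one element per line)
invoice <- c(
  "CONCEPT AMOUNT MONTHS UNITPRIZE PRIZE",
  "CONCEPT AAA 47,101 MB 1,0 3,394074 159,86 Dollars",
  "CONCEPT BBB 26,122 MB 1,0 3,394074 88,66 Dollars",
  "CONCEPT CCC 37,101 MB 1,0 3,394074 125,92 Dollars",
  "TOTAL 374,44 Dollars"
)

# Hypothetical helper: return the number just before "Dollars" on the concept's line.
# \\K drops the concept name and the skipped text from the reported match.
get_price <- function(lines, concept) {
  pattern <- paste0("\\b", concept, "\\b.*?\\K[-,0-9]+(?=\\s*Dollars)")
  regmatches(lines, regexpr(pattern, lines, perl = TRUE))
}

concepts <- c("CONCEPT AAA", "CONCEPT BBB", "CONCEPT CCC")
prices <- sapply(concepts, function(cn) get_price(invoice, cn))

# Convert comma-decimal strings to numbers, e.g. to check them against TOTAL
prices_num <- as.numeric(chartr(",", ".", prices))
print(prices)
```

Summing `prices_num` reproduces the TOTAL line (374,44), which is a cheap sanity check when the layout of an invoice shifts.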