I'm trying to use the pdftools package to extract a data table from a PDF. My source file is here: https://hypo.org/app/uploads/sites/2/2021/11/HYPOSTAT-2021_vdef.pdf. Say I want to extract the data in Table 20 on page 170 (Change in Nominal house price).
I use the following code:
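```r
install.packages("pdftools")  # only needed once
library(pdftools)
# pdf_data() returns a list with one data frame of words per page
report <- pdftools::pdf_data("https://hypo.org/app/uploads/sites/2/2021/11/HYPOSTAT-2021_vdef.pdf")
# [[ ]] extracts the page's data frame itself rather than a one-element list
tab20 <- as.data.frame(report[[170]])
```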
To get the right table, I had to specify the 170th element of the list by hand (because the table is on page 170). If next year's report adds a page before this table, I will have to change the code to extract the 171st element instead. Is there a way to do this automatically?
Basically, what I need is to find the element of the list that contains the string "Change in Nominal house price". Any suggestions on how to do it?
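A minimal sketch of that search, assuming the words of the phrase come out of `pdf_data()` consecutively and in reading order: join each page's `text` column back into one string and test it with `grepl()`. The helper name `find_pages()` and the toy data below are hypothetical, for illustration only. The match is case-insensitive because the capitalization in the PDF may differ from the search phrase.

```r
# Hypothetical helper: return the indices of the list elements (pages) whose
# words, joined back together with single spaces, match `pattern`.
# `pages` is assumed to be a list of data frames with a `text` column holding
# one word per row, which is the shape pdftools::pdf_data() returns.
find_pages <- function(pages, pattern, ignore.case = TRUE) {
  page_strings <- vapply(
    pages,
    function(p) paste(p$text, collapse = " "),  # empty page -> ""
    character(1)
  )
  which(grepl(pattern, page_strings, ignore.case = ignore.case))
}

# Toy stand-in for the pdf_data() result (three "pages")
toy <- list(
  data.frame(text = c("Table", "of", "contents")),
  data.frame(text = c("Table", "20", "Change", "in", "Nominal", "House", "Prices")),
  data.frame(text = character(0))
)
print(find_pages(toy, "Change in Nominal house price"))
# [1] 2

# With the real report (requires pdftools and a network connection):
# report <- pdftools::pdf_data("https://hypo.org/app/uploads/sites/2/2021/11/HYPOSTAT-2021_vdef.pdf")
# hits <- find_pages(report, "Change in Nominal house price")
# hits  # check this first: the phrase may also occur in the text or a contents page
# tab20 <- as.data.frame(report[[hits[1]]])
```

`pdftools::pdf_text()` returns one string per page and could be searched with `grep()` in the same way, but its output keeps the page layout's runs of spaces and line breaks, so a multi-word phrase would likely need a pattern such as `"Change\\s+in\\s+Nominal\\s+house\\s+price"`.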