Each month a new PDF file is added to a specific directory. I am trying to build a data pipeline in R using targets to extract some information from these files.
    list(
      tar_target(
        "data_path",
        list.files(path = "dir", full.names = TRUE)
      ),
      tar_target(
        "data_pdf_raw",
        read_pdf(data_path),
        pattern = map(data_path)
      ),
      tar_target(
        "data_pdf_clean",
        clead_pdf(data_pdf_raw[[1]]),
        pattern = map(data_pdf_raw)
      ),
      tar_target(
        "data_to_sql",
        data_to_sql(data_pdf_clean)
      )
    )
The problem is that targets skips the data_path target even though new files have been added to the directory. I have tried setting format = "file" in data_path, without success. I have also tried adding a new target, as suggested in one of the posts mentioned below:
tar_target(paths2, list_path, format = "file", pattern = map(data_path)),
As there are quite a lot of PDFs and the process is time-consuming, I would rather not re-read all the files every single time.
I have noticed these two questions, but their solutions do not work in my case.
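To make the goal concrete, here is a minimal sketch of the behaviour I am after. It is not a confirmed fix. It assumes that cue = tar_cue(mode = "always") forces the directory listing to rerun on every tar_make(), and that a separate format = "file" target branching over the paths lets targets track each file's contents individually. The target name data_file is hypothetical, and read_pdf() is my own function from the pipeline above.

```r
# _targets.R (sketch only). read_pdf() is the user-defined function from the
# pipeline above and is assumed to be defined or sourced in this file.
library(targets)

list(
  # Always re-run the listing so that newly added files are noticed.
  tar_target(
    data_path,
    list.files(path = "dir", full.names = TRUE),
    cue = tar_cue(mode = "always")
  ),
  # One branch per path; format = "file" tracks the contents of each file.
  # "data_file" is a hypothetical name introduced for this sketch.
  tar_target(
    data_file,
    data_path,
    format = "file",
    pattern = map(data_path)
  ),
  # Only branches whose file is new or has changed should need to rerun.
  tar_target(
    data_pdf_raw,
    read_pdf(data_file),
    pattern = map(data_file)
  )
  # The data_pdf_clean and data_to_sql targets would follow unchanged.
)
```

If this works as assumed, adding one PDF to the directory should create one new branch of data_file and data_pdf_raw, while the branches for the existing files are skipped. As far as I can tell, tar_files() from the tarchetypes package wraps a similar pair of targets.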