I'm using a targets workflow pipeline. Part of this pipeline monitors a directory of csv files for updates. There are more than 10,000 csv files in this directory, and new files are added weekly. I want to identify the newly added files and append them to an existing set of *.rds files. The easy thing would be to re-run the process that creates the 5 subsets of *.rds files each week, but that takes time. The efficient thing would be to identify the newly added files and simply bind_rows them with the proper rds file.
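To make the append step concrete, here is a minimal sketch in base R. It uses rbind() in place of bind_rows so that it runs without dplyr. The helper name append_to_rds and all file paths are hypothetical, and it assumes the new csv files have the same columns as the existing rds file.

```r
# Hypothetical helper: read only the new csv files and append their rows
# to an existing rds file (assumes the columns match).
append_to_rds <- function(new_files, rds_path) {
  # Nothing new: return the existing data untouched
  if (length(new_files) == 0) return(invisible(readRDS(rds_path)))
  new_rows <- do.call(rbind, lapply(new_files, read.csv))
  combined <- rbind(readRDS(rds_path), new_rows)
  saveRDS(combined, rds_path)
  invisible(combined)
}

# Usage with throwaway files in a temporary directory
tmp <- tempfile("demo_")
dir.create(tmp)
rds_path <- file.path(tmp, "subset.rds")
saveRDS(data.frame(id = 1:2, value = c(10, 20)), rds_path)
new_csv <- file.path(tmp, "week2.csv")
write.csv(data.frame(id = 3L, value = 30), new_csv, row.names = FALSE)
result <- append_to_rds(new_csv, rds_path)
```

Only the new file is read here, so the cost of a weekly update scales with the number of added files rather than with the full 10,000+.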
I can do this easily enough with typical programming using dir() and setdiff(), where I store a snapshot of csv filepaths from the previous day. But I'm struggling to accomplish this within the targets framework.
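For reference, the non-targets version I have in mind looks roughly like the sketch below. The helper name find_new_csv and the snapshot location are hypothetical, and the snapshot is assumed to be a character vector of file paths saved with saveRDS().

```r
# Hypothetical helper: compare the directory listing against a stored
# snapshot of file paths and return only the paths not seen before.
find_new_csv <- function(csv_dir, snapshot_path) {
  current <- dir(csv_dir, pattern = "\\.csv$", full.names = TRUE)
  previous <- if (file.exists(snapshot_path)) readRDS(snapshot_path) else character(0)
  saveRDS(current, snapshot_path)  # the current listing becomes the next snapshot
  setdiff(current, previous)
}

# Usage with empty placeholder files in a temporary directory
csv_dir <- tempfile("csv_")
dir.create(csv_dir)
snapshot_path <- tempfile(fileext = ".rds")
file.create(file.path(csv_dir, c("a.csv", "b.csv")))
first_run <- find_new_csv(csv_dir, snapshot_path)   # no snapshot yet, so both files are new
file.create(file.path(csv_dir, "c.csv"))
second_run <- find_new_csv(csv_dir, snapshot_path)  # only c.csv was added since the snapshot
```

The snapshot is kept outside the csv directory so it never shows up in the listing itself.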
Here is an attempt that doesn't seem to work. I think I want to monitor the temporary results in the /_targets directory, but I'm not sure how to go about doing that. Also, the targets documentation recommends against using tar_load inside the target configuration itself.
library(targets)
tar_script({
  list(
    tar_target(csv_directory, "/csv/"),
    tar_target(csv_snapshot, dir(csv_directory)),  # file names at snapshot time
    tar_target(
      append_action,
      if (length(setdiff(dir(csv_directory), dir(csv_snapshot))) > 0) {
        ...
      }
    )
  )
})