0

I have several files in one place. I merge them like this:

punkty_pgd <- read_xlsx("C:/Users/Desktop/PC- 2021042500.xlsx", sheet = "NEW")
punkty_pgd_2 <- read_xlsx("C:/Users/Desktop/PC- 2021042512.xlsx", sheet = "NEW")
punkty_pgd_3 <- read_xlsx("C:/Users/Desktop/PC- 2021042600.xlsx", sheet = "NEW")
punkty_pgd <- rbind(punkty_pgd, punkty_pgd_2, punkty_pgd_3)

There are dates in their names and I would like to limit this merging somehow, that is, only merge files that have a date range in the name: from date_start to date_end

r2evans
  • 141,215
  • 6
  • 77
  • 149
Robert
  • 29
  • 5
  • FYI, "merge" is not the right term for what you are doing with `rbind`: you are row-binding, row-concatenating, or just combining the frames. A "merge" operation is a "join", where you match one or more columns between frames, resulting in zero or more rows paired between them ... see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/a/6188334/3358272. – r2evans May 04 '21 at 12:23

1 Answers1

1

For the "load many files and process them", a good place to start is https://stackoverflow.com/a/24376207/3358227.

Lacking that, let's address the filename/date thing.

First, assuming there are files you don't want to load (this time), then we should first get the names of the files.

files <- list.files("c:/Users/Desktop", pattern = "\\.xlsx$", full.names = TRUE)
### add a fake file without a date, for testing
files <- c(files, "Something.xlsx")

files
# [1] "C:/Users/Desktop/PC- 2021042500.xlsx"
# [2] "C:/Users/Desktop/PC- 2021042512.xlsx"
# [3] "C:/Users/Desktop/PC- 2021042600.xlsx"
# [4] "Something.xlsx"                      

Now we can determine what the Date is by parsing the filenames.

files <- cbind(strcapture("\\b([0-9]{10})\\b", files, list(fn = "")), origfn = files)
files
#           fn                               origfn
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx
# 4       <NA>                       Something.xlsx

(strcapture returns a data.frame, and I'll keep it in there for convenience.)

files$date <- as.Date(files$fn, format = "%Y%m%d%H")
files
#           fn                               origfn       date
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx 2021-04-26
# 4       <NA>                       Something.xlsx       <NA>

Now we can filter just the files we need for a specific date range.

  • base R:

    dates <- as.Date(c("2021-04-20", "2021-04-25"))
    loadthese <- subset(files, dates[1] <= date & date <= dates[2])
    loadthese
    #           fn                               origfn       date
    # 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
    # 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
    list_of_frames <- lapply(loadthese$origfn, readxl::read_xlsx, sheet = "NEW")
    punkty_pgd <- do.call(rbind, list_of_frames)
    
  • tidyverse

    dates <- as.Date(c("2021-04-20", "2021-04-25"))
    punkty_pgd <- files %>%
      filter(between(date, dates[1], dates[2])) %>%
      pull(origfn) %>%
      map_dfr(~ read_xlsx(., sheet = "NEW"))
    
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thank you, the solution works. I still have a question, is it possible to somehow limit the date range not to exceed seven days? The dates will be variables. – Robert May 04 '21 at 19:52
  • I'm not certain I understand. The current contrived code in this answer assumes six days, so if you change the value of my `dates` variable, you have full control over the span of the limits. Am I misunderstanding your question? – r2evans May 04 '21 at 20:04
  • 1
    Sorry, I misunderstood the code. Everything is fine, it works as I wanted. Thank you for your help. :) – Robert May 04 '21 at 20:25