I'll demonstrate with fake filenames, but the real commands I suggest you run are commented out with the same structure. I'm assuming .xlsx
for "excel files", but this works equally well with .csv
(just update the pattern).
# files <- list.files(path = ".", pattern = "User_Info_.*\\.xlsx$", full.names = TRUE)
files <- c("./User_Info_Jan.xlsx", "./User_Info_Feb.xlsx", "./User_Info_Mar.xlsx")
monthnames <- strcapture("User_Info_(.*)\\.xlsx", files, list(month = ""))
monthnames
# month
# 1 Jan
# 2 Feb
# 3 Mar
At this point, we've extracted the month name from each filename. I find strcapture
(in base R) better than gsub
, as the latter returns the entire string if there are no matches; another alternative in base R is regmatches(files, gregexpr(...))
, but that seems a bit more complicated than it needs to be here. Another alternative is stringr::str_extract
which might be more intuitive if you're already using stringr
and/or other tidyverse packages.
From here, we can iterate over the files to read them in.
# out <- Map(function(mn, fn) transform(readxl::read_excel(fn), month = mn), monthnames$month, files)
set.seed(42)
out <- Map(function(mn, fn) transform(mtcars[sample(32,size=2),], month = mn), monthnames$month, files)
out
# $Jan
# mpg cyl disp hp drat wt qsec vs am gear carb month
# Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4 Jan
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Jan
# $Feb
# mpg cyl disp hp drat wt qsec vs am gear carb month
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Feb
# Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2 Feb
# $Mar
# mpg cyl disp hp drat wt qsec vs am gear carb month
# Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Mar
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Mar
Combining that list-of-frames into a single frame is direct:
do.call(rbind, out)
# mpg cyl disp hp drat wt qsec vs am gear carb month
# Jan.Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Jan
# Jan.Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Jan
# Feb.Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Feb
# Feb.Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Feb
# Mar.Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Mar
# Mar.Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Mar
An alternative to all of that can use data.table::rbindlist
or dplyr::bind_rows
, and assign the "id" column directly:
# out <- Map(function(mn, fn) readxl::read_excel(fn), monthnames$month, files)
set.seed(42)
out <- Map(function(mn, fn) mtcars[sample(32,size=2),], monthnames$month, files)
data.table::rbindlist(out, idcol = "month")
# month mpg cyl disp hp drat wt qsec vs am gear carb
# <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1: Jan 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 2: Jan 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 3: Feb 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 4: Feb 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# 5: Mar 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# 6: Mar 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
dplyr::bind_rows(out, .id = "month")
# month mpg cyl disp hp drat wt qsec vs am gear carb
# Chrysler Imperial Jan 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# Hornet Sportabout Jan 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Mazda RX4 Feb 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Pontiac Firebird Feb 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# Merc 280 Mar 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# Hornet 4 Drive Mar 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
The latter two work because when I called Map
earlier, the first argument (monthnames$month
) passed to the inner function is used as the names for the list
output, which is why you see $Jan
etc as the elements of the returned list. Both rbindlist
and bind_rows
use those names as "id" columns when idcol=
/.id=
are used. (If no "names" are actually present, both functions count along them.)