Automation to merge data frames, adding a column to keep note of the origin

Question

I am a newbie with R. I have 6 different data frames (U, V, W, X, Y, Z), coming from different CSV files. Each of them has the same columns (Surname, Name, Winter, Spring, Summer), and I would like to create a new data frame containing those 5 columns plus a sixth column that holds the letter (U, V, ...) of the data frame each row originally comes from. I have tried the following code:

U <- read.csv(file = "U", header = T)
V <- read.csv(file = "V", header = T)
W <- read.csv(file = "W", header = T)
X <- read.csv(file = "X", header = T)
Y <- read.csv(file = "Y", header = T)
Z <- read.csv(file = "Z", header = T)

U['class'] <- rep("U")
V['class'] <- rep("V")
W['class'] <- rep("W")
X['class'] <- rep("X")
Y['class'] <- rep("Y")
Z['class'] <- rep("Z")

students <- rbind(U, V, W, X, Y, Z)

I really need to use a loop, so that in future I can go from A to Z. I would like to do something like this, which is total nonsense:

for(class.name in list(U, V, W, X, Y, Z)){
  class.name['class'] <- rep('class')
}

Is there a reasonable way to do it?

Thank you

Edited

To clarify my question, the idea is that I have 6 different stations collecting raw data and giving me 6 different data frames. I want to merge them together while keeping track of which station each piece of raw data comes from.

Possible incomplete solution

Following @MrFlick's advice, I have managed to put everything in one list as follows:

classes <- c('U', 'V', 'W', 'X', 'Y', 'Z')
my.files <- paste(classes,".csv",sep="")
year.eight <- lapply(my.files, read.csv, header = TRUE)
names(year.eight) <- classes

However, the final outcome should be one single data frame with a further column indicating which class each student is in. Can someone help me with this, please?

Try to expand your question with an example of the expected output. — jyr, Feb 13 '20 at 16:02
Well, the thing that makes this tricky is that you have 6 separate data.frames in your global environment. It would be much easier if they were all in a list. Did you really create them by copy/pasting `read.csv` a bunch of times? You can save yourself some work by using `lapply` to read all the files into a list and then you can map a transformation over that list. In particular, check out this answer: https://stackoverflow.com/a/24376207/2372064 — MrFlick, Feb 13 '20 at 16:14
Thank you @MrFlick, the post is really interesting and I have followed it, radically changing the program. I have managed to put all my files in one single list, and to name the list properly. Now, though, I still have the problem of adding the column. Should I edit the question to clarify? — Logos, Feb 13 '20 at 18:42

Tomas Capretto · Answer 1 · 2020-02-13T17:18:40.473

2

Let me try to share an example

Suppose we have 3 files A.csv, B.csv and C.csv in a folder called "data" within our working directory. Suppose they contain a single column with a numeric value. Then this code does what you want.

library(readr)

files <- paste0("data/", list.files("data"))
df_list <- list()

for (i in seq_along(files)) {
  tmp <- read_csv(files[[i]])
  tmp["class"] <- sub("\\..*", "", basename(files[[i]])) # ".csv$" also works in this case
  df_list[[i]] <- tmp
}

output <- dplyr::bind_rows(df_list)
output
# A tibble: 3 x 2
#       x class
#   <dbl> <chr>
# 1     1 A    
# 2     1 B    
# 3     1 C

Edited following Tensibai's excellent suggestion.

edited Feb 13 '20 at 17:18

answered Feb 13 '20 at 16:14


1

I feel like something like `tmp <- read_csv(files[[i]]); tmp['class'] <- basename(files[[i]]); df_list[[i]] <- tmp` would be much more straightforward and get the same result – Tensibai Feb 13 '20 at 16:31
You are right! My example is confusing because I adapted old code that I used to import several data frames for later use in a script. What you suggest is a better option – Tomas Capretto Feb 13 '20 at 16:44
I agree: the automation comes at a really high cost, and I can barely understand how to do it. I hoped it would be easier, because the problem is not really so uncommon: think of having raw data on the same populations coming from different stations. You want to put them together, but when merging the frames into one you want to keep note of where each came from. – Logos Feb 13 '20 at 17:18

score 1 · Accepted Answer · answered Feb 13 '20 at 19:53

To do this more easily with a list of data.frames, it might look something like this

classes <- c('U', 'V', 'W', 'X', 'Y', 'Z')
my.files <- paste(classes,".csv",sep="")
# SIMPLIFY = FALSE keeps the result as a list of data frames
year.eight <- mapply(function(path, code) {
    data <- read.csv(path, header = TRUE)
    data$class <- code
    data
}, my.files, classes, SIMPLIFY = FALSE)
combined <- do.call("rbind", year.eight)

Or using dplyr

classes <- c('U', 'V', 'W', 'X', 'Y', 'Z')
my.files <- paste(classes,".csv",sep="")
year.eight <- lapply(my.files, read.csv, header = TRUE)
names(year.eight) <- classes
combined <- dplyr::bind_rows(year.eight, .id="class")
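For a self-contained check of the approach above, here is a minimal sketch in base R only. It writes three tiny CSV files to a temporary directory and combines them with `Map` (which is `mapply` without simplification) and `rbind`. The `demo.dir` name and the file contents are made up for illustration and are not the asker's data.

```r
# Hypothetical demo data: three small class files in a temporary directory.
demo.dir <- file.path(tempdir(), "classes_demo")
dir.create(demo.dir, showWarnings = FALSE)
classes <- c("U", "V", "W")
for (code in classes) {
  demo <- data.frame(Surname = paste0("Surname", code),
                     Name    = paste0("Name", code),
                     Winter  = 1, Spring = 2, Summer = 3)
  write.csv(demo, file.path(demo.dir, paste0(code, ".csv")), row.names = FALSE)
}

# Read each file and tag its rows with the class code it came from.
my.files <- file.path(demo.dir, paste0(classes, ".csv"))
year.eight <- Map(function(path, code) {
  data <- read.csv(path, header = TRUE)
  data$class <- code
  data
}, my.files, classes)

# Stack the list into one data frame and drop the path-based row names.
combined <- do.call(rbind, year.eight)
rownames(combined) <- NULL
print(combined)
```

Each row of `combined` then carries the letter of the file it was read from in the `class` column.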

This is an amazing solution, but year.eight comes out a bit different from what I expected. I have tried to keep my original list of frames by writing a function modelled on yours, like `lapply(classes, function(x) {year.eight[[x]]['Class'] <- x})`, but it does nothing. Could you explain to me what I did wrong here? — Logos, Feb 13 '20 at 20:54
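A likely reason the `lapply` call in the comment above appears to do nothing: an assignment inside a function changes only a copy of `year.eight` that is local to that function, and the function then returns just the assigned value `x`. One way around it is to return the modified data frame and assign the result back to the list. The sketch below shows this, with a small made-up named list standing in for the real `year.eight`.

```r
# Made-up stand-in for the real list of data frames.
classes <- c("U", "V")
year.eight <- list(U = data.frame(Name = c("Ann", "Bob")),
                   V = data.frame(Name = "Cat"))

# The assignment only touches a copy local to the function.
ignored <- lapply(classes, function(x) { year.eight[[x]]["Class"] <- x })
"Class" %in% names(year.eight$U)  # still FALSE

# Return the modified data frame and overwrite the list elements with the result.
year.eight[] <- lapply(classes, function(x) {
  df <- year.eight[[x]]
  df["Class"] <- x
  df
})
```

Assigning into `year.eight[]` replaces the elements while keeping the list's names, so `year.eight$U` and `year.eight$V` now each have a `Class` column.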

hammoire · Answer 3 · 2020-02-14T19:21:29.960

If you save all the files of interest in a specific directory, you can then list them using list.files() and loop over them using map_df from the purrr package. I think this does the trick:

# Load package (map_df also needs dplyr to be installed)
library(purrr)

# Define the directory where the files are saved
path <- "your_file_path" # e.g. my Mac desktop "~/Desktop"

# Create vector of CSV file names
files <- list.files(path, pattern = "\\.csv$")

# Use map_df from purrr to loop over the files and return one data frame with an extra label variable
map_df(files, function(x) {
  # read the file into a data frame
  df <- read.csv(file.path(path, x))
  # use sub to remove the ".csv" extension from the file name
  df["class"] <- sub("\\.csv$", "", x)
  df
})
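As a small alternative to the step that removes ".csv" from the file name, `tools::file_path_sans_ext()` (shipped with R) strips the extension without a regular expression. The file names below are made up for illustration.

```r
# Hypothetical file names, as list.files() might return them.
files <- c("U.csv", "V.csv", "data/W.csv")

# Drop any directory part, then the extension, to get the class label.
labels <- tools::file_path_sans_ext(basename(files))
print(labels)
```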
