Loop in R to get specific columns and the sum of the rest from different .csv files

Question

So I have about 300 large .csv files. I need to get column 1 and 2, and then sum columns from 3 to 20 for each file. What's the fastest way to do so? Thanks

Not clear based on the description. You said you want to get column 1 and 2 and sum columns rest? Please show a small reproducible example and expected output\ — akrun, Jul 10 '18 at 22:35
Please read [how to ask good questions](https://stackoverflow.com/help/how-to-ask). Other good references include [reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and [minimal examples](https://stackoverflow.com/help/mcve). Furthermore, each of these has many many questions already answered on SO: (1) read many large CSV files into a list; (2) sum specific columns. — r2evans, Jul 10 '18 at 22:36
Unless it's a really extreme situation `sum` or `colSums` should work fine for the first part. To read in many files I would suggest to put them in the same folder or folders within the same folder and then use `dir` or `System` to get their names, etc, etc — Hack-R, Jul 10 '18 at 22:40
Try this: `lst<-lapply(filenames,read.csv)` then `lst<-lapply(lst, function(x) cbind(x[1:2],s=rowSums(x[3:20])))`, does that do it? — r2evans, Jul 10 '18 at 22:59
Yes sorry, I wasn't clear enough. Previously I was doing this: `d001 <- read.csv(file="d001.csv", header=TRUE, sep=",") transmute(d003,i, j , Flux= h01 + h02 + h03 + h04 + h05 + h06 + h07 + h08 + h09 + h10 + h11 + h12 + h13 + h14 + h15 + h16 + h11 + h22 + h23 + h24)` So I Can get a new dataset with columns i, j and Flux, and Flux is the sum of the rest of the columns. I need to do this for all 300 files, and then join them, so I was trying to find a loop but no success yet to do so; files are large (400 mb) — Ricardo Alonso Esparza Gamez, Jul 11 '18 at 14:29

Rachit Kinger · Accepted Answer · 2018-07-12T09:05:02.250

First put all files in a single folder.

filenames <- list.files(pattern = ".csv")

all <- lapply(filenames, function(name) {
  readr:: read_csv(name)
})

Based on your description the list should have a length of 300, each item containing a dataframe. If you want to bind the rows of all dataframes you can use dplyr's bind_rows(). Below is a solution based on an example dataset with 4 cols per dataset.

# assumed dataset post importing into a list from step 1
listOfDataFrames <- vector(mode = "list", length = 4)

for (i in 1:4) {
  listOfDataFrames[[i]] <- data.frame(Col1=sample(letters, 5, rep=F),
                                      Col2=rnorm(5), Col3=rnorm(5))
}


all_df <- dplyr::bind_rows(listOfDataFrames)
required_df <- all_df[,1:2]
sum_of_cols <- apply(all_df[,3:4], 2, sum)

===== edited answer based on new information =====

If all your csv files d001 to d300 are in the same folder you can apply the same principles that I have shared above (I'm not sure about the efficiency of the code, but it should still work).
Step 1: load all data frames into a single list
Step 2: loop through the list performing the same transformation on all dataframes (each element of list)
Step 3: loop through list writing each element of list into a separate csv file

Here is an example code:

filenames <- list.files(pattern = ".csv")

all <- lapply(filenames, function(name) {
  readr:: read_csv(name)
})


all_transformed <- lapply(all, function(df) {
  transmute(df, i, j, sum = sum(h01:h24, na.rm = TRUE))
})

names(all_transformed) <- paste0("d",1:length(all_transformed))
lapply(1:length(all_transformed), function(i) write.csv(all_transformed[[i]], 
                                                file = paste0(names(all_transformed[i]), ".csv"),
                                                row.names = FALSE))

Thanks Rachit, So what I need is this: `d003 <- read.csv(file="d003.csv", header=TRUE, sep=",") transmute(d003,i, j , Sum= h01 + h02 + h03 + h04 + h05 + h06 + h07 + h08 + h09 + h10 + h11 + h12 + h13 + h14 + h15 + h16 + h11 + h22 + h23 + h24)` Then, I need to do for all those files (names go from d001 to d300) and put it in one new csv file — Ricardo Alonso Esparza Gamez, Jul 11 '18 at 14:00
I've edited my answer based on this information. Hope it helps. — Rachit Kinger, Jul 12 '18 at 09:24
Thank you! Extremely helpful, exactly what I was looking for! Still, I had to sum manually the h because it gave me this error: `In h01:h24 : numerical expression has 941017 elements: only the first used` Thanks a lot! — Ricardo Alonso Esparza Gamez, Jul 12 '18 at 19:57

Loop in R to get specific columns and the sum of the rest from different .csv files

1 Answers1