9

R 3.0.3: I have 40 CSV files, all structured the same, that I want to rbind into one data frame so I can calculate the mean of one column.

I searched:

  • this website
  • R in a Nutshell
  • R_Intro sources
  • ?rbind Help in RStudio

I cannot find the answer.

Any suggestions/pointers?

KobeJohn
user3551565

2 Answers

19

Using the answer from [Importing several files and indexing them]:

List the files with a .csv extension. This assumes that the only .csv files in your working directory are the ones you want to read.

files <- list.files(pattern = '\\.csv$')

Read the files into a list. Set header according to whether your files have a header row.

tables <- lapply(files, read.csv, header = TRUE)

Row-bind the resulting data frames into one:

combined.df <- do.call(rbind, tables)

You can then find the mean. First, find which columns are numeric:

s <- sapply(combined.df, is.numeric)

Then compute the mean of each numeric column:

colMeans(combined.df[s])
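
For the original question (the mean of a single column), the steps above can be put together roughly as follows. This is only a sketch: the temporary directory, the toy files, and the column name `value` are made up for illustration and stand in for the 40 real files.

```r
# Hypothetical setup: write three small CSV files with identical structure
# into a temporary directory (stand-ins for the real files).
demo.dir <- tempfile("csvdemo")
dir.create(demo.dir)
for (i in 1:3) {
  toy <- data.frame(id = 1:2, value = c(i, i + 10))
  write.csv(toy, file.path(demo.dir, paste0("file", i, ".csv")), row.names = FALSE)
}

# List the CSV files; full.names = TRUE avoids depending on the working directory.
files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a data frame, then stack them row-wise.
tables <- lapply(files, read.csv, header = TRUE)
combined.df <- do.call(rbind, tables)

# Mean of one column ("value" is an assumed column name).
mean.value <- mean(combined.df$value, na.rm = TRUE)
print(mean.value)
```

Note that `rbind` on data frames requires every file to have the same column names, which matches the "all structured the same" assumption in the question.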
user20650
  • @user20650 - Thank you for your answer. If, before `rbind()`, I want to add a unique identifier (numeric, like 0, 1, 2, etc.) that helps me distinguish which data is from file 1 and which is from file 2, how can I achieve this? I am currently trying `loop()`, but it's getting a bit messy. – Chetan Arvind Patil Feb 04 '18 at 20:36
  • 2
    @ChetanArvindPatil ; at the read in stage, you could add the file names with `tables <- lapply(files, function(x) cbind(read.csv(x, header = TRUE), id=x))`, or a numeric with `tables <- lapply(seq_along(files), function(x) cbind(read.csv(files[x], header = TRUE), id=x))` ; or using `data.table` in the list as in the answer above : `names(tables) <- files ; data.table::rbindlist(tables, idcol=TRUE)` – user20650 Feb 04 '18 at 20:51
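
The numeric-id variant from the comment above might look like this when run end to end. It is a sketch under assumptions: the two toy files and the `value` column are hypothetical, and the id is simply each file's position in `files`.

```r
# Hypothetical setup: two small CSV files in a temporary directory.
demo.dir <- tempfile("iddemo")
dir.create(demo.dir)
write.csv(data.frame(value = c(1, 2)), file.path(demo.dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(value = c(3, 4)), file.path(demo.dir, "b.csv"), row.names = FALSE)

files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and attach its position in `files` as a numeric id column.
tables <- lapply(seq_along(files), function(i) {
  cbind(read.csv(files[i], header = TRUE), id = i)
})
combined.df <- do.call(rbind, tables)

# Per-file means of the assumed "value" column.
means.by.file <- tapply(combined.df$value, combined.df$id, mean)
print(means.by.file)
```

Keeping the id as a column means the source file survives the `rbind`, so per-file summaries remain possible afterwards.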
1

A more contemporary approach, using plyr:

library(plyr)
files <- list.files(...)
data <- adply(files, 1, read.csv)

(It's Saturday afternoon: untested code, but the approach is fine.)

Paul Lemmens