
I have 30 CSV files named 101.csv, 102.csv, etc., each with four columns but a varying number of rows. I want to calculate the mean and the median of the fourth column in each file.

I started by reading the files into a list of data frames with:

listOfDataframes <- lapply(paste0(101:130, ".csv"), read.csv)

It looks like this:

[[1]]              
1               contig02534_1_120507-bin0\t477\t585\t50      
2               contig02534_1_120507-bin0\t585\t2695\t0               
3               contig06975_1_120507-bin0\t0\t732\t100
...

[[2]]
...

I would like to end up with one new table that summarizes, for each data frame separately, 1) the mean of column four and 2) the median of column four. For instance:

          mean     median
[[1]]     75       50
[[2]]     65       100

I have tried different approaches posted here but can't get them to work as I want. Any help would be highly appreciated!

Jaap
    I think you need to use `sep="\t"` – akrun Jan 26 '16 at 15:55
  • [see here for an explanation on how to read and combine several files into one](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice/32888918#32888918); after that you can calculate the desired values, see: [one](http://stackoverflow.com/questions/32795456/how-to-summarize-a-data-frame-into-a-new-one-that-tells-means-of-separate-levels/32795497#32795497) & [two](http://stackoverflow.com/questions/12064202/using-aggregate-for-multiple-aggregations/34240880#34240880) – Jaap Jan 26 '16 at 16:05
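
The `sep` suggestion in the first comment can be checked on a small file. This is only a sketch: it assumes the files are tab-separated and have no header row, and the temporary file is made up for illustration (its three rows are copied from the question).

```r
# Write three tab-separated rows from the question to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("contig02534_1_120507-bin0\t477\t585\t50",
             "contig02534_1_120507-bin0\t585\t2695\t0",
             "contig06975_1_120507-bin0\t0\t732\t100"), tmp)

# read.csv splits on commas by default, so each row ends up in one column
wrong <- read.csv(tmp, header = FALSE)
ncol(wrong)

# With sep = "\t" the rows are split on tabs into separate columns
right <- read.csv(tmp, header = FALSE, sep = "\t")
ncol(right)
```

Comparing the two column counts shows whether the separator is the reason the list in the question prints each row as a single string.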

1 Answer


My usual approach to this goes something like:

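library(dplyr)
lapply(101:130,
       function(x) {
         # The files are tab-separated; add header = FALSE if they have no header row
         D <- read.csv(paste0(x, ".csv"), sep = "\t")
         # Record which file each row came from
         D$set <- x
         # Return the data frame so lapply collects it
         D
       }) %>%
  do.call("rbind", .) %>%
  setNames(c("col1", "col2", "col3", "col4", "set")) %>%
  group_by(set) %>%
  summarise(mean = mean(col4),
            median = median(col4))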

Your example wasn't reproducible, so this is untested.
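
For a version that can be run without the original files, here is a base R sketch of the same idea applied to a list of data frames. It assumes tab-separated files with no header row. The two small files it writes to a temporary directory are made-up data, and `tmp_dir`, `files` and `result` are names chosen only for this illustration.

```r
# Create two small tab-separated files in a temporary directory (made-up data)
tmp_dir <- tempdir()
writeLines(c("a\t0\t10\t50", "b\t10\t20\t100"),
           file.path(tmp_dir, "101.csv"))
writeLines(c("c\t0\t5\t0", "d\t5\t9\t100", "e\t9\t12\t100"),
           file.path(tmp_dir, "102.csv"))

# Read every file into a list of data frames, as in the question
files <- file.path(tmp_dir, paste0(101:102, ".csv"))
listOfDataframes <- lapply(files, read.csv, sep = "\t", header = FALSE)

# sapply returns one column per file; t() turns that into one row per file
result <- t(sapply(listOfDataframes, function(d) {
  c(mean = mean(d[[4]]), median = median(d[[4]]))
}))
rownames(result) <- basename(files)
result
```

Indexing with `d[[4]]` picks the fourth column by position, so the sketch does not depend on the column names.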

Benjamin