I am working on a function that reads the contents of several CSV files into one data frame and then returns the mean of a column of that data frame, chosen by the user:

airpollut <- function(pathway, pollutent, id = 1:322){
  airpollutionfiles <- list.files(path = pathway, full.names = TRUE, pattern = ".csv")
  totalvalues <- list()
  for(i in id){
   filereading <- read.csv(airpollutionfiles[i])
   totalvalues[[i]] <- filereading
  }
  finaldata1 <- data.frame(rbind(totalvalues))
  mean(finaldata1[[pollutent]])
}

But when I run the function, I get NA and a warning message:

> airpollut("finaldata", "sulfate")
[1] NA
Warning message:
In mean.default(finaldata1[[pollutent]]) :
  argument is not numeric or logical: returning NA

So I tried to check the output of the data frame binding: I removed the call to mean() from the function and put head() in its place:

airpollut <- function(pathway, pollutent, id = 1:322){
  airpollutionfiles <- list.files(path = pathway, full.names = TRUE, pattern = ".csv")
  totalvalues <- list()
  for(i in id){
   filereading <- read.csv(airpollutionfiles[i])
   totalvalues[[i]] <- filereading
  }
  finaldata1 <- data.frame(rbind(totalvalues))
  head(finaldata1)
}

The output is just the data stacked next to each other like a vector, and my computer crashes. Can you please tell me how to combine the rows from all the files, without using functions that are not in base R?

Waldi
  • https://stackoverflow.com/questions/2851327/combine-a-list-of-data-frames-into-one-data-frame-by-row – AdroMine Feb 11 '22 at 08:41
  • @AdroMine Thanks for your comment! I tried rbindlist but it says that this function doesn't exist, I don't want to use dplyr package in this. – Hasan Jamil Feb 11 '22 at 09:17
  • Try `do.call(rbind,totalvalues)` instead of `data.frame(rbind(totalvalues))` – Waldi Feb 11 '22 at 09:35
  • 1) Use lapply instead of the for loop; it returns a list directly instead of appending every result. 2) rbindlist is a data.table function (the dplyr equivalent is bind_rows), which I can only recommend. – D.J Feb 11 '22 at 10:31
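
The two suggestions in the comments above (lapply for reading, do.call(rbind, ...) for binding) can be combined in base R. The following is a minimal sketch, not a definitive implementation: the anchored pattern "\\.csv$", the na.rm = TRUE argument, and the two small demo files written to a temporary directory are assumptions added for illustration and are not part of the original question.

```r
# Sketch of the function using only base R.
airpollut <- function(pathway, pollutent, id = 1:322) {
  # Assumption: anchor the pattern so only names ending in ".csv" match.
  airpollutionfiles <- list.files(path = pathway, full.names = TRUE,
                                  pattern = "\\.csv$")
  # lapply returns a list with one data frame per selected file.
  totalvalues <- lapply(airpollutionfiles[id], read.csv)
  # do.call passes every data frame to rbind as a separate argument.
  finaldata1 <- do.call(rbind, totalvalues)
  # Assumption: missing values should be ignored when averaging.
  mean(finaldata1[[pollutent]], na.rm = TRUE)
}

# Hypothetical demo data: two tiny CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "airpollut_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(3, 6)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- airpollut(demo_dir, "sulfate", id = 1:2)
print(result)
```

Indexing the file vector with id before reading also avoids the empty slots that totalvalues[[i]] leaves behind when id does not start at 1.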

1 Answer

A minimal reproducible example to facilitate understanding of the problem:

l <- list()
l[[1]] <- mtcars
l[[2]] <- mtcars
# rbind(l) treats the list itself as one row, so column 1 is a list, not numbers
mean(as.data.frame(rbind(l))[[1]])
[1] NA
Warning message:
In mean.default(as.data.frame(rbind(l))[[1]]) :
  argument is not numeric or logical: returning NA
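
To see why the mean is NA, it may help to inspect what rbind returns when it is given the list as a single argument. This is a small sketch using the same mtcars example; the variable name m is only illustrative.

```r
l <- list(mtcars, mtcars)

# rbind() receives ONE argument (the list), so it builds a matrix with
# one row and one cell per list element instead of stacking the rows.
m <- rbind(l)

print(dim(m))           # 1 2
print(typeof(m))        # "list"
print(class(m[[1, 1]])) # "data.frame": each cell still holds a whole data frame
```

Each cell of that one-row matrix is an entire data frame, so selecting a column gives a list rather than a numeric vector, and mean() cannot average it.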

You could use do.call:

mean(do.call(rbind,l)[[1]])
[1] 20.09062
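
The reason do.call helps is that it calls rbind with each element of the list as its own argument. As a hedged illustration on the same mtcars example (the names stacked_call and stacked_manual are only for this sketch):

```r
l <- list(mtcars, mtcars)

# do.call(rbind, l) builds the call rbind(l[[1]], l[[2]]) for you,
# whatever the length of the list.
stacked_call   <- do.call(rbind, l)
stacked_manual <- rbind(l[[1]], l[[2]])

print(identical(stacked_call, stacked_manual))
print(nrow(stacked_call))  # twice the number of rows of mtcars
```

With two data frames you could write the call by hand, but with hundreds of files do.call is the practical way to pass them all.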
Waldi
  • Thanks a lot, it worked, but I didn't understand why you need do.call. Why not just rbind? – Hasan Jamil Feb 13 '22 at 04:50
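  • Have a look at the output of `rbind(l)`: it treats the list as a single object rather than binding the data frames inside it. As noted in the comments on your post, when `l` is a list of data frames, `data.table::rbindlist` does what you expect. – Waldi Feb 13 '22 at 06:24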