Take the variance of a column from multiple CSV files

Question

I need to take the 3rd column from each of my 50 CSV files and compute its variance in R.

files <- list.files(path="path\\to\\csv", pattern="*.csv", full.names=T, recursive=FALSE)

lapply(files, function(x) {
  t <- read.csv(x, header=F) # load file
  # apply function
  out <- var(t[3])

  out
  # write to file
  #write.csv(out, "path\\to\\dir\\variances.csv", sep="\t", quote=F, row.names=F, col.names=T)
})

This is what I have so far. I need help using only the 3rd column of each CSV file, from the 2nd row to the last row, to compute the variances.

I would also like to write the results to a CSV file as a data frame, with each file's name (without ".csv") as a column name and its variance as the value. Basically, it will be a 1x50 data frame.

Thank you for your help

Use `var(t[-1,3])` to calculate the variance of column three without the first row. Use `sub('.csv', '', x)` to remove the '.csv' from the filename. — sieste, Apr 15 '18 at 19:48
@YunTaeHwang: see this answer for hint https://stackoverflow.com/questions/3397885/how-do-you-read-in-multiple-txt-files-into-r/48105838#48105838 — Tung, Apr 15 '18 at 20:12

Len Greski · Answer 1 · 2020-11-22T00:10:19.540

Here is a complete, working example using Pokémon statistics from pokemondb.net. We'll download the data, extract it to a folder of 8 CSV files (one for each of the first 8 generations of Pokémon), and then read each file, subsetting to the 8th column and rows 2 through N.

We'll calculate variance on each of these columns, then use unlist() to combine the stats in a single vector.

download.file("https://raw.githubusercontent.com/lgreski/pokemonData/master/PokemonData.zip",
              "pokemonData.zip",
              method="curl",mode="wb")
unzip("pokemonData.zip",exdir="./pokemonData")

thePokemonFiles <- list.files("./pokemonData",
                              full.names=TRUE)
varianceList <- lapply(thePokemonFiles,function(x) {
     # read data and subset to 8th column, drop first row
     data <- read.csv(x)[-1,8]
     var(data,na.rm=TRUE)
})
# unlist to combine into a vector
unlist(varianceList)

...and the output:

> unlist(varianceList)
[1]  716.7932  812.0668  968.6125  915.8592  934.8132 1607.4362 1049.9671
[8] 1016.2672

NOTE: on Windows, use the method="wininet" argument in download.file().
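The question also asks for a one-row data frame with the file names (minus ".csv") as column names, written out to a CSV file. Here is a minimal sketch of one way to do that. It assumes the first line of each file is a header, so read.csv(header = TRUE) already leaves only rows 2 through N as data; if the first line is instead a data row to skip, subset with [-1, 3] as in the example above. The folder, the demo files a.csv, b.csv and c.csv, and the output name variances.csv are all made up for illustration.

```r
# Hypothetical setup: three small CSV files in a temporary folder
csv_dir <- file.path(tempdir(), "variance_demo")
dir.create(csv_dir, showWarnings = FALSE)
set.seed(1)
for (name in c("a", "b", "c")) {
  write.csv(data.frame(x = 1:5, y = 6:10, z = rnorm(5)),
            file.path(csv_dir, paste0(name, ".csv")),
            row.names = FALSE)
}

# The pattern is a regular expression, so escape the dot and anchor the end
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# header = TRUE consumes the first line, so every remaining row is data
variances <- vapply(files, function(f) {
  var(read.csv(f, header = TRUE)[[3]], na.rm = TRUE)
}, numeric(1))

# Use the file names without directory or ".csv" extension as column names
names(variances) <- tools::file_path_sans_ext(basename(files))

# Transpose the named vector into one row with one column per file
variance_df <- as.data.frame(t(variances))

# Write outside csv_dir so a rerun does not pick up the result as input
out_file <- file.path(tempdir(), "variances.csv")
write.csv(variance_df, out_file, row.names = FALSE)
```

With 50 input files the same code yields a 1x50 data frame. vapply() is used instead of lapply() plus unlist() so that each file is checked to return exactly one numeric value.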
