
I'm relatively new to R, so apologies if there is an obvious answer to this question.

Basically, I'm analysing samples by FT-IR in duplicate. The data are given in an individual .csv file for each reading. The csv files have two columns (wavelength and absorbance) and around 3,500 rows, one for each wavelength. As the measurements are made in duplicate, there are 30 separate .csv files (2 for each of the 15 samples analysed), called "1.csv", "2.csv" ... "30.csv". "1.csv" and "2.csv" are both duplicates of sample 1, "3.csv" and "4.csv" of sample 2, and so on.

I need to use the mean absorbance value of each pair of duplicates in the ChemoSpec package. I could obviously calculate this in Excel; however, that would be time-consuming in the future when I have more samples to analyse. Is there a way of calculating these means in R?

Here is a simplified reproducible example, where csv1 and csv2 are both replicates of the same sample:

wavelength <- c(500, 550, 600)
absorbance <- c(2, 4, 3)

csv1 <- data.frame(wavelength, absorbance)
csv2 <- data.frame(wavelength, absorbance)

mean <- (csv1+csv2)/2

I think I need to read the csv files into R before calculating the mean values for each sample in a similar way to the above; however, I am not sure how to do this.

Thanks.

G Jones
  • Could you please post a reproducible example as outlined here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example More specifically: what do the csv files look like? – MartijnVanAttekum May 22 '18 at 13:33
  • Apologies, I have edited the question. Hope this clarifies my point. – G Jones May 22 '18 at 14:06
  • As a general suggestion, I would avoid naming files with just digits. If you have only 30 files, I would rename them by hand to `Sample1a.csv`, `Sample1b.csv`, `Sample2a.csv` etc. This can save some grief later, and is clearer. I would also suggest you read them all into `ChemoSpec` directly, rather than averaging them first. Then you can inspect all of them easily. Depending upon what kind of analysis you plan, you might want to keep your duplicates. Disclaimer: I am the author of `ChemoSpec`. – Bryan Hanson May 25 '18 at 00:30
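For reference, reading the renamed files straight into ChemoSpec, as suggested in the comment above, might look roughly like the sketch below. The `files2SpectraObject()` arguments shown are my best reading of the ChemoSpec documentation (check `?files2SpectraObject` for the exact interface), and the `gr.crit` patterns are purely illustrative:

library(ChemoSpec)

# files2SpectraObject() reads every file in the working directory whose name
# matches fileExt into a single Spectra object; gr.crit is a set of regex
# patterns matched against the file names to assign each spectrum to a group
# (here, one group per sample -- illustrative only)
spec <- files2SpectraObject(
  gr.crit   = paste0("Sample", 1:15, "[ab]"),
  freq.unit = "wavenumber (cm-1)",
  int.unit  = "absorbance",
  descrip   = "FT-IR duplicates",
  fileExt   = "\\.(csv|CSV)$"
)
# depending on the ChemoSpec version you may also need to pass read.table
# arguments such as sep = "," and header = TRUE

sumSpectra(spec)  # quick summary to confirm everything was read in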

1 Answer


Assuming you have named your columns in the csv files, you would typically do something like

# match only files ending in ".csv"; list.files() sorts names alphabetically,
# so "10.csv" comes before "2.csv" -- reorder the vector if the pairing matters
file_names <- list.files(pattern = "\\.csv$")
all_data <- lapply(file_names, read.csv, header = TRUE)

to read all your csv files into a list. Then, if you are only interested in absorbance, you could do

all_data_abs <- sapply(all_data, function(df) df$absorbance)
all_data_abs <- as.data.frame(t(all_data_abs))

but this assumes the number of rows is the same for each file. Is that the case? Define your replicate groups using

no_replicates <- 2
all_data_abs$grps <- rep(1:(nrow(all_data_abs) / no_replicates),
                         each = no_replicates)

and use summarize_all from dplyr to calculate the mean per group:

library(dplyr)
all_data_abs %>% group_by(grps) %>% summarize_all(mean)

But this is a bit of guessing without being able to see the original files.
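Putting the steps above together, a complete version might look like the following sketch. It is untested against the real files; it assumes every csv has identical wavelengths and columns named `wavelength` and `absorbance`, and that the files are named "1.csv" ... "30.csv" as in the question (hence the numeric sort, so that "10.csv" does not land between "1.csv" and "2.csv"):

library(dplyr)

# read all csv files, sorted numerically so replicate pairs (1,2), (3,4), ... stay together
file_names <- list.files(pattern = "\\.csv$")
file_names <- file_names[order(as.numeric(sub("\\.csv$", "", file_names)))]
all_data   <- lapply(file_names, read.csv, header = TRUE)

# one row per file, one column per wavelength
all_data_abs <- as.data.frame(t(sapply(all_data, function(df) df$absorbance)))
names(all_data_abs) <- all_data[[1]]$wavelength

# consecutive files are replicates of the same sample
no_replicates <- 2
all_data_abs$grps <- rep(seq_len(nrow(all_data_abs) / no_replicates),
                         each = no_replicates)

# mean absorbance per sample: one row per sample, one column per wavelength
sample_means <- all_data_abs %>%
  group_by(grps) %>%
  summarize_all(mean)

`sample_means` then has one row per sample, with the wavelengths as column names; drop the `grps` column before passing the values on to ChemoSpec.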

MartijnVanAttekum
  • All files have the same number of rows (3528). For some reason I cannot define my replicate groups; I get the error message: "Error in `$<-.data.frame`(`*tmp*`, grps, value = c(1L, 1L)) : replacement has 2 rows, data has 1 " – G Jones May 22 '18 at 15:16
  • hard to tell if that is caused by the group assignment or something upstream. What is `dim(all_data_abs)`? – MartijnVanAttekum May 22 '18 at 15:43
  • `dim(all_data_abs)` is `1 30`. – G Jones May 22 '18 at 16:44
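One possible explanation for the `1 30` result (an assumption, not something confirmed in the thread) is that `df$absorbance` returned `NULL` for every file, either because the column headers are spelt differently or because the files use a separator other than a comma, so that `read.csv` produced a single combined column. A quick way to check how the first file was parsed:

# the column names printed here must match the name used in df$absorbance,
# and there should be two columns (wavelength and absorbance)
str(all_data[[1]])
names(all_data[[1]])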