I'm working in R and have a nested list of data frames, and I would like to extract the same variable from every data frame. Here is a simplified example of one imported .csv file (I hope the simplification is not too confusing):
Temp(A); Density(B); Velocity(C)
21,54; 0,7; 1486,46
20,87; 0,76; 1484,42
20,34; 0,81; 1482,8
19,61; 0,81; 1480,5
# .csv files imported with:
data_files <- list.files("D:\\My\\data\\pathway")
The code I used to create a list from 19 data frames is as follows:
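library(purrr)  # map()
library(dplyr)  # select(), summarise_each(), funs()
library(tidyr)  # gather(), separate(), spread()

lst1 <- map(data_files, ~ {
  data1 <- read.csv2(paste0("D:\\My\\data\\pathway\\", .x))
  # One wide row per file; columns are named <column>_<stat>
  df.sum <- data1 %>%
    # read.csv2() makes column names syntactic, e.g. "Temp(A)" becomes "Temp.A."
    select(Temp.A., Density.B., Velocity.C.) %>%
    summarise_each(funs(min = min,                 # in the example Min(1)
                        q25 = quantile(., 0.25),   # Max(2)
                        median = median,           # Mean(3)
                        q75 = quantile(., 0.75),   # St.Dev.(4)
                        max = max,
                        mean = mean,
                        sd = sd))
  # Reshape to one row per variable and one column per statistic
  df.stats.tidy <- df.sum %>%
    gather(stat, val) %>%
    separate(stat, into = c("var", "stat"), sep = "_") %>%
    spread(stat, val) %>%
    select(var, min, q25, median, q75, max, mean, sd)
  return(df.stats.tidy)
})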
lst1
The output list looks like this:
This is how it is listed when I open the whole list. When I open the specific table of a single dataset, the table is transposed:
How can I extract, for example, the temperature for every dataset to create a plot or do statistical tests?
I tried a few simple methods and was able to extract single values from a single data set. For example, I can extract the mean value of every parameter of dataset2. However, this is not quite what I need: I need the same value for the same parameter across all the different datasets. Does anyone know a simple way to work out how this list is structured? I can't figure out how exactly the parameters are defined.
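To make this concrete, here is a minimal sketch of the kind of single-dataset extraction that already works. `demo` is a made-up stand-in for `lst1`, reduced to two rows and two statistics, with values rounded from the dput() output at the end of this post. The indexing shown is one plausible way to do it, not necessarily the exact calls used.

```r
# Hypothetical stand-in for lst1: same layout, fewer rows and columns
demo <- list(
  data.frame(var = c("Depth.m.", "Temp.C."),
             mean = c(23.02, 1.26), sd = c(38.22, 2.54)),
  data.frame(var = c("Depth.m.", "Temp.C."),
             mean = c(17.33, 2.14), sd = c(36.97, 5.02))
)

# Mean of every parameter in the second data set
means_ds2 <- demo[[2]]$mean

# Mean temperature in the second data set only
temp_mean_ds2 <- demo[[2]]$mean[demo[[2]]$var == "Temp.C."]
```

Both calls touch only one element of the list, which is exactly the limitation described above.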
P.S. Here are the dput() results:
> dput(lst1[1:2])
list(structure(list(var = c("Conduct.mS.cm.", "Depth.m.", "Salinity.psu.",
"Sound.Velocity.m.sec.", "Temp.C."), min = c(0, -1.19, 0, 1402.98,
-1.48), q25 = c(0.01, -0.91, 0.01, 1412.835, -0.51), median = c(9.225,
-0.78, 9.885, 1421.785, 0.85), q75 = c(25.575, 39.9725, 31.0825,
1440.7175, 2.09), max = c(26.28, 143.76, 32.02, 1453.52, 11.81
), mean = c(11.6531756756757, 23.0201351351351, 13.9187162162162,
1426.98621621622, 1.26290540540541), sd = c(11.8954355870503,
38.217076230762, 14.4467518784427, 14.8016328574063, 2.53744347569587
)), class = "data.frame", row.names = c(NA, -5L)), structure(list(
var = c("Conduct.mS.cm.", "Depth.m.", "Salinity.psu.", "Sound.Velocity.m.sec.",
"Temp.C."), min = c(0, -2.17, 0, 1401.46, -1.44), q25 = c(0,
-1.14, 0, 1404.25, 0.0125), median = c(0.13, -1.08, 0.115,
1413.215, 0.49), q75 = c(25.035, 6.3225, 30.3525, 1440.2625,
1.53), max = c(26.35, 129.54, 32.11, 1486.46, 21.54), mean = c(7.78810344827586,
17.3289655172414, 9.34528735632184, 1424.01396551724, 2.13511494252874
), sd = c(11.6263191741139, 36.9663620576755, 14.0549552563496,
22.6029377552219, 5.01839273011273)), class = "data.frame", row.names = c(NA,
-5L)))
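Given this structure (every element is a data frame with one row per parameter, identified by the `var` column), the extraction I am after might look roughly like the sketch below. It is only a sketch, written in base R rather than with the tidyverse calls used above, and it assumes that every data frame in the list contains exactly one "Temp.C." row. `demo`, `temp_rows`, `temp_stats` and the `dataset` column are names made up for the illustration, and the values are rounded from the output above.

```r
# Hypothetical stand-in for lst1: same layout, fewer rows and columns
demo <- list(
  data.frame(var = c("Depth.m.", "Temp.C."),
             min = c(-1.19, -1.48), max = c(143.76, 11.81), mean = c(23.02, 1.26)),
  data.frame(var = c("Depth.m.", "Temp.C."),
             min = c(-2.17, -1.44), max = c(129.54, 21.54), mean = c(17.33, 2.14))
)

# Pull the temperature row out of every data frame in the list
temp_rows <- lapply(demo, function(d) d[d$var == "Temp.C.", ])

# Stack the rows and record which list element each one came from
temp_stats <- do.call(rbind, temp_rows)
temp_stats$dataset <- seq_along(demo)

# A single statistic for one parameter across all data sets
temp_means <- sapply(demo, function(d) d$mean[d$var == "Temp.C."])
```

`temp_stats` then holds one row per data set, which could be passed to a plot or a test. With dplyr, `bind_rows(lst1, .id = "dataset")` followed by `filter(var == "Temp.C.")` should give an equivalent table.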