R, analyzing a data set with a large parameter space and replicates

Question

I've run experiments in which, for each parameter combination, I collect the average forces and torques (in the x, y, and z directions). I do four replicates for each parameter combination, and I have 432 parameter combinations in total.

The actual dataset is a bit too big to include here, so I've made a subset for testing purposes and uploaded it to dropbox, along with the relevant R script.

Here is a heavily trimmed version:

> data2[1:20,1:8]
# A tibble: 20 x 8
   `Foil Color` `Flow Speed (rpm)` `Frequency (Hz)` StepTime Maxpress Minpress `Minpress Percentage`      FxMean
         <fctr>             <fctr>           <fctr>   <fctr>   <fctr>    <int>            <fctr>       <dbl>
 1        Black                  0             0.25      250       50        0                 0 0.014537062
 2        Black                  0             0.25      250       50        0                 0 0.014870256
 3        Black                  0             0.25      250       50        0                 0 0.013180870
 4        Black                  0             0.25      250       50        0                 0 0.013448804
 5        Black                  0             0.25      250       50        3              0.05 0.012996979
 6        Black                  0             0.25      250       50        3              0.05 0.012115166
 7        Black                  0             0.25      250       50        3              0.05 0.012427347
 8        Black                  0             0.25      250       50        3              0.05 0.012561253
 9        Black                  0             0.25      250       50        5               0.1 0.012480644
10        Black                  0             0.25      250       50        5               0.1 0.011603403
11        Black                  0             0.25      250       50        5               0.1 0.011427116
12        Black                  0             0.25      250       50        5               0.1 0.011545803
13        Black                  0             0.25      250       50       13              0.25 0.009891865
14        Black                  0             0.25      250       50       13              0.25 0.008465604
15        Black                  0             0.25      250       50       13              0.25 0.009089619
16        Black                  0             0.25      250       50       13              0.25 0.008560160
17        Black                  0             0.25      250       75        0                 0 0.025101186
18        Black                  0             0.25      250       75        0                 0 0.023611920
19        Black                  0             0.25      250       75        0                 0 0.026276007
20        Black                  0             0.25      250       75        0                 0 0.026593895

I am trying to group the data by parameter combination and calculate the mean, sd, and se of FxMean for each group of 4 replicates.

I have tried to follow tutorials and other examples where people try to summarize the data (example), but it doesn't work for me. It normally spits out an array that looks nothing like what I need.

For example:

fx_data2 <- ddply(data_csv, c(data_csv$`Frequency (Hz)`,data_csv$`Flow Speed (rpm)`, data_csv$StepTime, data_csv$Maxpress, data_csv$`Minpress Percentage`), summarise,
N    = length(data_csv$FxMean),
mean = mean(data_csv$FxMean),
sd   = sd(data_csv$FxMean),
se   = sd / sqrt(N)

)

fx_data3 <- summaryBy(FxMean ~freq + foilColor+maxP+minPP, data=data_csv, FUN=c(length, mean, sd))

fx_data2 looks just...abysmal.

head(fx_data2)
....
Foil Color.2530 Foil Color.2531 Foil Length.2512 Foil Length.2513 Foil 
Length.2514 Foil Length.2515 Flow Speed (rpm).2544 Flow Speed (rpm).2545
Flow Speed (rpm).2546 Flow Speed (rpm).2547 Frequency (Hz).800 Frequency 
(Hz).801 Frequency (Hz).802 Frequency (Hz).803 Foil Color.2532 Foil Color.2533
Foil Color.2534 Foil Color.2535 Foil Length.2516 Foil Length.2517 Foil 
Length.2518 Foil Length.2519 Flow Speed (rpm).2548 Flow Speed (rpm).2549
Flow Speed (rpm).2550 Flow Speed (rpm).2551 Frequency (Hz).804 Frequency 
(Hz).805 Frequency (Hz).806 Frequency (Hz).807 Foil Color.2536 Foil Color.2537

I mean. I have no idea what's going on with that. The dimensions are 24x8724. Just...what.

and fx_data3 looks like this:

> fx_data3
  FxMean.length FxMean.mean  FxMean.sd
 1          1744  0.01379712 0.01423244
>

Ideally, these would look like the original data set, but each parameter combination is compressed to a single line, and the values on the far right would be the mean, sd, and se for the FxMean, FxStDev, etc. for the four replicates.

I've been struggling with this for a few days. I'd greatly appreciate some help.

Thank you, Zane

Please provide code we can copy and paste to create a small dataset to tackle. Also, include output as code chunks instead of as images (in fact, my work's proxy blocks imgur for some reason). See [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for other things to help us help you. — Nathan Werth, Dec 06 '17 at 15:49
Hi @NathanWerth, I've added the code and a subsetted data file to a dropbox folder (see the edited post above). Does that work? — zaneywolf, Dec 06 '17 at 16:21
Dropbox is also blocked by my work. If the entire dataset is necessary to the problem, then this is just bad luck for me. But it can help to try recreating the problem with a small toy dataset. You might figure out what's wrong yourself when things are simpler. — Nathan Werth, Dec 06 '17 at 16:44
@NathanWerth alrighty...wait, you can't use dropbox at work? That's weird. Anyways, how's ^that? — zaneywolf, Dec 06 '17 at 16:50

score 0 · Answer 1 · answered Dec 06 '17 at 15:47


Which parameters do you want to group by? Just insert them in the code snippet below in place of param1, param2, etc.

You could use dplyr:

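library(dplyr)

# Replace param1, param2, param3 with the grouping columns;
# wrap names that contain spaces in backticks, e.g. `Frequency (Hz)`.
data %>% 
  group_by(param1, param2, param3) %>% 
  summarise(mean = mean(FxMean), 
            sd = sd(FxMean),
            se = sd / sqrt(n()))  # standard error = sd / sqrt(number of replicates)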
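
As a minimal sketch of the same pattern on data that can be pasted directly, the block below rebuilds two of the parameter combinations from the question (rows 1-4 and 17-20 of the printed tibble) and summarises them. The names `toy` and `toy_summary` are made up for illustration, and only two grouping columns are kept to keep it short.

```r
library(dplyr)

# Two parameter combinations x 4 replicates, copied from the question's printout.
# check.names = FALSE keeps the space in the column name.
toy <- data.frame(
  Maxpress = rep(c(50, 75), each = 4),
  `Minpress Percentage` = rep(0, 8),
  FxMean = c(0.014537062, 0.014870256, 0.013180870, 0.013448804,
             0.025101186, 0.023611920, 0.026276007, 0.026593895),
  check.names = FALSE
)

toy_summary <- toy %>%
  group_by(Maxpress, `Minpress Percentage`) %>%
  summarise(N    = n(),             # replicates per combination
            mean = mean(FxMean),
            sd   = sd(FxMean),
            se   = sd / sqrt(N)) %>% # uses the sd and N columns defined just above
  ungroup()

print(toy_summary)
```

This should give one row per combination with N = 4, and the two means should agree with the 0.014009248 and 0.025395752 that the accepted answer reports for these combinations.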

D Pinto

I already tried that. It produced the same output as fx_data3 (FxMean.length = 1744, FxMean.mean = 0.01379712, FxMean.sd = 0.01423244). – zaneywolf, Dec 06 '17 at 16:08

score 0 · Accepted Answer · answered Dec 06 '17 at 17:14

url <- "https://www.dropbox.com/sh/vhf39uz4pol7sgl/AAAJ9Fr6OTEIgb_ZeSno-X5ea?dl=1"
download.file(url, destfile = "from-SO-via-dropbox")
unzip("from-SO-via-dropbox")
df <- readr::read_csv("Data_subset.csv")

library(dplyr)

df %>% 
  group_by(`Frequency (Hz)`, `Foil Color`, StepTime, Maxpress, `Minpress Percentage`) %>% 
  summarise_at(vars(FxMean), funs(N = length, mean, sd, se = sd(.) / sqrt(N)))

# # A tibble: 13 x 9
# # Groups:   Frequency (Hz), Foil Color, StepTime, Maxpress [?]
#    `Frequency (Hz)` `Foil Color` StepTime Maxpress `Minpress Percentage`     N        mean           sd           se
#               <dbl>        <chr>    <int>    <int>                 <dbl> <int>       <dbl>        <dbl>        <dbl>
#  1             0.25        Black      250       50                  0.00     4 0.014009248 0.0008206156 0.0004103078
#  2             0.25        Black      250       50                  0.05     4 0.012525186 0.0003658681 0.0001829340
#  3             0.25        Black      250       50                  0.10     4 0.011764241 0.0004832082 0.0002416041
#  4             0.25        Black      250       50                  0.25     4 0.009001812 0.0006538297 0.0003269149
#  5             0.25        Black      250       75                  0.00     4 0.025395752 0.0013514463 0.0006757231
#  6             0.25        Black      250       75                  0.05     4 0.020794212 0.0028703242 0.0014351621
#  7             0.25        Black      250       75                  0.10     4 0.018409500 0.0037305138 0.0018652569
#  8             0.25        Black      250       75                  0.25     4 0.016193536 0.0016200530 0.0008100265
#  9             0.25        Black      250      100                  0.00     4 0.035485324 0.0052513208 0.0026256604
# 10             0.25        Black      250      100                  0.05     4 0.050097709 0.0024123653 0.0012061827
# 11             0.25        Black      250      100                  0.10     4 0.051378181 0.0049857712 0.0024928856
# 12             0.25        Black      250      100                  0.25     4 0.039374874 0.0031421884 0.0015710942
# 13             0.50        Black      250       50                  0.00     2 0.014778494 0.0004683882 0.0003312005
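
If a plain group_by() plus summarise() pipeline collapses everything into a single row, one common cause (not confirmed for this thread) is that plyr was attached after dplyr, so a bare `summarise` resolves to plyr's version, which ignores dplyr grouping. summarise_at() has no plyr counterpart, so it would not be affected. The sketch below shows one way to check which package owns `summarise` in the current session and how to sidestep masking with explicit `dplyr::` prefixes; the toy data frame and the variable names are made up for illustration.

```r
library(dplyr)

# Which package does a bare `summarise` come from in this session?
# "dplyr" is the expected answer; "plyr" would indicate masking.
summarise_owner <- environmentName(environment(summarise))
plyr_attached   <- "package:plyr" %in% search()
print(summarise_owner)
print(plyr_attached)

# Toy data (values copied from the question): 2 combinations x 4 replicates.
toy <- data.frame(
  Maxpress = rep(c(50, 75), each = 4),
  FxMean   = c(0.014537062, 0.014870256, 0.013180870, 0.013448804,
               0.025101186, 0.023611920, 0.026276007, 0.026593895)
)

# Namespace-qualified calls are immune to masking by other packages.
res <- toy %>%
  dplyr::group_by(Maxpress) %>%
  dplyr::summarise(N    = dplyr::n(),
                   mean = mean(FxMean),
                   sd   = sd(FxMean),
                   se   = sd / sqrt(N))

print(res)
```

If the owner turns out to be plyr, loading plyr before dplyr or keeping the `dplyr::` prefixes are the usual workarounds.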

What. Can I ask you to explain why changing summarise() to summarise_at() and adding vars() around the target column makes it work? I don't really understand the differences between the previous version and yours. It seems like it should be doing the same thing. — zaneywolf, Dec 06 '17 at 18:02
It should. ```df %>% group_by(`Frequency (Hz)`, `Foil Color`, StepTime, Maxpress, `Minpress Percentage`) %>% summarise(mean = mean(FxMean), sd = sd(FxMean), se = sd/n())``` works for me. — Aurèle, Dec 06 '17 at 18:09
Do you mean "no, it doesn't"? Sorry, English is not my first language, I might be misunderstanding here. — Aurèle, Dec 06 '17 at 18:25
It's difficult to diagnose what went wrong with previous attempts. If you really need to know, you'd have to make them fully reproducible with code that can be copy-pasted and run as-is, plus the result of `dput(fx_data3)`, possibly in a separate question. If I had to take a wild guess, I'd say you used double quotes `"` instead of backticks ` . — Aurèle, Dec 06 '17 at 18:44
That's not it. I made sure not to use double quotes. Ah well. Thank you very much for your help, Aurèle! — zaneywolf, Dec 06 '17 at 18:58
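
For reference, the quoting effect Aurèle guessed at (and which the asker ruled out for their own code) can be reproduced on a toy table. This is only a sketch: the data frame and its FxMean values are made up. With backticks, dplyr groups by the column; with double quotes, it groups by a constant string, so every row lands in one group.

```r
library(dplyr)

# Made-up toy data: 2 frequencies x 4 replicates.
toy <- data.frame(
  `Frequency (Hz)` = rep(c(0.25, 0.5), each = 4),
  FxMean = c(0.010, 0.012, 0.011, 0.013, 0.020, 0.022, 0.021, 0.023),
  check.names = FALSE
)

# Backticks refer to the column, giving one row per frequency.
by_column <- toy %>%
  group_by(`Frequency (Hz)`) %>%
  summarise(mean = mean(FxMean))

# Double quotes create a constant grouping value, giving a single row.
by_string <- toy %>%
  group_by("Frequency (Hz)") %>%
  summarise(mean = mean(FxMean))

print(nrow(by_column))
print(nrow(by_string))
```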
