Issues with dcast function (reshape2) - three variable combination

Question

I am using the reshape2 package to reshape my data for a t-test; it is easier for me to see the data in separate columns. I have three treatment factors: "wat" is nested within "spp", and "ins" is nested within water ("wat"). My demo table contains three response variables: "tyr", "esc" and "esc_R". As an example, I would like to see how "ins" influences the response "tyr" for "spp" = Bl under the "wat" = High treatment.

Here is my data: demo.data

## Use orderBy function to sort data
library(doBy)
demo <- orderBy(~spp+wat+ins, data = demo)
## Subset to species "Bl", keeping the first four columns (spp, wat, ins and tyr)
df.bl.ins.1 <- demo[demo$spp == "Bl", 1:4]
## Then keep only the "High" water treatment
df.bl.ins.2 <- df.bl.ins.1[df.bl.ins.1$wat == "High", ]
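The sorting and subsetting above can also be done in base R without doBy. A minimal sketch, assuming `demo` has columns named spp, wat, ins and tyr; the six rows below are made-up toy data (only 0.3036, 0.1987 and 0.1112 come from the question), not the real demo file. Selecting columns by name also avoids depending on where tyr sits in the file.

```r
# Toy stand-in for `demo` (made-up rows, not the real data).
demo <- data.frame(
  spp = c("Man", "Bl", "Bl", "Bl", "Man", "Bl"),
  wat = c("High", "Low", "High", "High", "Low", "High"),
  ins = c("No", "No", "Yes", "No", "Yes", "Yes"),
  tyr = c(0.12, 0.28, 0.1987, 0.3036, 0.15, 0.1112),
  stringsAsFactors = FALSE
)

# Sort by spp, then wat, then ins (ascending, like orderBy(~spp+wat+ins)).
demo <- demo[order(demo$spp, demo$wat, demo$ins), ]

# Keep only Bl / High rows and pick the columns by name rather than position.
df.bl.ins.2 <- subset(demo, spp == "Bl" & wat == "High",
                      select = c(spp, wat, ins, tyr))
print(df.bl.ins.2)
```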

Then I run into trouble with the dcast function:

library(reshape2)
df.bl.ins.tmp <- dcast(df.bl.ins.2, spp + wat ~ ins, value.var = "tyr")

I have found useful information in the following threads:

Dason's suggestion works really well with the ToothGrowth demo dataset. Unfortunately, when the table has more than two treatments, the solution is no longer simple. I agree with Maiasaura's suggestion that creating a unique variable is the key to this problem. However, I am having a hard time understanding what function(x) does or how to use it with my table.
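One way to build such a unique variable is a replicate counter that numbers the rows 1, 2, 3, ... within each spp/wat/ins group. With it on the left-hand side of the formula, every cell of the reshaped table holds exactly one value, so dcast has nothing to aggregate. A sketch under these assumptions: reshape2 is installed, `rep` is a hypothetical column name introduced here, and the input is the Bl/High tyr values from the expected output further down, typed in by hand.

```r
library(reshape2)

# Bl / High tyr values from the expected output in the question.
df.bl.ins.2 <- data.frame(
  spp = "Bl", wat = "High",
  ins = rep(c("No", "Yes"), each = 5),
  tyr = c(0.3036, 0.2577, NA, 0.3299, 0.3301,
          0.1987, 0.1112, 0.199, 0.1886, 0.2332)
)

# Hypothetical replicate ID: 1..n within each spp/wat/ins combination.
df.bl.ins.2$rep <- ave(seq_along(df.bl.ins.2$tyr),
                       df.bl.ins.2$spp, df.bl.ins.2$wat, df.bl.ins.2$ins,
                       FUN = seq_along)

# spp + wat + rep now identifies one row per ins level, so no aggregation
# is needed and the replicates are kept.
df.bl.ins.tmp <- dcast(df.bl.ins.2, spp + wat + rep ~ ins, value.var = "tyr")
print(df.bl.ins.tmp)
```

Note that the pairing of row i under No with row i under Yes is just the order the replicates appear in, not a real match between them.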

Any help in this regard is much appreciated.

In addition, if you have alternative suggestions for doing the t-test without manipulating the original data frame (demo), I would be glad to hear them.

Thanks in advance.

Edit: Here is the output I expect for "tyr". In this format I want to compare "No" vs. "Yes" using a t-test.

spp  wat   ins  No      Yes
Bl   High  No   0.3036  0.1987
Bl   High  No   0.2577  0.1112
Bl   High  No   NA      0.199
Bl   High  No   0.3299  0.1886
Bl   High  No   0.3301  0.2332
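With the data in this wide layout, the comparison is a two-sample t-test on the No and Yes columns. A sketch using the table values typed in by hand; it also runs the same test on the long layout to show that reshaping is not actually required. The default Welch test is used and NA values are dropped. Since the replicates are not matched across No and Yes, `paired = TRUE` would only be appropriate if replicate i under No genuinely corresponds to replicate i under Yes.

```r
# The Bl / High rows from the table above, in the wide layout.
wide <- data.frame(
  spp = "Bl", wat = "High",
  No  = c(0.3036, 0.2577, NA, 0.3299, 0.3301),
  Yes = c(0.1987, 0.1112, 0.199, 0.1886, 0.2332)
)

# Welch two-sample t-test, No vs. Yes (NAs removed by t.test).
tt_wide <- t.test(wide$No, wide$Yes)

# The same comparison on a long layout, using the formula interface.
long <- data.frame(
  ins = rep(c("No", "Yes"), each = nrow(wide)),
  tyr = c(wide$No, wide$Yes)
)
tt_long <- t.test(tyr ~ ins, data = long)

print(c(wide = unname(tt_wide$statistic), long = unname(tt_long$statistic)))
```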

What specific trouble are you having? Also, there may be a typo in your question: which object are you using `dcast()` on? You've written "df.bls.ins.2", but that object doesn't exist; "df.bl.ins.2" does. — A5C1D2H2I1M1N2O1R2T1, Oct 19 '12 at 18:06
@mrdwab: Absolutely correct! Sorry about the typo. It should be df.bl.ins.2 not "bls". Edited the original post. I get this error: `Aggregation function missing: defaulting to length`. — Sourav Chakraborty, Oct 19 '12 at 19:53
In Maiasaura's answer above, `function(x)` is an anonymous function that's used to aggregate the duplicate records and return a single value. Since your data contains five values for every combination of `spp + wat + ins` but the reshaped data.frame can only have one record for each combination, you have to aggregate. `sum()` and `mean()` are two possibilities, but you can get as crazy as you want inside of that `function(x){}`. — Matt Parker, Oct 19 '12 at 20:15
For example: `df.bl.ins.tmp <- dcast(df.bl.ins.2, spp + wat ~ ins, value.var = "tyr", fun.aggregate = mean, na.rm = TRUE)` works and gives you means of `ins` = Yes and `ins` = No. — Matt Parker, Oct 19 '12 at 20:17
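To make the `function(x)` part concrete: dcast calls `fun.aggregate` once per output cell, passing the vector of all values in that cell, and expects a single number back. A sketch assuming reshape2 is installed, using the Bl/High tyr values from the question's expected output; the second anonymous function (counting non-missing replicates) is just an illustration of writing your own.

```r
library(reshape2)

# Bl / High tyr values from the expected output in the question.
df.bl.ins.2 <- data.frame(
  spp = "Bl", wat = "High",
  ins = rep(c("No", "Yes"), each = 5),
  tyr = c(0.3036, 0.2577, NA, 0.3299, 0.3301,
          0.1987, 0.1112, 0.199, 0.1886, 0.2332)
)

# x is the vector of tyr values for one spp/wat/ins cell.
means <- dcast(df.bl.ins.2, spp + wat ~ ins, value.var = "tyr",
               fun.aggregate = function(x) mean(x, na.rm = TRUE))

# Another anonymous function: number of non-missing replicates per cell.
counts <- dcast(df.bl.ins.2, spp + wat ~ ins, value.var = "tyr",
                fun.aggregate = function(x) sum(!is.na(x)))

print(means)
print(counts)
```

Either way the result has one row per spp/wat combination, which is why aggregation cannot keep the individual replicates.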
@MattParker: Thanks for pointing that out. I forgot to mention that I tried `fun.aggregate = mean` a while ago, and it works well. However, the objective is to keep all five biological replicates (for treatment "ins") rather than the mean of the response variable "tyr". — Sourav Chakraborty, Oct 19 '12 at 20:40
Can you post an example of the output you expect for `df.bl.ins.tmp`? That might help us figure out how to help you. — A5C1D2H2I1M1N2O1R2T1, Oct 20 '12 at 08:58
@mrdwab: I have edited my post and showed desired output. Thanks. — Sourav Chakraborty, Oct 22 '12 at 00:37

nograpes · Accepted Answer · 2012-10-22T15:36:49.053

Perhaps I don't understand exactly what you want to do, but I think you could run a linear regression directly on your data. That way, you get t-tests of whether each coefficient in your model is zero. This might suffice, and it would also help tease apart the effects of each of your independent variables. Here is an example:

df <- read.table('http://pastebin.com/raw.php?i=sR2MvBBA')
summary(lm(tyr ~ spp + wat + ins, data = df))
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.286386   0.016500  17.356  < 2e-16 ***
sppMan      -0.159514   0.015811 -10.089  1.3e-11 ***
watLow      -0.005501   0.015858  -0.347 0.730861    
insYes      -0.066741   0.015858  -4.209 0.000185 ***
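The link between regression coefficients and t-tests can be checked on a single group: for a factor with two levels, the t value of its coefficient is the same (up to sign) as the classic equal-variance two-sample t-test. A sketch on the Bl/High tyr values from the question's expected output. Note that `t.test()` defaults to the Welch test, which does not assume equal variances and so will generally give a slightly different statistic than `lm()`.

```r
# Bl / High tyr values from the expected output in the question.
d <- data.frame(
  ins = rep(c("No", "Yes"), each = 5),
  tyr = c(0.3036, 0.2577, NA, 0.3299, 0.3301,
          0.1987, 0.1112, 0.199, 0.1886, 0.2332)
)

# insYes estimates mean(Yes) - mean(No); its t value tests whether that is zero.
fit <- lm(tyr ~ ins, data = d)
coef_t <- summary(fit)$coefficients["insYes", "t value"]

# Pooled-variance two-sample t-test on the same rows (reports No - Yes).
tt <- t.test(tyr ~ ins, data = d, var.equal = TRUE)

print(c(lm_t = coef_t, t_test_t = unname(tt$statistic)))
```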

This will get you a t-test for just the group shown in your example (spp = Bl, wat = High):

t.test(tyr~ins,data=df[df$spp=='Bl' & df$wat=='High',])
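The question also asks for t-tests without reshaping `demo`. Extending the one-group call above, a sketch that runs the same Welch t-test once per spp/wat combination with `split()` and `sapply()`. The Bl/High values come from the question; the Man/High values are made up purely so there is a second group, and the real data would have more combinations.

```r
# Toy data in the original long layout (Man/High values are invented).
demo <- data.frame(
  spp = rep(c("Bl", "Man"), each = 10),
  wat = "High",
  ins = rep(rep(c("No", "Yes"), each = 5), times = 2),
  tyr = c(0.3036, 0.2577, NA, 0.3299, 0.3301,    # Bl/High/No (question)
          0.1987, 0.1112, 0.199, 0.1886, 0.2332,  # Bl/High/Yes (question)
          0.14, 0.12, 0.16, 0.13, 0.15,           # Man/High/No (made up)
          0.10, 0.09, 0.11, 0.08, 0.12)           # Man/High/Yes (made up)
)

# One data frame per spp/wat combination, then one t-test (No vs. Yes) each.
groups <- split(demo, list(demo$spp, demo$wat), drop = TRUE)
res <- sapply(groups, function(d) {
  tt <- t.test(tyr ~ ins, data = d)
  c(t = unname(tt$statistic), df = unname(tt$parameter), p.value = tt$p.value)
})
print(t(res))
```

Running many separate tests this way does not adjust for multiple comparisons; `p.adjust()` can be applied to the p-values if that matters.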

edited Oct 22 '12 at 15:36

answered Oct 19 '12 at 18:46


+1 Regression of some kind definitely makes sense, at least given my current understanding of the question. Which specific kind of regression depends on the nature of the dependent variables, though. – Matt Parker Oct 19 '12 at 20:09
@nograpes: The answer looks very logical. I am looking into it. I will report back as soon as I can to let you know if there is a problem. – Sourav Chakraborty Oct 19 '12 at 20:41
@nograpes: Degrees of freedom are the problem. The model treats all biological replicates independently of the treatments; at least, that's what I am seeing. sppMan is compared against sppBl (20 biological replicates each) without considering the treatments. The same goes for "wat": High and Low are compared independently of species. And then all "ins" values are compared between Yes and No, independently of species and water. I hope you get the idea of what I am talking about. – Sourav Chakraborty Oct 19 '12 at 21:12
@SouravChakraborty: I found your comment incomprehensible, but your update in the question gave me some better idea of what you want to do. – nograpes Oct 22 '12 at 13:56
@nograpes: `t.test(tyr~ins,data=df[df$spp=='Bl' & df$wat=='High',])` - This works, thanks. For some other purposes executing dcast will be important though. – Sourav Chakraborty Oct 24 '12 at 14:47
