
I have a data frame that looks like this (built using reshape2::cast and merge):

time days treatment extrafactor1 extrafactor2 extrafactor3 animal1 animal2 animal3
10   83   control   B            water        2            2       67      40
10   83   control   B            water        3            50      67      39
10   83   control   A            water        3            22      80      63
10   83   control   A            water        2            40      40      100
10   83   treated   A            water        3            40      69      92
10   83   treated   A            water        1            64      56      6
10   83   treated   A            water        2            90      67      52
10   83   treated   B            water        2            14      36      77
10   83   treated   B            water        3            41      83      55
10   83   treated   B            water        1            66      31      51
11   86   control   B            water        1            99      100     10
11   86   control   B            water        2            23      27      22
11   86   control   A            water        3            57      10      65
11   86   control   A            water        1            60      2       49
11   86   control   A            water        2            23      14      44
11   86   control   B            water        3            97      45      20
11   86   treated   B            water        2            71      15      24
11   86   treated   B            water        3            49      55      63
11   86   treated   A            water        3            54      88      27

and I would like to subtract the values of the different animals in the control samples from those in the treated samples. Of course, the subtraction should only take place where the levels of the other factors match, so the animal1 value of "11_86_treated_A_water_3" should be reduced by the animal1 value of "11_86_control_A_water_3", and likewise for each animal. I've been trying some things with plyr, like

df2 <- ddply(df, .(time,days,treatment,extrafactor1,extrafactor2,extrafactor3), transform, animal1 = animal1-animal1[treatment=="control"])

but it gave me a lot of NAs, and I'm sure some information was missing for it to do what I want. There are actually hundreds of animals.

My attempt is adapted from here, although those questions have fewer input variables and fewer columns to operate on: Easiest way to subtract associated with one factor level from values associated with all other factor levels and here: R ddply with multiple variables

It would also be possible to wait until after reshaping the table into the long format for ggplot, if that makes things easier?

Do you have any suggestions for me?

crazysantaclaus
  • Hey! Have a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to create a reproducible example. That makes it easier to help. – elevendollar Sep 05 '17 at 14:53
  • What is your expected output? – BENY Sep 05 '17 at 15:27

1 Answer


Not the most elegant, but you could create a new column called group_string which is a concatenated string of all the different factors, as you already did in your example, but with 'control' or 'treated' as the last element. So, for example, instead of

"11_86_treated_A_water_3" and "11_86_control_A_water_3"

you would have

"11_86_A_water_3_treated" and "11_86_A_water_3_control"

Then you could loop over all unique strings without the treated/control substring (e.g. one unique string is "11_86_A_water_3_") and, for each one of those, subtract the row that has "control" in group_string from the row that has "treated" in group_string.
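A minimal sketch of the key idea in base R, using four rows from the question's table. Two things differ from the description above and are my own assumptions: the key is built without the treatment level at all (rather than appending it and stripping it again), and the explicit loop is swapped for a vectorised lookup with match(). The names animal_cols, idx and result are just illustrative.

```r
# Four rows taken from the question's table
df <- data.frame(
  time         = c(11, 11, 11, 11),
  days         = c(86, 86, 86, 86),
  treatment    = c("control", "control", "treated", "treated"),
  extrafactor1 = c("B", "A", "B", "A"),
  extrafactor2 = "water",
  extrafactor3 = c(3, 3, 3, 3),
  animal1      = c(97, 57, 49, 54),
  animal2      = c(45, 10, 55, 88),
  animal3      = c(20, 65, 63, 27),
  stringsAsFactors = FALSE
)

# All animal columns, however many there are
animal_cols <- grep("^animal", names(df), value = TRUE)

# Key built from every factor except treatment
df$group_string <- paste(df$time, df$days, df$extrafactor1,
                         df$extrafactor2, df$extrafactor3, sep = "_")

treated <- df[df$treatment == "treated", ]
control <- df[df$treatment == "control", ]

# Position of the matching control row for each treated row (NA if there is none)
idx <- match(treated$group_string, control$group_string)

# treated minus control, for all animal columns at once
result <- treated
result[animal_cols] <- treated[animal_cols] - control[idx, animal_cols]
result
```

Treated rows without a matching control row end up with NA in the animal columns, which at least makes the gaps visible. This assumes each key occurs at most once per treatment level; match() only picks the first hit.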

EDIT: Ok, just had another idea. Group by all factors except for treatment (time, days, extrafactor1, extrafactor2, extrafactor3) which should leave you with two rows for each subgroup. Then use diff() to calculate the difference between those two rows for each subgroup.
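A rough base R sketch of the diff() idea, again on a few rows from the question. Since diff() returns the second value minus the first, the rows are sorted first so that "control" comes before "treated" within each group. Skipping groups that lack one of the two rows is my own assumption about what you would want; group_cols, groups and diffs are illustrative names.

```r
# Five rows from the question's table; the last one has no control counterpart
df <- data.frame(
  time         = c(11, 11, 11, 11, 10),
  days         = c(86, 86, 86, 86, 83),
  treatment    = c("control", "control", "treated", "treated", "treated"),
  extrafactor1 = c("B", "A", "B", "A", "A"),
  extrafactor2 = "water",
  extrafactor3 = c(3, 3, 3, 3, 1),
  animal1      = c(97, 57, 49, 54, 64),
  animal2      = c(45, 10, 55, 88, 56),
  animal3      = c(20, 65, 63, 27, 6),
  stringsAsFactors = FALSE
)

animal_cols <- grep("^animal", names(df), value = TRUE)
group_cols  <- c("time", "days", "extrafactor1", "extrafactor2", "extrafactor3")

# Sort so that "control" precedes "treated";
# diff() then gives treated minus control
df <- df[order(df$treatment), ]

# One data frame per combination of the grouping factors
groups <- split(df, df[group_cols], drop = TRUE)

diffs <- lapply(groups, function(g) {
  # Skip groups that do not consist of exactly one control and one treated row
  if (nrow(g) != 2 || !setequal(g$treatment, c("control", "treated"))) {
    return(NULL)
  }
  out <- g[g$treatment == "treated", group_cols, drop = FALSE]
  out[animal_cols] <- lapply(g[animal_cols], diff)
  out
})

# Drop the skipped groups and stack the rest
result <- do.call(rbind, Filter(Negate(is.null), diffs))
result
```

The same grouping could be written with plyr or dplyr; the point is only that treatment must not be one of the grouping variables, otherwise no group ever contains both a control and a treated row.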

elevendollar
  • Hey Elevendollar, thanks a lot for your input... I've done it several times in similar cases with the concatenating, I just thought there would be a more elegant way ;-) I'll let you know when it worked – crazysantaclaus Sep 05 '17 at 16:09
  • Hey @crazysantaclaus. Sure thing. Have a look at the reproducible example link. If you edit your question accordingly I am sure someone will be able to help. – elevendollar Sep 06 '17 at 08:26