Comparing groups with different lengths in a tibble

Question

I'm looking at the effects of drought on plants, and for that I need to compare data from before, during and after the drought. However, it has proven difficult to select those periods from my data, as their length in days varies. As I have time series of several years with daily resolution, I'd like to avoid selecting the periods manually. I have been struggling with this for quite some time and would be really grateful for any tips and advice.

Here's a simplified example of my data:

library(tidyverse)  # provides tibble(), %>%, mutate() and case_when()

myData <- tibble(
  day = c(1:16),
  TWD = c(0,0,0,0.444,0.234,0.653,0,0,0.789,0.734,0.543,0.843,0,0,0,0),
  Amp = c(0.6644333,0.4990167,0.3846500,0.5285000,0.4525833,0.4143667,0.3193333,0.5690167,0.2614667,0.2646333,0.7775167,3.5411667,0.4515333,2.3781333,2.4140667,2.6979333)
)

In my data, TWD > 0 means that there is drought, so I identified these periods.

myData %>%
  mutate(status = case_when(TWD > 0 ~ "drought", 
                           TWD == 0 ~ "normal")) %>%
{. ->> myData}

I used the following code to get the length of the individual normal and drought periods:

myData$group <- with(myData, rep(seq_along(z<-rle(myData$status)$lengths),z))
with(myData, table(group, status))     

     status
group drought normal
    1       0      3
    2       3      0
    3       0      2
    4       4      0
    5       0      4

Here's where I get stuck. Ideally, I would like to have the mean of Amp for each drought period, compare it to the means of the normal periods immediately before and after that drought, and then move on to the next drought period. How can I compare the days of e.g. groups 1, 2 and 3? I found a promising solution here Selecting a specific range of days prior to event in R where map(. , function(x) dat[(x-5):(x), ]) was used, but the problem is that I don't have a fixed number of days to compare, as the number of days depends on the length of the normal and drought periods.

I thought of creating a nested tibble to compare the different groups like here Compare groups with each other with

tibble(value = myData,
    group= myData$group %>%
    nest(value))

but that creates an error which I believe is because I'm trying to combine a vector and not a tibble.

How do you want to compare the means of the periods? E.g. if the mean in period 1 is 0.5, period 2 is 0.4, period 3 is 0.44, how will you do the comparison? – user2474226 Mar 16 '20 at 12:04
@user2474226 I'd like to compare all the groups to each other separately, e.g. is the mean of period 1 larger than that of period 2? Is it larger than that of period 3? Is period 2 larger than period 3? My questions are: is the mean of Amp smaller during drought, and is the after-drought Amp as large as the before-drought Amp? I hope that clarifies what I'm trying to do. – LaHN Mar 16 '20 at 12:30
Whoa. Some interesting stuff going on here. Why are you doing `{. ->> myData}` instead of `-> myData`? – Adam Sampson Mar 16 '20 at 13:31
Also, why are you putting a tibble as a column in another tibble? Wouldn't it be easier to do `myData %>% group_by(group) %>% nest(value)`? – Adam Sampson Mar 16 '20 at 13:33
FYI: your example data doesn't have anything about group or days or time, so it is hard for us to replicate. – Adam Sampson Mar 16 '20 at 13:34
@AdamSampson, thanks for your comments! I used `{. ->> myData}` to save the results of the `mutate` in the tibble. I'm relatively new to R, so this is the only way I have seen to do it. I'll gladly use simpler commands for that. As for putting a tibble as a column, I saw that in an example and thought I'd try it out. I tried your option, but for me it gave an error `Error: myData must evaluate to column positions or names, not a list`. So there are some things I need to figure out. I added the day to the tibble. Groups are created with the code in the question; I have no additional data on them. – LaHN Mar 16 '20 at 14:22

score 0 · Accepted Answer · answered Mar 16 '20 at 12:43


One possibility would be to use the pairwise Wilcoxon rank-sum test to compare the groups with each other (strictly speaking, it compares the groups' distributions rather than their means, and, to be honest, I'm not an expert on whether the Wilcoxon test is appropriate for this data):

pairwise.wilcox.test(myData$Amp, myData$group, p.adjust.method = 'none', alternative = 'greater')

The column and row indices are the groups, and in this instance you know that the even-numbered groups are the 'drought' periods.
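
To read that matrix, it can help to have one row per period first. The following is only a minimal sketch, assuming dplyr and tibble are installed: it rebuilds the periods from the question's example data, averages Amp per period, and uses lag() and lead() to place the means of the neighbouring periods next to each drought. The names periods, drought_summary, n_days, mean_amp, mean_before and mean_after are made up for illustration.

```r
library(dplyr)
library(tibble)

# Example data from the question
myData <- tibble(
  day = 1:16,
  TWD = c(0, 0, 0, 0.444, 0.234, 0.653, 0, 0,
          0.789, 0.734, 0.543, 0.843, 0, 0, 0, 0),
  Amp = c(0.6644333, 0.4990167, 0.3846500, 0.5285000,
          0.4525833, 0.4143667, 0.3193333, 0.5690167,
          0.2614667, 0.2646333, 0.7775167, 3.5411667,
          0.4515333, 2.3781333, 2.4140667, 2.6979333)
)

periods <- myData %>%
  mutate(
    status = if_else(TWD > 0, "drought", "normal"),
    # A new period starts whenever the status differs from the previous day
    group = cumsum(status != lag(status, default = first(status))) + 1L
  ) %>%
  group_by(group, status) %>%
  summarise(n_days = n(), mean_amp = mean(Amp), .groups = "drop") %>%
  mutate(
    mean_before = lag(mean_amp),  # mean of the period just before
    mean_after  = lead(mean_amp)  # mean of the period just after
  )

# Keep only the droughts; because periods alternate, the neighbours are normal periods
drought_summary <- filter(periods, status == "drought")
drought_summary
```

Because the periods are summarised before they are compared, their differing lengths no longer matter. A drought at the very start or end of a series would get NA for the missing neighbour.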

You may need to correct for multiple comparisons (by investigating the p.adjust.method parameter).
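
As a rough sketch of what that could look like, the call below reruns the test on the question's example data with the Holm correction. Holm is just one of the methods listed in p.adjust.methods, chosen here as an assumption rather than a recommendation. The vectors amp and group are typed out by hand so the snippet runs on its own, and alternative is left at its two-sided default.

```r
# Amp values and period ids from the question's example (periods of 3, 3, 2, 4 and 4 days)
amp <- c(0.6644333, 0.4990167, 0.3846500, 0.5285000,
         0.4525833, 0.4143667, 0.3193333, 0.5690167,
         0.2614667, 0.2646333, 0.7775167, 3.5411667,
         0.4515333, 2.3781333, 2.4140667, 2.6979333)
group <- rep(1:5, times = c(3, 3, 2, 4, 4))

# Unadjusted p-values, as in the call above but two-sided
raw <- pairwise.wilcox.test(amp, group, p.adjust.method = "none")

# Same comparisons with the Holm correction for multiple testing
adjusted <- pairwise.wilcox.test(amp, group, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values; rows and columns are the groups
adjusted$p.value
```

With only two to four days per period, as in this toy example, the test has very little power, so the real multi-year series is the place to judge the results.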

user2474226

Thank you! I think this is a partial solution to my problem. It would help with comparing the different groups statistically. I would also be interested in comparing them visually, and I think for that I would also need the daily values to be able to plot them. – LaHN Mar 16 '20 at 14:26
Did you try this? `myData %>% group_by(group) %>% mutate(meanAmp = mean(Amp))`? That will get you the average `Amp` for each group. – user2474226 Mar 16 '20 at 14:50
I actually haven't! Thank you! – LaHN Mar 16 '20 at 14:54
