Years of lurking and benefiting from community knowledge here, but my first time posting, so thanks in advance. This question is distinct from the answers I've been able to find that focus on sum or concatenate by groups/factors. I have a data set from a vegetable field uniformity trial. I need to look at the power of different experimental designs and analyses approaches (so what I am actually interested in is the error resulting from different approaches to this same data set). I have a basic experimental unit of a single plot, and I need to combine those plots in different combinations, i.e. two adjacent plots in the same row or column, the entirety of a single row, 4 plots in an adjacent row/column combo... The tricky part here is that I need a plot to only be combined once, so it can't be +/- 1.
So for instance, a data frame might look like this, with many additional locations, crops, rows and weeks of harvest data:
Week | Plot | Row | Column | Rep | Variety | Market_Ct | Market_Wt | Unmark_Wt |
---|---|---|---|---|---|---|---|---|
34.00 | 101 | 1 | 1 | 1 | VarB | 15 | 1174 | 671 |
34.00 | 102 | 1 | 2 | 1 | VarA | 32 | 2450 | 136 |
34.00 | 103 | 1 | 3 | 1 | VarD | 3 | 234 | 127 |
34.00 | 104 | 1 | 4 | 1 | VarE | 5 | 440 | 657 |
34.00 | 105 | 1 | 5 | 1 | VarC | 11 | 882 | 430 |
34.00 | 106 | 1 | 6 | 1 | VarF | 22 | 1749 | 683 |
34.00 | 201 | 2 | 1 | 2 | VarE | 11 | 834 | 262 |
34.00 | 202 | 2 | 2 | 2 | VarF | 18 | 1266 | 863 |
34.00 | 203 | 2 | 3 | 2 | VarA | 6 | 513 | 317 |
34.00 | 204 | 2 | 4 | 2 | VarC | 15 | 899 | 356 |
34.00 | 205 | 2 | 5 | 2 | VarB | 7 | 550 | 261 |
34.00 | 206 | 2 | 6 | 2 | VarD | 16 | 1220 | 755 |
As an example, I will need to combine plot 101 and 102, 103 and 104, etc and sum the yield data. The initial output might look like the following if you're stupid like me and can't figure out how to merge/sum adjacent plots in one go:
Week | Plot | Column | Row | Rep | Variety | Market_Ct | Market_Wt | Unmark_Wt |
---|---|---|---|---|---|---|---|---|
34.00 | 101 | 1 | 1 | 1 | VarA | 15 | 1174 | 671 |
34.00 | 101 | 2 | 1 | 1 | VarA | 32 | 2450 | 136 |
34.00 | 102 | 3 | 1 | 1 | VarB | 3 | 234 | 127 |
34.00 | 102 | 4 | 1 | 1 | VarB | 5 | 440 | 657 |
34.00 | 103 | 5 | 1 | 1 | VarC | 11 | 882 | 430 |
34.00 | 103 | 6 | 1 | 1 | VarC | 22 | 1749 | 683 |
34.00 | 201 | 1 | 2 | 2 | VarB | 11 | 834 | 262 |
34.00 | 201 | 2 | 2 | 2 | VarB | 18 | 1266 | 863 |
34.00 | 202 | 3 | 2 | 2 | VarC | 6 | 513 | 317 |
34.00 | 202 | 4 | 2 | 2 | VarC | 15 | 899 | 356 |
34.00 | 203 | 5 | 2 | 2 | VarA | 7 | 550 | 261 |
34.00 | 203 | 6 | 2 | 2 | VarA | 16 | 1220 | 755 |
But ideally I would make those pairwise combinations by row/column and sum/merge in one go.
In another iteration I will also need to combine 101 and 201, 102 and 202, or in another iteration 101 and 102 and 103 vs 104 and 105 and 106, etc... In each instance I will then need to re randomize an appropriate number of varieties (treatments), and reps - i.e in the example output above when combining two plots I will need to assess with half the number of varieties and same number of reps, and with same number of varieties and half the reps. Then I think I can figure out the operations for applying/randomizing new names, and then group_by Plot and Week to merge with sum, cumsum, etc., but it's the wrangling part that really has me stumped.
So far I've only been able to figure out how to do this manually in excel or essentially manually in R by calling on specific plots. But there's gotta be a better way! Is there a clever way to do this without calling specific plots and basically combining stuff manually using df %>% group_by(week), but instead do it based on combinations of row and column?
I'm looking for a pipe-friendly approach here since I have to do this in lots of different combinations for different crops and locations, and then run different analyses for each layout. I'm only familiar with using r and Excel so any suggestions for either of those is welcome.
Thank you people of StackOverflow!