0

Years of lurking and benefiting from community knowledge here, but my first time posting, so thanks in advance. This question is distinct from the answers I've been able to find that focus on sum or concatenate by groups/factors. I have a data set from a vegetable field uniformity trial. I need to look at the power of different experimental designs and analyses approaches (so what I am actually interested in is the error resulting from different approaches to this same data set). I have a basic experimental unit of a single plot, and I need to combine those plots in different combinations, i.e. two adjacent plots in the same row or column, the entirety of a single row, 4 plots in an adjacent row/column combo... The tricky part here is that I need a plot to only be combined once, so it can't be +/- 1.

So for instance, a data frame might look like this, with many additional locations, crops, rows and weeks of harvest data:

Week Plot Row Column Rep Variety Market_Ct Market_Wt Unmark_Wt
34.00 101 1 1 1 VarB 15 1174 671
34.00 102 1 2 1 VarA 32 2450 136
34.00 103 1 3 1 VarD 3 234 127
34.00 104 1 4 1 VarE 5 440 657
34.00 105 1 5 1 VarC 11 882 430
34.00 106 1 6 1 VarF 22 1749 683
34.00 201 2 1 2 VarE 11 834 262
34.00 202 2 2 2 VarF 18 1266 863
34.00 203 2 3 2 VarA 6 513 317
34.00 204 2 4 2 VarC 15 899 356
34.00 205 2 5 2 VarB 7 550 261
34.00 206 2 6 2 VarD 16 1220 755

As an example, I will need to combine plot 101 and 102, 103 and 104, etc and sum the yield data. The initial output might look like the following if you're stupid like me and can't figure out how to merge/sum adjacent plots in one go:

Week Plot Column Row Rep Variety Market_Ct Market_Wt Unmark_Wt
34.00 101 1 1 1 VarA 15 1174 671
34.00 101 2 1 1 VarA 32 2450 136
34.00 102 3 1 1 VarB 3 234 127
34.00 102 4 1 1 VarB 5 440 657
34.00 103 5 1 1 VarC 11 882 430
34.00 103 6 1 1 VarC 22 1749 683
34.00 201 1 2 2 VarB 11 834 262
34.00 201 2 2 2 VarB 18 1266 863
34.00 202 3 2 2 VarC 6 513 317
34.00 202 4 2 2 VarC 15 899 356
34.00 203 5 2 2 VarA 7 550 261
34.00 203 6 2 2 VarA 16 1220 755

But ideally I would make those pairwise combinations by row/column and sum/merge in one go.

In another iteration I will also need to combine 101 and 201, 102 and 202, or in another iteration 101 and 102 and 103 vs 104 and 105 and 106, etc... In each instance I will then need to re randomize an appropriate number of varieties (treatments), and reps - i.e in the example output above when combining two plots I will need to assess with half the number of varieties and same number of reps, and with same number of varieties and half the reps. Then I think I can figure out the operations for applying/randomizing new names, and then group_by Plot and Week to merge with sum, cumsum, etc., but it's the wrangling part that really has me stumped.

So far I've only been able to figure out how to do this manually in excel or essentially manually in R by calling on specific plots. But there's gotta be a better way! Is there a clever way to do this without calling specific plots and basically combining stuff manually using df %>% group_by(week), but instead do it based on combinations of row and column?

I'm looking for a pipe-friendly approach here since I have to do this in lots of different combinations for different crops and locations, and then run different analyses for each layout. I'm only familiar with using r and Excel so any suggestions for either of those is welcome.

Thank you people of StackOverflow!

  • You say you "need a plot to only be combined once", but then you list plot 101 multiple times in your example. I'm sure I`m missing something here. Seeing the expected result for your sample data would be useful – cybernetic.nomad Oct 29 '21 at 19:07
  • 2
    I am sorry but I am lost. I do not understand. Maybe you could breakdown the task into more clearer steps! So for example you have this dataframe and you expect which one? – TarJae Oct 29 '21 at 19:09
  • So do you mean combine a plot x with all the others then plot y etc etc and then see which combination gives the top results? – Solar Mike Oct 29 '21 at 19:54
  • 1
    Could you provide a reproducible example of your dataset? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – william3031 Oct 30 '21 at 09:48
  • @cybernetic.nomad I updated with an example output, sorry for not including that initially, and also edited for clarity. 101 is listed multiple times as examples of different combinations/iterations. – notill_nerd Nov 02 '21 at 00:37
  • @TarJae updated with example – notill_nerd Nov 02 '21 at 00:38
  • @william3031 updated with example output – notill_nerd Nov 02 '21 at 00:38
  • @SolarMike the idea is to apply these different combinations of the basic experimental units, then apply different experimental designs (alpha vs lattice vs spatial vs RCBD etc) and see what results in the least error given this common data set. It will inform us how to set up actual variety trials and get the best bang for our buck. – notill_nerd Nov 02 '21 at 00:38
  • A reproducible dataset, something like in the article. It makes it easier for others to provide you with an answer. Also include the code that you have tried and any errors produced. – william3031 Nov 02 '21 at 03:14
  • Check Latin Square out: https://www.ndsu.edu/faculty/horsley/LS.pdf - we used it for engine power calculations. – Solar Mike Nov 02 '21 at 04:15
  • @william3031 this is an example of an actual data set I'm working with and the output I manually created, so it should be fully reproducible. But I don't have code and errors - the essence of the question is asking how to approach this problem, not troubleshooting a specific step. I can put together code to sum/merge plot 101 and 102 and rename as 101 again but that wouldn't add anything to the question because that's not what I'm trying to do. – notill_nerd Nov 02 '21 at 14:18
  • @SolarMike yes! We will be testing out Latin Square and Augmented designs as well. – notill_nerd Nov 02 '21 at 14:20
  • Have a look at the article on reproducible examples posted earlier: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example You could use something like `dput()` or `datapasta::dpasta()`. Maybe that is something for next time. Yes, I note you want to know how to approach the problem rather than troubleshooting. – william3031 Nov 02 '21 at 21:05
  • @william3031 I didn't have to use my original data per se, as that article points out, I just thought it was easier to use a snippet of that than to generate a random one with the same sort of parameters - what would be the advantage of that? I definitely want to do this the right way. Also do you have any suggestions as to approach? – notill_nerd Nov 05 '21 at 15:15
  • 1
    Read the article. It can be random data, just make it reproducible so it is easier for others. – william3031 Nov 06 '21 at 20:22

0 Answers0