How to sum and concatenate cells based on multiple value ranges

Question

Years of lurking and benefiting from community knowledge here, but this is my first time posting, so thanks in advance. This question is distinct from the answers I've been able to find, which focus on summing or concatenating by groups/factors. I have a data set from a vegetable field uniformity trial. I need to look at the power of different experimental designs and analysis approaches (so what I am actually interested in is the error resulting from different approaches to this same data set). My basic experimental unit is a single plot, and I need to combine those plots in different ways, e.g. two adjacent plots in the same row or column, an entire row, or 4 plots in an adjacent row/column block. The tricky part is that each plot can only be combined once, so the groupings can't overlap (no sliding +/- 1 window).

So for instance, a data frame might look like this, with many additional locations, crops, rows and weeks of harvest data:

Week	Plot	Row	Column	Rep	Variety	Market_Ct	Market_Wt	Unmark_Wt
34.00	101	1	1	1	VarB	15	1174	671
34.00	102	1	2	1	VarA	32	2450	136
34.00	103	1	3	1	VarD	3	234	127
34.00	104	1	4	1	VarE	5	440	657
34.00	105	1	5	1	VarC	11	882	430
34.00	106	1	6	1	VarF	22	1749	683
34.00	201	2	1	2	VarE	11	834	262
34.00	202	2	2	2	VarF	18	1266	863
34.00	203	2	3	2	VarA	6	513	317
34.00	204	2	4	2	VarC	15	899	356
34.00	205	2	5	2	VarB	7	550	261
34.00	206	2	6	2	VarD	16	1220	755
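Several commenters ask for a reproducible version of the data. A minimal sketch that rebuilds exactly the 12 rows in the table above as an R data frame (the object name `plots` is an arbitrary choice):

```r
# The 12-plot snippet from the table above (Week 34 only)
plots <- data.frame(
  Week      = 34,
  Plot      = c(101:106, 201:206),
  Row       = rep(1:2, each = 6),
  Column    = rep(1:6, times = 2),
  Rep       = rep(1:2, each = 6),
  Variety   = c("VarB", "VarA", "VarD", "VarE", "VarC", "VarF",
                "VarE", "VarF", "VarA", "VarC", "VarB", "VarD"),
  Market_Ct = c(15, 32, 3, 5, 11, 22, 11, 18, 6, 15, 7, 16),
  Market_Wt = c(1174, 2450, 234, 440, 882, 1749, 834, 1266, 513, 899, 550, 1220),
  Unmark_Wt = c(671, 136, 127, 657, 430, 683, 262, 863, 317, 356, 261, 755)
)
str(plots)
```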

As an example, I will need to combine plots 101 and 102, 103 and 104, etc., and sum the yield data. The initial output might look like the following if you're stupid like me and can't figure out how to merge/sum adjacent plots in one go (here each pair of plots shares a new Plot number and has been given a new variety name):

Week	Plot	Column	Row	Rep	Variety	Market_Ct	Market_Wt	Unmark_Wt
34.00	101	1	1	1	VarA	15	1174	671
34.00	101	2	1	1	VarA	32	2450	136
34.00	102	3	1	1	VarB	3	234	127
34.00	102	4	1	1	VarB	5	440	657
34.00	103	5	1	1	VarC	11	882	430
34.00	103	6	1	1	VarC	22	1749	683
34.00	201	1	2	2	VarB	11	834	262
34.00	201	2	2	2	VarB	18	1266	863
34.00	202	3	2	2	VarC	6	513	317
34.00	202	4	2	2	VarC	15	899	356
34.00	203	5	2	2	VarA	7	550	261
34.00	203	6	2	2	VarA	16	1220	755

But ideally I would make those pairwise combinations by row/column and sum/merge in one go.
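One way to get non-overlapping pairs without naming plots is to derive a grouping key from the column number: `ceiling(Column / 2)` maps columns 1-2 to 1, 3-4 to 2 and 5-6 to 3, so every plot lands in exactly one pair. A minimal sketch in base R, assuming the column names from the table above; `Pair` is a hypothetical helper column and the new Plot numbering just mimics the example output:

```r
# Field plots from the question (Week 34 snippet)
plots <- data.frame(
  Week = 34, Plot = c(101:106, 201:206),
  Row = rep(1:2, each = 6), Column = rep(1:6, times = 2), Rep = rep(1:2, each = 6),
  Market_Ct = c(15, 32, 3, 5, 11, 22, 11, 18, 6, 15, 7, 16),
  Market_Wt = c(1174, 2450, 234, 440, 882, 1749, 834, 1266, 513, 899, 550, 1220),
  Unmark_Wt = c(671, 136, 127, 657, 430, 683, 262, 863, 317, 356, 261, 755)
)

# Hypothetical key: columns 1-2 -> 1, 3-4 -> 2, 5-6 -> 3 (pairs never overlap)
plots$Pair <- ceiling(plots$Column / 2)

# Sum the yield variables within each Week x Row x Pair unit in one call
pairs <- aggregate(cbind(Market_Ct, Market_Wt, Unmark_Wt) ~ Week + Row + Rep + Pair,
                   data = plots, FUN = sum)

# New plot ID in the style of the example output (101, 102, 103, 201, ...)
pairs$Plot <- pairs$Row * 100 + pairs$Pair
pairs <- pairs[order(pairs$Plot), ]
pairs
```

Pairing down columns instead (101 with 201) only changes the key to `ceiling(Row / 2)`.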

In other iterations I will also need to combine 101 and 201, 102 and 202, etc., or 101, 102, and 103 vs. 104, 105, and 106, and so on. In each case I will then need to re-randomize an appropriate number of varieties (treatments) and reps; i.e., in the example output above, when combining two plots I will need to assess both with half the number of varieties and the same number of reps, and with the same number of varieties and half the reps. I think I can figure out the operations for applying/randomizing new names, and then group_by Plot and Week to merge with sum, cumsum, etc., but it's the wrangling part that really has me stumped.
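For the re-randomization step, one possible approach is an RCBD-style shuffle: within each rep, assign each variety to exactly one combined unit with `sample()`. A minimal sketch, assuming the six paired units from the example output; `randomize_rcbd` and `units` are hypothetical names, and other designs (alpha, lattice, etc.) would need their own assignment rules:

```r
# Hypothetical combined units: 6 paired plots in 2 reps (3 units per rep)
units <- data.frame(Plot = c(101, 102, 103, 201, 202, 203), Rep = rep(1:2, each = 3))

# Hypothetical helper: within each rep, give every variety to exactly one unit
randomize_rcbd <- function(units, varieties, rep_col = "Rep") {
  units$Variety <- NA_character_
  for (r in unique(units[[rep_col]])) {
    idx <- which(units[[rep_col]] == r)
    if (length(idx) != length(varieties))
      stop("rep ", r, " has ", length(idx), " units but ", length(varieties), " varieties")
    units$Variety[idx] <- sample(varieties)  # random permutation within this rep
  }
  units
}

set.seed(1)
# Half the varieties, same number of reps
half_varieties <- randomize_rcbd(units, c("VarA", "VarB", "VarC"))
# Same six varieties, half the reps: all six units treated as a single rep
one_rep <- randomize_rcbd(transform(units, Rep = 1), paste0("Var", LETTERS[1:6]))
```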

So far I've only been able to do this manually in Excel, or essentially manually in R by calling specific plots. But there's gotta be a better way! Is there a clever way to do this that doesn't involve calling specific plots and combining them by hand with df %>% group_by(week), and instead works from combinations of row and column?
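Generalizing the row/column idea: any rectangular grouping (2 plots down a column, 3 across a row, a whole row, a 2 x 2 block) is a tiling of the field into non-overlapping `n_rows` x `n_cols` blocks, and the block a plot belongs to is `(ceiling(Row / n_rows), ceiling(Column / n_cols))`. A minimal sketch, assuming the column names from the question; `combine_blocks`, `BlockRow`, `BlockCol` and `n_plots` are hypothetical names, and the original `Rep` column is dropped because blocks can span reps:

```r
plots <- data.frame(
  Week = 34, Plot = c(101:106, 201:206),
  Row = rep(1:2, each = 6), Column = rep(1:6, times = 2),
  Market_Ct = c(15, 32, 3, 5, 11, 22, 11, 18, 6, 15, 7, 16),
  Market_Wt = c(1174, 2450, 234, 440, 882, 1749, 834, 1266, 513, 899, 550, 1220),
  Unmark_Wt = c(671, 136, 127, 657, 430, 683, 262, 863, 317, 356, 261, 755)
)

# Hypothetical helper: sum yields within non-overlapping n_rows x n_cols blocks,
# separately for every combination of the `keys` columns (e.g. Week).
combine_blocks <- function(df, n_rows = 1, n_cols = 2, keys = "Week",
                           vars = c("Market_Ct", "Market_Wt", "Unmark_Wt")) {
  grp <- c(df[keys],
           list(BlockRow = ceiling(df$Row / n_rows),
                BlockCol = ceiling(df$Column / n_cols)))
  out <- aggregate(df[vars], by = grp, FUN = sum)
  # Plots per block, so incomplete blocks at the field edge are easy to spot
  out$n_plots <- aggregate(list(n_plots = df$Plot), by = grp, FUN = length)$n_plots
  out[do.call(order, out[c(keys, "BlockRow", "BlockCol")]), ]
}

cols2  <- combine_blocks(plots, n_rows = 2, n_cols = 1)  # 101+201, 102+202, ...
thirds <- combine_blocks(plots, n_rows = 1, n_cols = 3)  # 101-103 vs 104-106, ...
rows   <- combine_blocks(plots, n_rows = 1, n_cols = 6)  # each entire row
```

Because the block is computed from Row and Column, no plot numbers are ever typed by hand, and a field with more rows or columns needs no code changes.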

I'm looking for a pipe-friendly approach here, since I have to do this for lots of different combinations across different crops and locations, and then run different analyses for each layout. I'm only familiar with R and Excel, so suggestions for either are welcome.
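To run many layouts in one pipeline, one option is to describe each layout as a block size and stack the results into a single long data frame with a `Layout` column, which can then be split or grouped for the downstream analyses. A minimal sketch, assuming R >= 4.1 (native `|>` pipe and `\(x)` lambdas); `block_sum` and `layouts` are hypothetical names. If the full data has Crop or Location columns (not in the snippet), adding them to the right-hand side of the formula keeps sites separate:

```r
plots <- data.frame(
  Week = 34, Row = rep(1:2, each = 6), Column = rep(1:6, times = 2),
  Market_Ct = c(15, 32, 3, 5, 11, 22, 11, 18, 6, 15, 7, 16),
  Market_Wt = c(1174, 2450, 234, 440, 882, 1749, 834, 1266, 513, 899, 550, 1220),
  Unmark_Wt = c(671, 136, 127, 657, 430, 683, 262, 863, 317, 356, 261, 755)
)

# Hypothetical helper: sum yields within non-overlapping n_rows x n_cols blocks
block_sum <- function(df, n_rows, n_cols) {
  df$BlockRow <- ceiling(df$Row / n_rows)
  df$BlockCol <- ceiling(df$Column / n_cols)
  aggregate(cbind(Market_Ct, Market_Wt, Unmark_Wt) ~ Week + BlockRow + BlockCol,
            data = df, FUN = sum)
}

# Hypothetical layout table: c(block height in rows, block width in columns)
layouts <- list(pair_in_row = c(1, 2), pair_in_col = c(2, 1), whole_row = c(1, 6))

# One long data frame, one set of rows per layout
results <- names(layouts) |>
  lapply(\(nm) transform(block_sum(plots, layouts[[nm]][1], layouts[[nm]][2]),
                         Layout = nm)) |>
  do.call(what = rbind)
```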

Thank you people of StackOverflow!

You say you "need a plot to only be combined once", but then you list plot 101 multiple times in your example. I'm sure I'm missing something here. Seeing the expected result for your sample data would be useful — cybernetic.nomad, Oct 29 '21 at 19:07
I am sorry, but I am lost. I do not understand. Maybe you could break down the task into clearer steps? For example, given this data frame, which output do you expect? — TarJae, Oct 29 '21 at 19:09
So do you mean combining plot x with all the others, then plot y, etc., and then seeing which combination gives the top results? — Solar Mike, Oct 29 '21 at 19:54
Could you provide a reproducible example of your dataset? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — william3031, Oct 30 '21 at 09:48
@cybernetic.nomad I updated with an example output, sorry for not including that initially, and also edited for clarity. 101 is listed multiple times as examples of different combinations/iterations. — notill_nerd, Nov 02 '21 at 00:37
@SolarMike the idea is to apply these different combinations of the basic experimental units, then apply different experimental designs (alpha vs lattice vs spatial vs RCBD etc) and see what results in the least error given this common data set. It will inform us how to set up actual variety trials and get the best bang for our buck. — notill_nerd, Nov 02 '21 at 00:38
A reproducible dataset, something like in the article. It makes it easier for others to provide you with an answer. Also include the code that you have tried and any errors produced. — william3031, Nov 02 '21 at 03:14
Check Latin Square out: https://www.ndsu.edu/faculty/horsley/LS.pdf - we used it for engine power calculations. — Solar Mike, Nov 02 '21 at 04:15
@william3031 this is an example of an actual data set I'm working with and the output I manually created, so it should be fully reproducible. But I don't have code and errors - the essence of the question is asking how to approach this problem, not troubleshooting a specific step. I can put together code to sum/merge plot 101 and 102 and rename as 101 again but that wouldn't add anything to the question because that's not what I'm trying to do. — notill_nerd, Nov 02 '21 at 14:18
@SolarMike yes! We will be testing out Latin Square and Augmented designs as well. — notill_nerd, Nov 02 '21 at 14:20
Have a look at the article on reproducible examples posted earlier: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example You could use something like `dput()` or `datapasta::dpasta()`. Maybe that is something for next time. Yes, I note you want to know how to approach the problem rather than troubleshooting. — william3031, Nov 02 '21 at 21:05
@william3031 As that article points out, I didn't have to use my original data per se; I just thought it was easier to use a snippet of it than to generate random data with the same sort of parameters. What would be the advantage of that? I definitely want to do this the right way. Also, do you have any suggestions as to approach? — notill_nerd, Nov 05 '21 at 15:15
Read the article. It can be random data, just make it reproducible so it is easier for others. — william3031, Nov 06 '21 at 20:22
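The `dput()` route mentioned in the comments prints R code that recreates an object exactly, which others can paste straight into their session. A minimal sketch using the first three plots (`snippet` is an arbitrary name); the last line checks that the printed text rebuilds the same data frame:

```r
snippet <- data.frame(
  Week = 34, Plot = 101:103, Row = 1, Column = 1:3,
  Variety = c("VarB", "VarA", "VarD"), Market_Wt = c(1174, 2450, 234)
)

dput(snippet)  # prints structure(list(...)) code to paste into a question

# Round trip: parse the printed text back into a data frame
copy <- eval(parse(text = capture.output(dput(snippet))))
```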
