I was following this post R - sum value and merge column based on condition, but it only works if you have one column of values that you want to merge.
In my case, I have the following dataframe and I am unable to figure out how to merge all the values in each row for EACH sample (without having to write out each sample name, I have about 80 samples) given a condition of a column.
Here is a sample of my dataframe:
df <- data.frame(OTU_ID = c(123,456,789,101,102,232,221),
Kingdom = rep(c("Viridiplantae"),7),
Phylum = rep(c("Streptophyta"),7),
Class = rep(c("Magnoliopsida"),7),
Order = c("Asterales","Asterales","Fabales","Fabales","Dipsacales","Asterales","Fabales"),
Family = c("Asteraceae","Asteraceae","Fabaceae","Fabaceae","Caprifoliaceae","Asteraceae","Fabaceae"),
Genus = c("Gymnanthemum","Gymnanthemum","Trifolium","Trifolium","Hypericum","Felicia","Trifolium"),
Species = c("amygdalinum","amygdalinum","pannonicum","pratense","perforatum","heterophylla","pannonicum"),
Sample1 = c(436,0,0,1167,37704,0,0),
Sample2 = c(1146,0,0,285,38489,0,0),
Sample3 = c(19547,0,0,87,13732,0,0),
Sample4 = c(564,0,0,0,34821,0,0),
Sample5 = c(579,0,0,0,0,17632,0),
Sample6 = c(0,366,50,0,0,30457,100))
This is what the dataframe looks like:
OTU_ID Kingdom Phylum Class Order Family Genus Species Sample1 Sample2 Sample3
1 123 Viridiplantae Streptophyta Magnoliopsida Asterales Asteraceae Gymnanthemum amygdalinum 436 1146 19547
2 456 Viridiplantae Streptophyta Magnoliopsida Asterales Asteraceae Gymnanthemum amygdalinum 0 0 0
3 789 Viridiplantae Streptophyta Magnoliopsida Fabales Fabaceae Trifolium pannonicum 0 0 0
4 101 Viridiplantae Streptophyta Magnoliopsida Fabales Fabaceae Trifolium pratense 1167 285 87
5 102 Viridiplantae Streptophyta Magnoliopsida Dipsacales Caprifoliaceae Hypericum perforatum 37704 38489 13732
6 232 Viridiplantae Streptophyta Magnoliopsida Asterales Asteraceae Felicia heterophylla 0 0 0
7 221 Viridiplantae Streptophyta Magnoliopsida Fabales Fabaceae Trifolium pannonicum 0 0 0
Sample4 Sample5 Sample6
1 564 579 0
2 0 0 366
3 0 0 50
4 0 0 0
5 34821 0 0
6 0 17632 30457
7 0 0 100
I want to merge all unique genera or family so that I end up with a new dataframe with merged raw read counts for each unique genus or family.
My expected outcome would be a dataframe like the following:
Kingdom Phylum Class Order Family Genus Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Viridiplantae Streptophyta Magnoliopsida Asterales Asteraceae Gymnanthemum 436 1146 19547 564 579 366
Viridiplantae Streptophyta Magnoliopsida Fabales Fabaceae Trifolium 1167 285 87 0 0 150
Viridiplantae Streptophyta Magnoliopsida Dipsacales Caprifoliaceae Hypericum 37704 38489 13732 34821 0 0
Viridiplantae Streptophyta Magnoliopsida Asterales Asteraceae Felicia 0 0 0 0 17632 30457
The final dataframe, if I chose to combine at genus level, would just end up with 4 rows where all Trifolium sp. and Gymnanthemum sp. read counts were summed up for each sample. Even though Trifolium has 2 unique species, it shouldn't matter because the condition is at genus level. If I had chosen Family level, Gymnanthemum and Felicia read counts should be summed up instead across all samples.
And then, if I wanted to, I could make a separate one at family level.