How do I combine rows of data based on values of other variables in R?

Question

I am trying to combine rows of data based on the levels of other variables. I have attached a sample of my data below.

data <- structure(list(FishID = c("SSS012", "SSS012", "SSS012", "SSS014", 
"SSS014", "SSS014", "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"
), Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", 
"Onisimus", "Copepods", "Onisimus", "Themisto", "Unidentified Fish", 
"Unidentified Fish"), EstimatedNumber = c(2L, 6L, 1L, 2L, NA, 
6L, 16L, 4L, 389L, 80L, 1L), TotalMass = c(0.074, 0.143, 0.052, 
0.034, 5.342, 0.16, 0.09, 0.087, 28.742, 6.556, 0.782), Comments = c("", 
"", "", "", "", "", "", "", "", "", "will likely change taxa to fish"
), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 
2019L, 2019L, 2019L, 2019L), PA = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1)), row.names = c(487L, 488L, 489L, 512L, 513L, 514L, 628L, 
634L, 636L, 638L, 639L), class = "data.frame")

If we run table(data$FishID, data$Taxa), we can see that some taxa occur twice within a FishID, while others occur only once. I would like each taxon to appear only once per FishID, but I want to keep the estimated number and total mass data from both rows (i.e., for FishID SSS012, I want one row for Onisimus with a value of 7 for estimated number and 0.195 for total mass, in addition to the row for Krill).

There are lots of similar posts. You can look at the `data.table` or `dplyr` libraries, or `aggregate` in base R. See [Group by multiple columns and sum other multiple columns](https://stackoverflow.com/questions/8212699/group-by-multiple-columns-and-sum-other-multiple-columns) and [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-e-g-sum-mean). – caldwellst, Feb 01 '22 at 21:55
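As a rough illustration of the base R route mentioned in the comment above, here is a minimal sketch using `aggregate`. It rebuilds only the four relevant columns of the sample data; the name `combined` is just a placeholder chosen for this example.

```r
# Sample data from the question, reduced to the columns needed here
data <- data.frame(
  FishID = c("SSS012", "SSS012", "SSS012", "SSS014", "SSS014", "SSS014",
             "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"),
  Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", "Onisimus",
           "Copepods", "Onisimus", "Themisto", "Unidentified Fish",
           "Unidentified Fish"),
  EstimatedNumber = c(2L, 6L, 1L, 2L, NA, 6L, 16L, 4L, 389L, 80L, 1L),
  TotalMass = c(0.074, 0.143, 0.052, 0.034, 5.342, 0.16, 0.09, 0.087,
                28.742, 6.556, 0.782)
)

# Sum the count and mass columns within each FishID/Taxa combination.
# The data-frame method is used (rather than the formula method) so that
# rows with a missing EstimatedNumber are not dropped before summing.
combined <- aggregate(
  data[c("EstimatedNumber", "TotalMass")],
  by = data[c("FishID", "Taxa")],
  FUN = sum
)

print(combined)
```

With this call, a group that contains a missing count gets NA as its summed count, while its total mass is still summed from every row.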

score 1 · Accepted Answer · answered Feb 01 '22 at 21:59

Here is a potential solution using dplyr:

library(dplyr)

# Group by fish and taxon, then sum the count and mass columns within each group
data %>%
  group_by(FishID, Taxa) %>%
  summarize(across(EstimatedNumber:TotalMass, ~sum(.)))

Which gives us:

  FishID Taxa              EstimatedNumber TotalMass
  <chr>  <chr>                       <int>     <dbl>
1 SSS012 Krill                           2     0.074
2 SSS012 Onisimus                        7     0.195
3 SSS014 Krill                          NA     5.38 
4 SSS014 Onisimus                        6     0.16 
5 SSS24  Copepods                       16     0.09 
6 SSS24  Onisimus                        4     0.087
7 SSS24  Themisto                      389    28.7  
8 SSS24  Unidentified Fish              81     7.34
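Note that the NA for SSS014 Krill appears because one of the two Krill rows for that fish has a missing EstimatedNumber, and `sum()` returns NA when any input is NA. If treating a missing count as zero is acceptable for your analysis (that is an assumption, not something the question states), a sketch with `na.rm = TRUE` might look like this. The name `combined` is a placeholder, and `.groups = "drop"` is an optional addition that returns an ungrouped result.

```r
library(dplyr)

# Sample data from the question, reduced to the columns needed here
data <- data.frame(
  FishID = c("SSS012", "SSS012", "SSS012", "SSS014", "SSS014", "SSS014",
             "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"),
  Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", "Onisimus",
           "Copepods", "Onisimus", "Themisto", "Unidentified Fish",
           "Unidentified Fish"),
  EstimatedNumber = c(2L, 6L, 1L, 2L, NA, 6L, 16L, 4L, 389L, 80L, 1L),
  TotalMass = c(0.074, 0.143, 0.052, 0.034, 5.342, 0.16, 0.09, 0.087,
                28.742, 6.556, 0.782)
)

# Same grouping as above, but missing values are ignored when summing
combined <- data %>%
  group_by(FishID, Taxa) %>%
  summarize(
    across(c(EstimatedNumber, TotalMass), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

print(combined)
```

With this version the SSS014 Krill count comes out as 2 instead of NA, so whether that is the right number depends on what the missing value means in your data.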

Matt


This looks like it should work. What package is the across() function from? I don't seem to have it. – ljh2001 Feb 01 '22 at 22:02
`across()` is from `dplyr` – Matt Feb 01 '22 at 22:05
Gotcha, I think I have to update my package version, thanks! – ljh2001 Feb 01 '22 at 22:08
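For anyone who cannot update right away: `across()` was added in dplyr 1.0.0, so older installations will not have it. A possible fallback, sketched here under the assumption that the older scoped verb `summarise_at()` is available in your version, is shown below. The name `combined` is a placeholder for this example.

```r
library(dplyr)

# Check which dplyr version is installed; across() needs 1.0.0 or later
print(packageVersion("dplyr"))

# Sample data from the question, reduced to the columns needed here
data <- data.frame(
  FishID = c("SSS012", "SSS012", "SSS012", "SSS014", "SSS014", "SSS014",
             "SSS24", "SSS24", "SSS24", "SSS24", "SSS24"),
  Taxa = c("Krill", "Onisimus", "Onisimus", "Krill", "Krill", "Onisimus",
           "Copepods", "Onisimus", "Themisto", "Unidentified Fish",
           "Unidentified Fish"),
  EstimatedNumber = c(2L, 6L, 1L, 2L, NA, 6L, 16L, 4L, 389L, 80L, 1L),
  TotalMass = c(0.074, 0.143, 0.052, 0.034, 5.342, 0.16, 0.09, 0.087,
                28.742, 6.556, 0.782)
)

# Scoped-verb equivalent of summarize(across(...)): sum the two columns
# within each FishID/Taxa group
combined <- data %>%
  group_by(FishID, Taxa) %>%
  summarise_at(vars(EstimatedNumber, TotalMass), sum)

print(combined)
```

`summarise_at()` is superseded in current dplyr releases, so `across()` remains the better choice once the package is updated.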
