0

I am trying to expand on the answer to this problem that was solved, Take Sum of a Variable if Combination of Values in Two Other Columns are Unique but because I am new to stack overflow, I can't comment directly on that post so here is my problem:

I have a dataset like the following but with about 100 columns of binary data as shown in "ani1" and "bni2" columns.

Locations <- c("A","A","A","A","B","B","C","C","D", "D","D")
seasons <- c("2", "2", "3", "4","2","3","1","2","2","4","4")
ani1 <- c(1,1,1,1,0,1,1,1,0,1,0)
bni2 <- c(0,0,1,1,1,1,0,1,0,1,1)

df <- data.frame(Locations, seasons, ani1, bni2)

     Locations seasons ani1 bni2
1          A       2    1    0
2          A       2    1    0
3          A       3    1    1
4          A       4    1    1
5          B       2    0    1
6          B       3    1    1
7          C       1    1    0
8          C       2    1    1
9          D       2    0    0
10         D       4    1    1
11         D       4    0    1

I am attempting to sum all the columns based on the location and season, but I want to simplify so I get a total column for column #3 and after for each unique combination of location and season. The problem is not all the columns have a 1 value for every combination of location and season and they all have different names.

I would like something like this:

    Locations seasons ani1 bni2
1         A       2    2    0
2         A       3    1    1
3         A       4    1    1
4         B       2    0    1
5         B       3    1    1
6         C       1    1    0
7         C       2    1    1
8         D       2    0    0
9         D       4    1    2

Here is my attempt using a for loop:

 df2 <- 0
 for(i in 3:length(df)){
  testdf <- data.frame(t(apply(df[1:2], 1, sort)), df[i])
  df2 <- aggregate(i~., testdf, FUN=sum)
 }

I get the following error:

Error in model.frame.default(formula = i ~ ., data = testdf) : 
  variable lengths differ (found for 'X1')

Thank you!

Rogue
  • 29
  • 4

1 Answers1

0

You can use dplyr::summarise and across after group_by.

library(dplyr)

df %>% 
  group_by(Locations, seasons) %>% 
  summarise(across(starts_with("ani"), ~sum(.x, na.rm = TRUE))) %>%
  ungroup()

Another option is to reshape the data to long format using functions from the tidyr package. This avoids the issue of having to select columns 3 onwards.

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols = -c(Locations, seasons)) %>% 
  group_by(Locations, seasons, name) %>% 
  summarise(Sum = sum(value, na.rm = TRUE)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = "name", values_from = "Sum")

Result:

# A tibble: 9 x 4
  Locations seasons  ani1  ani2
  <chr>       <int> <int> <int>
1 A               2     2     0
2 A               3     1     1
3 A               4     1     1
4 B               2     0     1
5 B               3     1     1
6 C               1     1     0
7 C               2     1     1
8 D               2     0     0
9 D               4     1     2
neilfws
  • 32,751
  • 5
  • 50
  • 63
  • Thank you! I should have been more clear: Is there a way to do this if all the columns in column 3 and after it have different names? – Rogue Mar 22 '21 at 02:14
  • How different are the names? Do they have any features in common? – neilfws Mar 22 '21 at 02:20
  • They are all different. I added a prefix to the target columns though but now I get the following error: "Error in across(starts_with("ID_"), ~sum(.x)) : could not find function "across" " – Rogue Mar 22 '21 at 02:29
  • Sounds like you forgot `library(dplyr)` (which needs to be installed first if you haven't already). – neilfws Mar 22 '21 at 02:33
  • Added a second solution to avoid the column select issue. – neilfws Mar 22 '21 at 02:39
  • I get the following error: "Attaching package: ‘dplyr’ The following objects are masked from ‘package:stats’: filter, lag The following objects are masked from ‘package:base’: intersect, setdiff, setequal, union" – Rogue Mar 22 '21 at 03:19
  • It works after I updated my R version. Thank you – Rogue Mar 22 '21 at 03:59
  • If the answer solves the problem, please accept it. – neilfws Mar 22 '21 at 21:50