0

I have a data.frame set up like this:

Group 1    Group 2   count
   a          aa        1 
   b          ab        6
   c          ac        7

I would like to reshape it to report the sum of the count by Group 1 over each of Group 2, like this:

Group 1    aa  ab  ac
   a        1   0   0 
   b        0   6   0
   c        0   0   7

I have looked into using aggregate, melt, and cast, but cannot figure a way to separate out the multiple groups into separate columns.

R: Pivoting using 'spread' function gets me to the point where I have a table with multiple rows for Group 1.

    df %>%
  group_by(`Group 1`) %>%
  mutate(id2 = sequence(n())) %>%
  spread(`Group 2`, "count")

Group 1     id2   aa   ab   ac
   a         1    1    0    0
   a         2    0    4    0
   a         3    0    0    7
   a         4    1    0    0
   a         5    1    0    0
   b         6    0    6    0
   b         7    3    0    0
   b         8    0    0    5

I still need to reduce this to sums for each Group 1 entry.

Like this:

Group 1        aa   ab   ac
   a            2    4    7
   b            3    6    5

I thought I could maybe add aggregate to the chain, but I can't get it to work on multiple columns

2 Answers2

3

You can use spread from tidyr:

library(tidyr)

spread(df, Group_2, count, fill = 0)
#> # A tibble: 3 x 4
#>   Group_1    aa    ab    ac
#>   <chr>   <dbl> <dbl> <dbl>
#> 1 a           1     0     0
#> 2 b           0     6     0
#> 3 c           0     0     7

Data

df <- tibble::tribble(~Group_1, ~Group_2,   ~count,
                      "a",          "aa",        1, 
                      "b",          "ab",        6,
                      "c",          "ac",        7)

For your second question, you can collapse the resulting data.frame using summarise_all or summarise_at:

df2 %>% 
  group_by(Group_1) %>% 
  summarise_at(c("aa", "ab", "ac"), sum)
#> # A tibble: 2 x 4
#>   Group_1    aa    ab    ac
#>   <fct>   <int> <int> <int>
#> 1 a           3     4     7
#> 2 b           3     6     5
Thomas K
  • 3,242
  • 15
  • 29
  • this gives: Error: Duplicate identifiers for rows, which is fixed with https://stackoverflow.com/questions/29952517/r-pivoting-using-spread-function. But I still need to sum all the Group 2 columns by Group 1 variables. – dysonsphere May 14 '18 at 17:11
  • @dysonsphere Yes, because it is a solution to your initial question and the data you provided, where it works. I updated my answer to resolve your new question. – Thomas K May 14 '18 at 17:25
  • using summarize_at gave me the error: "Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"" I made sure that all the columns (except the group variable) were numeric, so not sure what this error is referring to. – dysonsphere May 15 '18 at 15:24
  • @dysonsphere please edit your initial question with a short reproducible example regarding this error or ask a new question. – Thomas K May 15 '18 at 15:26
0

I found a solution here:sum multiple columns by group with tapply

Using the by function I was able to separate out the groups. Then I returned it to a data.frame with sapply:

df <- df %>%
  group_by(`Group 1`) %>%
  mutate(id2 = sequence(n())) %>%
  spread(`Group 2`, `count`)
df[is.na(df)] <-0 # transform all missing counts to zeros
df <- by(df[,c(3:5)], df$`Group 1`, FUN = colSums)
df <- t(sapply(df, I))
df <- as.data.frame(df)