1

I'm using R and I'm really at a loss right now. I have data like this:

df <- data.frame(
       group = c(2, 2, 2, 1, 1, 0, 0, 1, 1, 0, 1, 0),
       grade = c(2, 4, 3, 1, 3, 2, 5, 1, 1, 2, 3, 1)
)

I want to have it like this:

  group0  group1  group2
1      1       3       0
2      2       0       1
3      0       2       1
4      0       0       1
5      1       0       0
6      0       0       0

I've been trying for hours using subset, tapply, table, for loops and what not but I can't seem to figure it out. I'd be really happy if someone could help me, I can't help but think I'm missing something really easy and obvious.

How can I produce my target output?

/ Solved, see below. Thanks for finding a fitting title btw, you guys are the best!

Leo
  • 31
  • 5

2 Answers2

1

You can do something like this with dplyr and tidyr:

df %>%
  count(group, grade) %>%
  mutate(group = paste0('group', group)) %>%
  spread(group, n, fill = 0)

# A tibble: 5 x 4
  grade group0 group1 group2
* <int>  <dbl>  <dbl>  <dbl>
1     1      1      3      0
2     2      2      0      1
3     3      0      2      1
4     4      0      0      1
5     5      1      0      0

If you don't want the additional 'grade' column, you can do:

df %>%
  count(group, grade) %>%
  mutate(group = paste0('group', group)) %>%
  spread(group, n, fill = 0) %>%
  select(-grade)

  group0 group1 group2
*  <dbl>  <dbl>  <dbl>
1      1      3      0
2      2      0      1
3      0      2      1
4      0      0      1
5      1      0      0
Gopala
  • 10,363
  • 7
  • 45
  • 77
0

Alternatively, consider a base R approach using: by for grouping, aggregate for counts, setNames for group## column names, and Reduce for chain merge of dataframes:

# DATAFRAME LIST BY EACH GROUP
grp_list <- by(df, df$group, function(d) setNames(aggregate(.~grade, d, FUN=length), 
                                                 c("grade", paste0("group",max(d$group)))))

# CHAIN MERGE (OUTER JOIN)
final_df <- Reduce(function(x,y) merge(x,y, by="grade", all=TRUE), grp_list) 

# FILL NA WITH ZEROS
final_df[is.na(final_df)] <- 0

final_df
#   grade group0 group1 group2
# 1     1      1      3      0
# 2     2      2      0      1
# 3     3      0      2      1
# 4     4      0      0      1
# 5     5      1      0      0

And to remove grade, use transform after chain merge or directly on final_df:

final_df <- transform(Reduce(function(x,y) merge(x,y, by="grade", all=TRUE), grp_list), 
                      grade = NULL)

final_df <- transform(final_df, grade = NULL)
Parfait
  • 104,375
  • 17
  • 94
  • 125