Create columns from factors and count

Question

A seemingly easy problem is keeping me very busy.

I have a data frame:

> df1
  Name Score
1  Ben     1
2  Ben     2
3 John     1
4 John     2
5 John     3

I would like to create a summary of the table like this:

> df2
  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

So df2 must (i) only show unique "Names" and (ii) create columns from the unique factors in "Score" and (iii) count the number of times a person received said score.

I have tried:

df2 <- ddply(df1, c("Name"), summarise
          ,Score_1 = sum(df1$Score == 1)
          ,Score_2 = sum(df1$Score == 2)
          ,Score_3 = sum(df1$Score == 3))

which produces:

  Name Score_1 Score_2 Score_3
1  Ben       2       2       1
2 John       2       2       1

So my attempt incorrectly counts all occurences instead of counting "per group"

EDIT: As per the comments, also tried reshape (possibly just doing it wrong):

> reshape(df1, idvar = "Name", timevar = "Score", direction = "wide")
  Name
1  Ben
3 John

For a start, the "Score" column is missing but worse than that, from my research on reshape, I am not convinced that I am going to get a count of each factor, which is the whole point.

Please search on SO how to convert from long to wide format. — , Feb 01 '16 at 09:05
With `reshape2` (or `data.table`): `dcast(df1, Name ~ paste("Score", Score, sep="_"), fun.aggregate = length)` should give the result you need. — Jaap, Feb 01 '16 at 09:27
Thanks @Jaap. This elegantly answers the question. FWIW, the question which is point to in the duplicate flag `does not` answer my question, so based on that it is not a duplicate and should perhaps get an official answer for future use. — gmarais, Feb 01 '16 at 09:33
@gmarais Though, it is not an exact dupe, it will give you some ideas — akrun, Feb 01 '16 at 09:48
what about `df1$one <- 1; xtabs(one ~ Name + Score, data=df1)` ? — jogo, Feb 01 '16 at 09:48
A related post [Reshape three column data frame to matrix](http://stackoverflow.com/q/9617348). — Bhargav Rao, Feb 01 '16 at 10:13
@Jaap has provided the more efficient solution for this problem while @akrun has provided a plyr-specific solution. I prefer `plyr` usually but it has its speed issues, with no exception here! I suggest using `reshape2::dcast` here as it is lightning fast. Thanks for all the suggetions — gmarais, Feb 01 '16 at 11:05
I might be mistaken, but i think the first part in my answer is quite `plyr`-specific ;-) — Jaap, Feb 01 '16 at 11:09
@gmarais The `dplyr/tidyr` package is not lagging behind in speed to data.table . But, you are right that `plyr` has speed issues.. — akrun, Feb 01 '16 at 11:19
@Jaap, :) agreed but I like your `reshape2` better. @akrun, I tested `ddply` and `reshape2` on 31,317 rows: `reshape2` took 0.11s and `ddply` took 9.47s on the same data. I can't post the code here for you to scrutinise, can I inbox you or something else? — gmarais, Feb 01 '16 at 11:34
`data.table` is highly probably even faster than `reshape2` (and you can use the same syntax as with `reshape2`) — Jaap, Feb 01 '16 at 11:41

score 3 · Accepted Answer · edited May 23 '17 at 11:53

You only need to make some slight modification to your code. You should use .(Name) instead of c("Name"):

ddply(df1, .(Name), summarise,
      Score_1 = sum(Score == 1),
      Score_2 = sum(Score == 2),
      Score_3 = sum(Score == 3))

gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

Other possibilities include:

1. table(df1) as @alexis_laz mentioned in the comments, this gives:

> table(df1)
       Score
Name   1 2 3
  Ben  1 1 0
  John 1 1 1

2. The dcast function of the reshape2 package (or data.table which has the same dcast function):

library(reshape2) # or library(data.table)
dcast(df1, Name ~ paste0("Score_", Score), fun.aggregate = length)

gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

Seems the more efficient approach in `data.table` is to do `df1[ , .N, by = .(Name, Score = paste0("Score_", Score))]` (based on brief investigation [here](http://stackoverflow.com/a/35156328/3576984)) — MichaelChirico, Feb 02 '16 at 18:03

score 2 · Answer 2 · answered Feb 01 '16 at 10:07

We can use dplyr/tidyr

 library(dplyr)
 library(tidyr)
 df1 %>% 
     group_by(Name) %>%
      mutate(n=1, Score= paste('Score', Score, sep='_')) %>% 
      spread(Score, n, fill=0) 
 #     Name Score_1 Score_2 Score_3
 #  (chr)   (dbl)   (dbl)   (dbl)
 #1   Ben       1       1       0
 #2  John       1       1       1

Create columns from factors and count

2 Answers2

Linked

Related