-3

I have my data in the below table structure:

   
Person ID | Role | Role Count
-----------------------------
1         | A    | 24
1         | B    | 3
2         | A    | 15 
2         | B    | 4
2         | C    | 7

I would like to reshape this so that there is one row for each Person ID, A column for each distinct role (e.g. A,B,C) and then the Role Count for each person as the values. Using the above data the output would be:

   
Person ID | Role A | Role B | Role C
-------------------------------------
1         |    24  |   3    |   0
2         |    16  |   4    |   7

Coming from a Java background I would take an iterative approach to this:

  1. Find all distinct values for Role
  2. Create a new table with a column for PersonID and each of the distinct roles
  3. Iterate through the first table, get role counts for each Person ID and Role combination and insert results into new table.

Is there another way of doing this in R without iterating through the first table?

Thanks

JimS
  • 1,123
  • 3
  • 14
  • 17

3 Answers3

2

Try:

library(tidyr)
df %>% spread(Role, `Role Count`)

To make the column names exactly as per your example:

df2 <- df %>% spread(Role, `Role Count`)
names(df2) <- paste('Role', names(df2))
Jacob
  • 3,437
  • 3
  • 18
  • 31
0

Try this:

library(reshape2)
df <- dcast(df, PersonID~Role, value.var='RoleCount')
df[is.na(df)] <- 0
names(df)[-1] <- paste('Role', names(df[-1]))
df

  PersonID     Role A     Role B     Role C    
1        1         24          3          0
2        2         15          4          7
Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
0

With spread from tidyr

library(tidyr)
spread(data, Role, `Role Count`, sep = " ")
Julien Navarre
  • 7,653
  • 3
  • 42
  • 69