3

Suppose I have this data frame:

 df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
    Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
    Count = c(34,65,21,3,8,12,15,16), Value = c(3,1,8,2,3,3,5,8))

that looks like this:

df
   ID Code Count Value
1 id1    A    34     3
2 id1    B    65     1
3 id1    C    21     8
4 id2    A     3     2
5 id2    B     8     3
6 id3    A    12     3
7 id3    C    15     5
8 id3    D    16     8

I would like to obtain this result data frame:

result <- data.frame(Code = c("A", "B", "C", "D"),
         id1_count = c(34,65,21,NA), id1_value = c(3,1,8,NA), 
         id2_count = c(3, 8, NA, NA), id2_value = c(2, 3, NA, NA), 
         id3_count = c(12,NA,15,16), id3_value = c(3,NA,5,8))

that looks like this:

> result
  Code id1_count id1_value id2_count id2_value id3_count id3_value
1    A        34         3         3         2        12         3
2    B        65         1         8         3        NA        NA
3    C        21         8        NA        NA        15         5
4    D        NA        NA        NA        NA        16         8

Is there a one liner in the R base package that can do that? I am able to achieve the result I need but not in the R way (i.e., with loops and so on). Any help is appreciated. Thank you.

Frank
  • 66,179
  • 8
  • 96
  • 180
Marius
  • 990
  • 1
  • 14
  • 34

2 Answers2

3

You can try dcast from devel version of data.table (v1.9.5) which can take multiple value.var columns. Instructions to install are here

library(data.table)
dcast(setDT(df), Code~ID, value.var=c('Count', 'Value'))
#    Code Count_id1 Count_id2 Count_id3 Value_id1 Value_id2 Value_id3
#1:    A        34         3        12         3         2         3
#2:    B        65         8        NA         1         3        NA
#3:    C        21        NA        15         8        NA         5
#4:    D        NA        NA        16        NA        NA         8

Or using reshape from base R

reshape(df, idvar='Code', timevar='ID', direction='wide')
#    Code Count.id1 Value.id1 Count.id2 Value.id2 Count.id3 Value.id3
#1    A        34         3         3         2        12         3
#2    B        65         1         8         3        NA        NA
#3    C        21         8        NA        NA        15         5
#8    D        NA        NA        NA        NA        16         8
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 2
    Yes. That did the trick. Thank you. Was not aware of that reshape function.. tried with xtabs and merge but was a bit of a pain... – Marius Jul 02 '15 at 11:52
  • 1
    akrun, perhaps you should upgrade to the latest version of 1.9.5. The order in which column names are combined are now changed... – Arun Jul 03 '15 at 08:58
  • @Arun Thanks for the info. I upgraded to latest version and edited the output. – akrun Jul 03 '15 at 12:47
1

You could also try:

library(tidyr)
library(dplyr)

df %>%
  gather(key, value, -(ID:Code)) %>%
  unite(id_key, ID, key) %>%
  spread(id_key, value)

Which gives:

#  Code id1_Count id1_Value id2_Count id2_Value id3_Count id3_Value
#1    A        34         3         3         2        12         3
#2    B        65         1         8         3        NA        NA
#3    C        21         8        NA        NA        15         5
#4    D        NA        NA        NA        NA        16         8
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77