4

Any way to use data.table or dplyr to solve the below?

library(data.table)

(DT = data.table(a = LETTERS[c(1, 1:3, 8)], b = c(2, 4:7), 
                 c = as.factor(c("bob", "mary", "bob", "george", "alice")), key="a"))

Returns:

#    a b      c
# 1: A 2    bob
# 2: A 4   mary
# 3: B 5    bob
# 4: C 6 george
# 5: H 7  alice

Would like to get this:

#        alice bob george  mary 
# 1: A    NA   2    NA     NA
# 2: A    NA   NA   NA     4
# 3: B    NA   5    NA     NA
# 4: C    NA   NA   6      NA
# 5: H    7    NA   NA     NA
Bob Hopez
  • 773
  • 4
  • 10
  • 28

3 Answers3

4

This is similar to creating dummy variables.

uc <- sort(unique(as.character(DT$c)))
DT[,(uc):=lapply(uc,function(x)ifelse(c==x,b,NA))][,c('b','c'):=NULL]

I've heard bad things about ifelse, so a speedier route may be

uc <- sort(unique(as.character(DT$c)))
is <- 1:nrow(DT)
js <- as.character(DT$c)
vs <- DT$b

DT[,(uc):=NA_integer_]
for (i in is) set(DT,i=is[i],j=js[i],value=vs[i])

DT[,c('b','c'):=NULL]
Community
  • 1
  • 1
Frank
  • 66,179
  • 8
  • 96
  • 180
  • Good answer @Frank! Was able to slightly revise to drop the **b** and **c** columns: `DT[,(uc):=lapply(uc, function(x)ifelse(c==x, b, NA))][, !c("b", "c"), with=FALSE] ` – Bob Hopez May 05 '15 at 23:04
  • 1
    @BobHopez Thanks :) I forgot to drop b, but have added it now. If you drop by reference (as in the answer) by setting to `NULL` you can eliminate `b` and `c` from `DT` and don't need to make a new table. In contrast, if you do the `with=FALSE` thing, you'll need to assign it like `newDT <- DT[...][...,with=FALSE]` to use the result further, making a copy (taking time and memory). – Frank May 05 '15 at 23:08
  • Not able to get your bottom code chunk working @Frank . The top half works perfectly. – Bob Hopez May 05 '15 at 23:24
  • Getting **j is -2147483648 which is outside range [1,ncol=5]** after running for loop. – Bob Hopez May 05 '15 at 23:29
  • Hm, if you remove `DT`, recreate it as in your question and then run the bottom chunk, does it not work? – Frank May 05 '15 at 23:41
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/77053/discussion-between-frank-and-bob-hopez). – Frank May 05 '15 at 23:47
  • It took just that. Had to run `rm(DT); (DT = data.table(a = LETTERS[c(1, 1:3, 8)], b = c(2, 4:7), c = as.factor(c("bob", "mary", "bob", "george", "alice")), key="a"))` prior to running the second chunk. Suggest possible edit. – Bob Hopez May 05 '15 at 23:47
3

Just using the idea of dummy variables from Frank:

df1 <- cbind( a = DT$a, as.data.frame( model.matrix(a ~ c - 1, data = DT ) * DT$b ))
df1[df1==0] <- NA
names(df1) <- c("a", levels(DT$c))

#   a alice bob george mary
# 1 A    NA   2     NA   NA
# 2 A    NA  NA     NA    4
# 3 B    NA   5     NA   NA
# 4 C    NA  NA      6   NA
# 5 H     7  NA     NA   NA
bergant
  • 7,122
  • 1
  • 20
  • 24
  • Almost there... changes column names to **paste0("c", names)**. – Bob Hopez May 05 '15 at 23:52
  • Yes, model.matrix does that (combines the variable name with the level). I changed the names to original factor levels. – bergant May 06 '15 at 00:00
  • Here is a [follow-up question](http://stackoverflow.com/questions/30065418/follow-up-matching-factor-levels-in-r-with-shared-unique-row-names) – Bob Hopez May 06 '15 at 00:29
2

With base R:

names <- unique(as.character(DT$c))
cbind(a = DT$a, as.data.frame(sapply(names, function(x) ifelse(DT$c==x, DT$b, NA))))
Sam Firke
  • 21,571
  • 9
  • 87
  • 105