Return all factor levels by name as new columns from a three column data.table [R]

Question

Any way to use data.table or dplyr to solve the below?

library(data.table)

(DT = data.table(a = LETTERS[c(1, 1:3, 8)], b = c(2, 4:7), 
                 c = as.factor(c("bob", "mary", "bob", "george", "alice")), key="a"))

Returns:

#    a b      c
# 1: A 2    bob
# 2: A 4   mary
# 3: B 5    bob
# 4: C 6 george
# 5: H 7  alice

Would like to get this:

#        alice bob george  mary 
# 1: A    NA   2    NA     NA
# 2: A    NA   NA   NA     4
# 3: B    NA   5    NA     NA
# 4: C    NA   NA   6      NA
# 5: H    7    NA   NA     NA

Probably not good practice to name anything `c`, since we use that function all the time. — Frank, May 05 '15 at 22:51
If you end up wanting a single row for each value of `a`, use `spread(DT, c, b)` from the [tidyr](http://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html) package. — Sam Firke, May 06 '15 at 01:10

score 4 · Accepted Answer · edited May 23 '17 at 11:58

4

This is similar to creating dummy variables.

uc <- sort(unique(as.character(DT$c)))
DT[,(uc):=lapply(uc,function(x)ifelse(c==x,b,NA))][,c('b','c'):=NULL]

I've heard bad things about ifelse, so a speedier route may be

uc <- sort(unique(as.character(DT$c)))
is <- 1:nrow(DT)
js <- as.character(DT$c)
vs <- DT$b

DT[,(uc):=NA_integer_]
for (i in is) set(DT,i=is[i],j=js[i],value=vs[i])

DT[,c('b','c'):=NULL]

edited May 23 '17 at 11:58

Community

1
1

answered May 05 '15 at 22:49

Frank

66,179
8
96
180

Good answer @Frank! Was able to slightly revise to drop the **b** and **c** columns: `DT[,(uc):=lapply(uc, function(x)ifelse(c==x, b, NA))][, !c("b", "c"), with=FALSE] ` – Bob Hopez May 05 '15 at 23:04
1

@BobHopez Thanks :) I forgot to drop b, but have added it now. If you drop by reference (as in the answer) by setting to `NULL` you can eliminate `b` and `c` from `DT` and don't need to make a new table. In contrast, if you do the `with=FALSE` thing, you'll need to assign it like `newDT <- DT[...][...,with=FALSE]` to use the result further, making a copy (taking time and memory). – Frank May 05 '15 at 23:08
Not able to get your bottom code chunk working @Frank . The top half works perfectly. – Bob Hopez May 05 '15 at 23:24
Getting **j is -2147483648 which is outside range [1,ncol=5]** after running for loop. – Bob Hopez May 05 '15 at 23:29
Hm, if you remove `DT`, recreate it as in your question and then run the bottom chunk, does it not work? – Frank May 05 '15 at 23:41
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/77053/discussion-between-frank-and-bob-hopez). – Frank May 05 '15 at 23:47
It took just that. Had to run `rm(DT); (DT = data.table(a = LETTERS[c(1, 1:3, 8)], b = c(2, 4:7), c = as.factor(c("bob", "mary", "bob", "george", "alice")), key="a"))` prior to running the second chunk. Suggest possible edit. – Bob Hopez May 05 '15 at 23:47

bergant · Answer 2 · 2015-05-05T23:53:15.227

3

Just using the idea of dummy variables from Frank:

df1 <- cbind( a = DT$a, as.data.frame( model.matrix(a ~ c - 1, data = DT ) * DT$b ))
df1[df1==0] <- NA
names(df1) <- c("a", levels(DT$c))

#   a alice bob george mary
# 1 A    NA   2     NA   NA
# 2 A    NA  NA     NA    4
# 3 B    NA   5     NA   NA
# 4 C    NA  NA      6   NA
# 5 H     7  NA     NA   NA

edited May 05 '15 at 23:53

answered May 05 '15 at 23:45

bergant

7,122
1
20
24

Almost there... changes column names to **paste0("c", names)**. – Bob Hopez May 05 '15 at 23:52
Yes, model.matrix does that (combines the variable name with the level). I changed the names to original factor levels. – bergant May 06 '15 at 00:00
Here is a [follow-up question](http://stackoverflow.com/questions/30065418/follow-up-matching-factor-levels-in-r-with-shared-unique-row-names) – Bob Hopez May 06 '15 at 00:29

score 2 · Answer 3 · answered May 06 '15 at 00:07

2

With base R:

names <- unique(as.character(DT$c))
cbind(a = DT$a, as.data.frame(sapply(names, function(x) ifelse(DT$c==x, DT$b, NA))))

answered May 06 '15 at 00:07

Sam Firke

21,571
9
87
105

Return all factor levels by name as new columns from a three column data.table [R]

3 Answers3

Linked