0

In follow-up to this post. How do we use the dplyr or data.table package to match the factor levels appropriately with shared row names?

library(data.table)

(DT = data.table(a = LETTERS[c(1, 1:3, 8)], b = c(2, 4:7), 
                 c = as.factor(c("bob", "mary", "bob", "george", "alice")), key="a"))

#    a b      c
# 1: A 2    bob
# 2: A 4   mary
# 3: B 5    bob
# 4: C 6 george
# 5: H 7  alice

...and using @frank 's great answer:

uc <- sort(unique(as.character(DT$c)))
( DT[,(uc):=lapply(uc,function(x)ifelse(c==x,b,NA))][,c('b','c'):=NULL] )

Returns:

#   a alice bob george mary
# 1 A    NA   2     NA   NA
# 2 A    NA  NA     NA    4
# 3 B    NA   5     NA   NA
# 4 C    NA  NA      6   NA
# 5 H     7  NA     NA   NA

And the final question here is, how do we get the below output, where unique row names share level values returning NAs where empty elements remain?

       alice bob george mary
# 1 A    NA   2      NA    4
# 2 B    NA   5      NA   NA
# 3 C    NA   NA      6   NA
# 4 H     7   NA     NA   NA
Community
  • 1
  • 1
Bob Hopez
  • 773
  • 4
  • 10
  • 28
  • 1
    if you wanted this result the whole time, you could have asked for it up front in your other question - `spread` is built for this, but instead you got answers that assumed you wanted the extra rows... – Sam Firke May 06 '15 at 00:43
  • If you ever saw observations `A,2,bob` and `A,3,bob`, you'd be in trouble here. Is your data such that columns `a` and `b` are a key (with each combo appearing at most once)? – Frank May 06 '15 at 00:44
  • @Sam-Firke This is what I was aiming at intuitively, but noticed after the fact that I had left out a better part of the point. If you add a note to your other response with this as the outcome, I'll upvote it! – Bob Hopez May 06 '15 at 00:46
  • @Frank Not sure what you mean by "a and b are a key". The answer I was seeking related to having an arbitrary (i.e, >= 1 instance of each level to create unique columns) – Bob Hopez May 06 '15 at 00:52
  • Oh, sorry, I meant simply is it possible that you could have observations with `A,2,bob` and `A,3,bob`? If so, the value in the newly generated columns is indeterminate for row `A` column `bob` in the result. – Frank May 06 '15 at 00:55

2 Answers2

4

With tidyr:

library(tidyr)
spread(DT, c, b)
bergant
  • 7,122
  • 1
  • 20
  • 24
2

Here's an adaptation of the answer to the last question:

DT[,b:=as.integer(b)] # it's troublesome otherwise

uc <- sort(unique(as.character(DT$c)))
DT[,lapply(setNames(uc,uc),function(x)ifelse(x%in%c,b,NA_integer_)),by=a]

Result:

   a alice bob george mary
1: A    NA   2     NA    2
2: B    NA   5     NA   NA
3: C    NA  NA      6   NA
4: H     7  NA     NA   NA
Frank
  • 66,179
  • 8
  • 96
  • 180