Follow-up: Matching factor levels in R with shared unique row names

Question

In follow-up to this post. How do we use the dplyr or data.table package to match the factor levels appropriately with shared row names?

library(data.table)

(DT = data.table(a = LETTERS[c(1, 1:3, 8)], b = c(2, 4:7), 
                 c = as.factor(c("bob", "mary", "bob", "george", "alice")), key="a"))

#    a b      c
# 1: A 2    bob
# 2: A 4   mary
# 3: B 5    bob
# 4: C 6 george
# 5: H 7  alice

...and using @frank 's great answer:

uc <- sort(unique(as.character(DT$c)))
( DT[,(uc):=lapply(uc,function(x)ifelse(c==x,b,NA))][,c('b','c'):=NULL] )

Returns:

#   a alice bob george mary
# 1 A    NA   2     NA   NA
# 2 A    NA  NA     NA    4
# 3 B    NA   5     NA   NA
# 4 C    NA  NA      6   NA
# 5 H     7  NA     NA   NA

And the final question here is, how do we get the below output, where unique row names share level values returning NAs where empty elements remain?

       alice bob george mary
# 1 A    NA   2      NA    4
# 2 B    NA   5      NA   NA
# 3 C    NA   NA      6   NA
# 4 H     7   NA     NA   NA

if you wanted this result the whole time, you could have asked for it up front in your other question - `spread` is built for this, but instead you got answers that assumed you wanted the extra rows... — Sam Firke, May 06 '15 at 00:43
If you ever saw observations `A,2,bob` and `A,3,bob`, you'd be in trouble here. Is your data such that columns `a` and `b` are a key (with each combo appearing at most once)? — Frank, May 06 '15 at 00:44
@Sam-Firke This is what I was aiming at intuitively, but noticed after the fact that I had left out a better part of the point. If you add a note to your other response with this as the outcome, I'll upvote it! — Bob Hopez, May 06 '15 at 00:46
@Frank Not sure what you mean by "a and b are a key". The answer I was seeking related to having an arbitrary (i.e, >= 1 instance of each level to create unique columns) — Bob Hopez, May 06 '15 at 00:52
Oh, sorry, I meant simply is it possible that you could have observations with `A,2,bob` and `A,3,bob`? If so, the value in the newly generated columns is indeterminate for row `A` column `bob` in the result. — Frank, May 06 '15 at 00:55

score 4 · Accepted Answer · answered May 06 '15 at 00:33

4

With tidyr:

library(tidyr)
spread(DT, c, b)

answered May 06 '15 at 00:33

bergant

7,122
1
20
24

With **data.table** or **dplyr**? – Bob Hopez May 06 '15 at 00:38
2

@BobHopez If you're into dplyr, my understanding is tidyr is made to go with it. – Frank May 06 '15 at 00:41
2

You can consider tidyr as a dplyr cousin. They are meant to be used in tandem. – bergant May 06 '15 at 00:41

Frank · Answer 2 · 2015-05-06T01:04:20.753

2

Here's an adaptation of the answer to the last question:

DT[,b:=as.integer(b)] # it's troublesome otherwise

uc <- sort(unique(as.character(DT$c)))
DT[,lapply(setNames(uc,uc),function(x)ifelse(x%in%c,b,NA_integer_)),by=a]

Result:

   a alice bob george mary
1: A    NA   2     NA    2
2: B    NA   5     NA   NA
3: C    NA  NA      6   NA
4: H     7  NA     NA   NA

edited May 06 '15 at 01:04

answered May 06 '15 at 00:51

Frank

66,179
8
96
180

1

Thanks for the additional support! – Bob Hopez May 06 '15 at 01:05

Follow-up: Matching factor levels in R with shared unique row names

2 Answers2

Linked