This is a follow-up to a previous question.

Let's say I have a large data.frame, df, with columns u, v. I'd like to number the observed variable-interactions of u, v in increasing order, i.e. the order in which they were seen when traversing the data.frame from top to bottom.

Note: Assume df has some existing ordering so it's not ok to temporarily reorder it.

The code shown at the bottom of this post works well, except that the resulting codes follow the factor's level order rather than the order of first appearance. That is, instead of the current:

# current result: numbered by level order, not by first appearance
match(df$label, levels(df$label))
# [1] 5 6 3 7 4 7 2 2 1 1

# but we'd like each combination numbered in the order it was first seen:
# 1 2 3 4 5 4 6 6 7 7

I've been experimenting with order(), rank(), factor(...ordered=T) etc. and nothing seems to work. I must be overlooking something obvious. Any ideas?

Note: It's also not allowed to cheat by reordering both u, v as individual factors.

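# Output below is from R < 3.6.0; newer versions draw different samples
# unless RNGkind(sample.kind = "Rounding") is set first.
set.seed(1234)
df <- data.frame(u=sample.int(3,10,replace=TRUE), v=sample.int(4,10,replace=TRUE))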
#    u v
# 1  1 3
# 2  2 3
# 3  2 2
# 4  2 4
# 5  3 2
# 6  2 4
# 7  1 2
# 8  1 2
# 9  2 1
# 10 2 1

(df$label <- factor(interaction(df$u,df$v), ordered=TRUE))
#  [1] 1.3 2.3 2.2 2.4 3.2 2.4 1.2 1.2 2.1 2.1
# Levels: 2.1 < 1.2 < 2.2 < 3.2 < 1.3 < 2.3 < 2.4

# This is OK, except we want numbering by order of first appearance
match(df$label, levels(df$label))
# [1] 5 6 3 7 4 7 2 2 1 1

# No better:
match(df$label, levels(df$label)[rank(levels(df$label))])
# [1] 6 7 1 4 3 4 5 5 2 2
  • It's already there: `# ...but we want a result vector in increasing-order like this: 1 2 3 4 5 4 6 6 7 7` – smci Apr 12 '14 at 09:13
  • The numbering of the output factor levels from `interaction` is arbitrary. Instead of calling it '1 2 3 4 5 4 6 6 7 7', consider it to be E,F,C,G,D,G,B,B,A,A. The only thing that matters is that E was seen first (=> 1), F was second (=> 2) and so on. So our vector from df$label just needs to be renumbered (somehow) to 1 2 3 4 5 4 6 6 7 7 . I hope I'm being clear :S – smci Apr 12 '14 at 09:20
  • Anyway Murphy's Law says I bang my head off it for ages, then no sooner do I post it as a question and I stumble across the answer (below). – smci Apr 12 '14 at 09:21

1 Answer

Duh! The solution is to pass drop=TRUE to interaction(). I still don't fully understand why omitting it breaks things, though.

# The original factor from interaction() had unused levels...
str(df$label)
# Factor w/ 12 levels "1.1","1.2","1.3",..: 3 7 6 8 10 8 2 2 5 5

# SOLUTION
df$label <- interaction(df$u,df$v, drop=TRUE)

str(df$label)
# Factor w/ 7 levels "2.1","1.2","2.2",..: 5 6 3 7 4 7 2 2 1 1

rank(unique(df$label))
# [1] 5 6 3 7 4 2 1
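
A possible explanation for why the unused levels matter, offered as a sketch rather than a definitive account: rank() on a factor ranks its integer codes, so it returns positions 1..7 among the observed values. Those positions only coincide with the codes themselves (and can therefore be used to index levels()) when the factor has no unused levels. The data.frame below is typed in from the printed table in the question, and the names `full` and `dropped` are illustrative only.

```r
# Data copied from the question's printed data.frame (no RNG dependence)
df <- data.frame(u = c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2),
                 v = c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1))

full    <- interaction(df$u, df$v)               # keeps all 3 x 4 combinations
dropped <- interaction(df$u, df$v, drop = TRUE)  # keeps only observed ones

nlevels(full)     # 12
nlevels(dropped)  # 7

# With unused levels, the integer codes run over 1..12,
# so they no longer equal the ranks 1..7
all(as.integer(unique(full)) == rank(unique(full)))        # FALSE

# After drop = TRUE, codes and ranks coincide
all(as.integer(unique(dropped)) == rank(unique(dropped)))  # TRUE
```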

We then use that rank to reorder the levels into the order in which they were first observed, and match our vector against the reordered levels:

# And now we get the desired result
match(df$label, levels(df$label)[ rank(unique(df$label)) ] )
# [1] 1 2 3 4 5 4 6 6 7 7
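
A possibly simpler alternative, not part of the original answer: since unique() already returns values in order of first appearance, matching against it directly should give the same numbering without touching levels or ranks. This is a minimal sketch; the data.frame is typed in from the question's printed table, and `key` and `id` are illustrative names.

```r
# Data copied from the question's printed data.frame (no RNG dependence)
df <- data.frame(u = c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2),
                 v = c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1))

# Build one key per row; row order of df is left untouched
key <- paste(df$u, df$v, sep = ".")

# unique() preserves first-appearance order, so match() numbers
# each combination by when it was first seen
id <- match(key, unique(key))
id
# [1] 1 2 3 4 5 4 6 6 7 7

# Equivalent formulation via a factor whose levels are in observed order
id2 <- as.integer(factor(key, levels = unique(key)))
```

The "." separator is assumed to be safe here because u and v are whole numbers; for values that can themselves contain ".", a different separator would be needed to keep the keys unambiguous.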