4

Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem.

I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order.

So I have this:

state<-c("IA","IA","IA","IL","IL","IL")
value1<-c(1,2,3,4,5,6)
s1<-data.frame(state,value1)
state<-c("IL","IL","IL","IA","IA","IA")
value2<-c(3,4,5,6,7,8)
s2<-data.frame(state,value2)

s1
s2

which returns this:

> s1
  state value1
1    IA      1
2    IA      2
3    IA      3
4    IL      4
5    IL      5
6    IL      6
> s2
  state value2
1    IL      3
2    IL      4
3    IL      5
4    IA      6
5    IA      7
6    IA      8

and I want this:

  state value1 value2
1    IA      1      6
2    IA      2      7
3    IA      3      8
4    IL      4      3
5    IL      5      4
6    IL      6      5

I'm about to drive myself silly trying to solve this. Seems like it should be a simple subscript problem.

JD Long
  • 59,675
  • 58
  • 202
  • 294
  • 1
    How toy is this example? FOr instance, in this case the state IDs always match up with the same value1 (IA == 2, IL == 1) so it doesn't matter exactly which order they're in as long as the state IDs are sorted in the same was as in s1. If value1 was say 1:6, that's different altogether. – geoffjentry Aug 19 '09 at 18:54
  • Very good point. I over simplified my example. IA does not always ==2 so that makes it too simple. I'll edit my question. – JD Long Aug 19 '09 at 19:36

2 Answers2

3

There are several ways to do this (it is R, after all) but I think the most clear is creating an index. We need a function that creates a sequential index (starting at one and ending with the number of observations).

seq_len(3) 
> [1] 1 2 3

But we need to calculate this index within each grouping variable (state). For this we can use R's ave function. It takes a numeric as the first argument, then the grouping factors, and finally the function to be applied in each group.

s1$index <- with(s1,ave(value1,state,FUN=seq_len))
s2$index <- with(s2,ave(value2,state,FUN=seq_len))

(Note the use of with, which tells R to search for the variables within the environment/dataframe. This is better practice than using s1$value1, s2$value2, etc.)

Now we can simply merge (join) the two data frames (by the variables present in the both data frames: state and index).

merge(s1,s2)

which gives

   state index value1 value2
1    IA     1      1      6
2    IA     2      2      7
3    IA     3      3      8
4    IL     1      4      3
5    IL     2      5      4
6    IL     3      6      5

For this to work, there should be the same number of observations by state in each of the data frames.

[Edit: commented the code for clarity.] [Edit: Used seq_len instead of creating a new function as suggested by hadley.]

Eduardo Leoni
  • 8,991
  • 6
  • 42
  • 49
0

NOTE: Check the 5th comment on the answer above. Solution should be

s1$index <- with(s1,ave(value1,state,FUN=seq_along))
s2$index <- with(s2,ave(value2,state,FUN=seq_along))

Tested and working.

nzcoops
  • 9,132
  • 8
  • 41
  • 52