0

I have two data frames (DF1 and DF2) that I am trying to compare values in.

For each row in DF1, I want to compare column 1 with column 1 for each row in DF2. For each match (there can be numerous for each value from DF1), I want to populate another data frame (DF3) with the values from the row the where the value being compared resides, and values from the matching rows in DF2.

Below is the code I have been working on. I know I should be avoiding loops, never mind nested loops, but I can't figure out any other way or find any examples.

DF3 <- data.frame("id","DF1Timestamp","DF2Timestamp")


head(df1)
            id                 timestamp      
    1  CA8058409291009   2015-07-10 15:25:31 
    2  CA8058402571009   2015-07-10 15:50:20
    3  CA8323300811005   2015-07-10 16:18:52 
    4  CA8058402571009   2015-07-10 16:51:54                          
    5  CA8323300811005   2015-07-10 16:54:03 

head(df2)

           id           timestamp     
1   CA8058402571009 2015-07-10 15:49:23
2   CA8323300811005 2015-07-10 16:18:37 
3   CA8058402571009 2015-07-10 16:31:42
4   CA8058409291009 2015-07-10 16:34:34
5   CA8058402571009 2015-07-10 16:51:29

#sets the index used to identify row numbers in the DF3 table
index <- 1

for (i in 1:nrow(DF1)){
  for (j in 1:nrow(DF2)){
    if ((DF1[i,3]) == (DF2[j,3])) {
      DF3 [[i,1]] <- (DF1[i,3])
      DF3 [[i,2]] <- (DF1[i,2])
      DF3 [[i,3]] <- (DF2[j,3])
      index <- index +1
    }
    else{
      #do nothing
    }
  }
}

All I get for results is

head(DF3)

           ID.         DF1timestamp      DF2timestamp
1          <NA>               <NA>             <NA>
2          <NA>               <NA>             <NA>
3          <NA>               <NA>             <NA>
4          <NA>               <NA>             <NA>
5          <NA>               <NA>             <NA>
6          <NA>               <NA>             <NA>

What I wanting to get is this.

head(DF3)

        id                 DF1Timestamp          DF2Timestamp
1  CA8058409291009      2015-07-10 15:25:31    2015-07-10 16:34:34
2  CA8058402571009      2015-07-10 15:50:20    2015-07-10 16:31:42
3  CA8058402571009      2015-07-10 15:50:20    2015-07-10 16:51:54
4  CA8323300811005      2015-07-10 16:18:52    2015-07-10 16:18:37
5  CA8058402571009      2015-07-10 16:51:54    2015-07-10 15:49:23
6  CA8058402571009      2015-07-10 16:51:54    2015-07-10 16:31:42
7  CA8323300811005      2015-07-10 16:54:03    2015-07-10 16:18:37

What this table is showing is the first time and event occurred at a given ID and a row for all the possible times the event ended.

Any help is appreciated.

Terno
  • 91
  • 6
  • Please include the desired result? – Rich Scriven Aug 24 '15 at 17:07
  • It looks like `DF1` has six rows, since your result ends up with that many. It'd be a lot clearer with a reproducible example: http://stackoverflow.com/a/28481250/1191259 – Frank Aug 24 '15 at 17:10
  • 2
    It looks like you're looping for what `merge(df1, df2)` does – Pierre L Aug 24 '15 at 17:11
  • @richard, I just made the update. – Terno Aug 24 '15 at 17:23
  • @Frank, the output I am getting was just a sample using the head(). I added a table of results of what I should get with the sample DF1 and DF2 tables I provided. – Terno Aug 24 '15 at 17:26
  • @ Pierre Lafortune it's not a straight merge. Each row in DF1 can have more than one match in DF2 – Terno Aug 24 '15 at 17:29
  • Those are two column dataframes and you are matching on the third (non-existent) column and you are not matching on the ID column at all. Furthermore, your most recent comment suggests you do not understand the merge function. It creates a Cartesian product when more than one item matches across the join. You should show what the right answer would be with just the inputs you display. Don't make us guess your intent. – IRTFM Aug 24 '15 at 17:29
  • @ Pierre Lafortune you were right. I was trying to make things more complicated than they had to be. – Terno Aug 24 '15 at 17:35

1 Answers1

1

I would use merge()

df3 <- merge(df1, df2, by.x = 'id', by.y = 'id')
> df3
           id              date.x              date.y
1 CA8058402571009 2015-07-10 15:50:20 2015-07-10 15:49:23
2 CA8058402571009 2015-07-10 15:50:20 2015-07-10 16:31:42
3 CA8058402571009 2015-07-10 15:50:20 2015-07-10 16:51:29
4 CA8058402571009 2015-07-10 16:51:54 2015-07-10 15:49:23
5 CA8058402571009 2015-07-10 16:51:54 2015-07-10 16:31:42
6 CA8058402571009 2015-07-10 16:51:54 2015-07-10 16:51:29
7 CA8058409291009 2015-07-10 15:25:31 2015-07-10 16:34:34
8 CA8323300811005 2015-07-10 16:18:52 2015-07-10 16:18:37
9 CA8323300811005 2015-07-10 16:54:03 2015-07-10 16:18:37
Ken Yeoh
  • 876
  • 6
  • 11