I would like to speed up my solution in R.

I've got two data frames, let's say df_one:

A | B | C | D | same
1 | 3 | 2 | 4 | NA
6 | 5 | 1 | 3 | NA
5 | 3 | 7 | 3 | NA
3 | 4 | 8 | 3 | NA

And df_two:

A | B 
1 | 3
6 | 2 
5 | 3 

If the values in both column A and column B are the same (or the B values are within .5 of each other), I want a 1, otherwise a 0, in an extra column in df_one (df_one$same).

I did this with the following code:

df_one$same <- NA

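for (i in seq_len(nrow(df_one))) {
  for (j in seq_len(nrow(df_two))) {
    # B values within .5 of df_two's B, in steps of .1
    distance <- seq(df_two[j, 2] - .5, df_two[j, 2] + .5, by = .1)
    # Assumed intent: A must match exactly and B must fall inside that window
    if ((df_one[i, 1] == df_two[j, 1]) && (df_one[i, 2] %in% distance)) {
      df_one[i, 5] <- 1
      break
    } else {
      df_one[i, 5] <- 0
    }
  }
}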

Can anyone help me with a faster solution?

Arnand
  • Do a `merge` and then compare the columns; it will be way faster than using loops. – ytk Nov 14 '16 at 15:03
  • Your code is not reproducible, and your desired behavior is unclear. `hour_df` is not defined, and it is unclear what it is you are trying to accomplish. See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more on how to write a good R question – Mark Peterson Nov 14 '16 at 15:13
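
For reference, the `merge` idea from the first comment could look roughly like the sketch below in base R. It is only an illustration on the example data from the question: the helper column `row_id` and the suffixes `.one`/`.two` are names made up here, it assumes each value of A occurs at most once in df_two, and it only checks for exact equality of B.

```r
# Example data from the question
df_one <- data.frame(A = c(1, 6, 5, 3), B = c(3, 5, 3, 4),
                     C = c(2, 1, 7, 8), D = c(4, 3, 3, 3))
df_two <- data.frame(A = c(1, 6, 5), B = c(3, 2, 3))

# Remember the original row order, because merge() sorts by the key column
df_one$row_id <- seq_len(nrow(df_one))

# Left join on A; the two B columns become B.one and B.two
merged <- merge(df_one, df_two, by = "A", all.x = TRUE,
                suffixes = c(".one", ".two"))
merged <- merged[order(merged$row_id), ]

# 1 if a partner row exists and B is equal, 0 otherwise
merged$same <- as.integer(!is.na(merged$B.two) &
                          merged$B.one == merged$B.two)
```

The comparison runs once over whole columns instead of once per pair of rows, which is where the speed-up over the nested loops would come from.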

1 Answer

A quicker solution to what I think you are asking is to use left_join from dplyr and check explicitly for the matches.

library(dplyr)

left_join(df_one, df_two, by = "A") %>%
  mutate(same = B.x == B.y)

gives

  A B.x C D  same B.y
1 1   3 2 4  TRUE   3
2 6   5 1 3 FALSE   2
3 5   3 7 3  TRUE   3
4 3   4 8 3    NA  NA
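
If the 1/0 coding and the .5 window from the question are needed, one possible variation is sketched below. It rests on assumptions: "a sequence of .5" is read as "B differs by at most .5", rows of df_one without a partner in df_two get 0 rather than NA, and `tol` and `B_two` are names introduced only for this example.

```r
library(dplyr)

# Example data from the question
df_one <- data.frame(A = c(1, 6, 5, 3), B = c(3, 5, 3, 4),
                     C = c(2, 1, 7, 8), D = c(4, 3, 3, 3), same = NA)
df_two <- data.frame(A = c(1, 6, 5), B = c(3, 2, 3))

tol <- 0.5  # assumed tolerance for B

result <- df_one %>%
  # rename B in df_two up front so the join does not create B.x / B.y
  left_join(rename(df_two, B_two = B), by = "A") %>%
  # 1 if a partner row exists and B is within tol, 0 otherwise
  mutate(same = as.integer(!is.na(B_two) & abs(B - B_two) <= tol)) %>%
  select(-B_two)
```

Note that if a value of A appears more than once in df_two, the join returns one row per match, so the result would have to be collapsed back to one row per row of df_one.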
Mark Peterson