I'm missing something glaringly obvious in this simple left_join operation, which isn't working. I have this one-column data frame:

   time_seconds
          <dbl>
 1          0  
 2          0.1
 3          0.2
 4          0.3
 5          0.4
 6          0.5
 7          0.6
 8          0.7
 9          0.8
10          0.9

I want to left join it with some sports data (a small sample of my full df, but it displays the problem):

  number time_seconds distance_meters pace_seconds watts cal_hr cadence heart_rate
   <dbl>        <dbl>           <dbl>        <dbl> <dbl>  <dbl>   <dbl>      <dbl>
1      1          0.7               2         144.   117    704       0          0
2      2          0.9               3         144.   117    704       0          0

However, my left_join only joins the second row of y:

x %>% left_join(y, by = "time_seconds")
# A tibble: 10 x 8
   time_seconds number distance_meters pace_seconds watts cal_hr cadence heart_rate
          <dbl>  <dbl>           <dbl>        <dbl> <dbl>  <dbl>   <dbl>      <dbl>
 1          0       NA              NA          NA     NA     NA      NA         NA
 2          0.1     NA              NA          NA     NA     NA      NA         NA
 3          0.2     NA              NA          NA     NA     NA      NA         NA
 4          0.3     NA              NA          NA     NA     NA      NA         NA
 5          0.4     NA              NA          NA     NA     NA      NA         NA
 6          0.5     NA              NA          NA     NA     NA      NA         NA
 7          0.6     NA              NA          NA     NA     NA      NA         NA
 8          0.7     NA              NA          NA     NA     NA      NA         NA
 9          0.8     NA              NA          NA     NA     NA      NA         NA
10          0.9      2               3         144.   117    704       0          0

In my full dataset, which rows from y actually get joined seems quite random. All rows in y have unique time_seconds values.

structure(list(number = c(1, 2), time_seconds = c(0.7, 0.9), 
    distance_meters = c(2, 3), pace_seconds = c(143.9, 143.9), 
    watts = c(117, 117), cal_hr = c(704, 704), cadence = c(0, 
    0), heart_rate = c(0, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L)) -> y

structure(list(time_seconds = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 
0.7, 0.8, 0.9)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")) -> x
Nautica
  • I'm not able to reproduce your problem. Perhaps try a fresh R session. – tmfmnk Apr 12 '20 at 11:23
  • I tried a new session and on my laptop as well but it didn't work. The example I posted works and I tried creating new objects with `dput()` of my two data frames and that works, however if I simply bind my original data frames with new names that doesn't work. – Nautica Apr 12 '20 at 12:01
  • I converted `time_seconds` in both dfs to character and the left_join works then, I would have to reconvert back to numeric however - I could use this as an improvised solution but it would be good to still understand why my initial df doesn't work in the join for learning purposes – Nautica Apr 12 '20 at 12:06
  • With regard to joining on doubles, take a look at [this](https://stackoverflow.com/questions/46487199/r-dplyr-left-join-error-missing-values-produced-when-joining-values-rounded-to), including the last comment, and [this link](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f). – Ben Apr 12 '20 at 15:49
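
The comments point at floating-point equality as the likely cause: two doubles that both print as 0.7 can differ in their last bits, and left_join matches keys exactly. The sketch below illustrates one way this could arise and one possible workaround. It assumes the original x was built arithmetically (here with `seq()`), which the question does not state; the `time_key` column and the `x_key`/`y_key` names are hypothetical helpers introduced only for this illustration, and y is trimmed to three columns.

```r
library(dplyr)

# Assumption: the original x came from arithmetic such as seq(),
# not from typed literals (the dput() version uses literals and joins fine).
x <- tibble(time_seconds = seq(0, 0.9, by = 0.1))
y <- tibble(number = c(1, 2),
            time_seconds = c(0.7, 0.9),
            distance_meters = c(2, 3))

# Print with extra digits to reveal differences that the default
# tibble printing hides.
print(x$time_seconds, digits = 17)
print(y$time_seconds, digits = 17)

# Exact comparison versus comparison within a tolerance.
x$time_seconds[8] == 0.7
isTRUE(all.equal(x$time_seconds[8], 0.7))

# Possible workaround: join on an integer key (tenths of a second)
# so the match no longer depends on exact double equality.
x_key <- x %>%
  mutate(time_key = as.integer(round(time_seconds * 10)))
y_key <- y %>%
  mutate(time_key = as.integer(round(time_seconds * 10))) %>%
  select(-time_seconds)

joined <- x_key %>%
  left_join(y_key, by = "time_key") %>%
  select(-time_key)

joined
```

Compared with converting `time_seconds` to character and back, an integer key keeps the original numeric column untouched. The factor of 10 only suits data recorded at 0.1 s resolution and would need adjusting for other resolutions.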
