I am trying to push small changes to existing sparklyr code. The changes are meant to give the same results while making the code more readable and efficient, so I want to check that the new results match the old ones, which I have stored in a Hive table. To do so, I compare the new results to the old ones using anti_join:
diff.sdf <- clean_results.sdf %>%
  anti_join(new_results.sdf, by = colnames(clean_results.sdf))
I don't get a 100% match, and after looking at the details, I suspect anti_join is not working as it should when it comes to doubles. It seems to treat some values as different when they are in fact the same.
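One possible explanation, offered as a hypothesis rather than a diagnosis: two doubles can print identically and still differ in their last bits, for example when a refactoring changes the order in which values are added. The base R sketch below uses made-up values (old_total and new_total are hypothetical names, not columns from the tables above) to show the effect and a tolerance-based comparison with all.equal.

```r
# Hypothetical values: the same three amounts added in a different order.
old_total <- (0.1 + 0.2) + 0.3
new_total <- 0.1 + (0.2 + 0.3)

# Both print as 0.6 with R's default of 7 significant digits.
print(old_total)
print(new_total)

# Exact comparison, which is what an equality join relies on, sees a difference.
exact_match <- old_total == new_total

# all.equal() compares within a small relative tolerance instead.
close_match <- isTRUE(all.equal(old_total, new_total))

# Printing 17 significant digits exposes the differing last bits.
sprintf("%.17g", c(old_total, new_total))
```

If this is what happens in Spark, the join would be behaving correctly and the differences would come from the data itself.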
Reproducible example (although going from Spark to R and back to Spark may change the situation):
structure(list(mnt_tot = 37008.16, date_analyse = "2019-01-31"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("mnt_tot",
"date_analyse"))
structure(list(mnt_tot = 37008.16, date_analyse = "2019-01-31"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("mnt_tot",
"date_analyse"))
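The two structures above look identical, which may itself be an effect of the export: dput() writes doubles with 15 significant digits by default, so a difference in the last bits would be lost on the way. A minimal base R sketch, where b is a hypothetical perturbed value standing in for a Spark result that differs slightly from a:

```r
a <- 37008.16
b <- a + 1e-11  # hypothetical: differs from a only in the last bits

# The default deparse (used by dput) keeps 15 significant digits,
# so both values are written out as the same text.
default_a <- deparse(a)
default_b <- deparse(b)

# The "digits17" option keeps 17 significant digits, enough to tell them apart.
full_a <- deparse(a, control = "digits17")
full_b <- deparse(b, control = "digits17")

print(c(default_a, default_b))
print(c(full_a, full_b))
```

Running dput(x, control = "digits17") on the collected rows is one way to check whether the real mnt_tot values differ beyond what the default output shows.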