I need to fuzzy match the zip / address fields of two distinct datasets and get the distance between them.
Here is an example:
name_a <- c("Aldo", "Andrea", "Alberto", "Antonio", "Angelo")
name_b <- c("Sara", "Serena", "Silvia", "Sonia", "Sissi")
zip_street_a <- c("1204 Roma Street 8", "1204 Roma Street 8", "1204 Roma Street 8", "1204 Venezia street 10", "1204 Venezia Street 110")
zip_street_b <- c("1204 Roma Street 81", "1204 Roma Street 8A", "1204 Roma Street 8B", "1204 Roma Street 8C", "1204 Venezia Street 10C")
db_a <- data.frame(name_a, zip_street_a)
db_b <- data.frame(name_b, zip_street_b)
names(db_a)[names(db_a)=='zip_street_a'] <- 'zipstreet'
names(db_b)[names(db_b)=='zip_street_b'] <- 'zipstreet'
I then used fuzzyjoin in combination with dplyr to create the following script:
library(fuzzyjoin)
library(dplyr)
match_data <- stringdist_left_join(db_a, db_b,
                                   by = "zipstreet",
                                   ignore_case = TRUE,
                                   method = "jaccard",
                                   max_dist = 1,
                                   distance_col = "dist"
) %>%
  group_by(zipstreet.x)
The script works fine, but I would like to get different distances for the following address combinations:
a) 1204 Roma Street 8 vs. 1204 Roma Street 81 --> distance = 0.0147
b) 1204 Roma Street 8 vs. 1204 Roma Street 8A --> distance = 0.0147
Now, Roma Street number 81 is very far from Roma Street 8. On the other hand, Roma Street number 8A is very close to Roma Street number 8.
So, I need to have a distance very close to 0 for 8A, and far from 0 for 81.
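To make the requirement concrete, here is a rough sketch of the behaviour I am after, in base R. It is only an illustration under stated assumptions: the house number is always the last space-separated token, it starts with digits, and any trailing letter is a sub-unit of the same building. The helper names split_house_number and house_number_dist and the suffix_penalty parameter are hypothetical, not part of fuzzyjoin or stringdist.

```r
# Hypothetical helper: split an address into the street part, the numeric
# house number, and an optional letter suffix (e.g. "8A" -> 8 and "A").
# Assumes the house number is the last space-separated token.
split_house_number <- function(address) {
  address <- trimws(address)
  last_token <- sub("^.* ", "", address)
  data.frame(
    street = sub(" +[^ ]+$", "", address),
    number = as.integer(sub("^([0-9]+).*$", "\\1", last_token)),
    suffix = toupper(sub("^[0-9]+", "", last_token)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical distance: difference between the house numbers, plus a small
# penalty (an arbitrary choice) when only the letter suffix differs.
# Addresses on different streets are treated as not comparable (NA).
house_number_dist <- function(a, b, suffix_penalty = 0.1) {
  pa <- split_house_number(a)
  pb <- split_house_number(b)
  ifelse(
    tolower(pa$street) != tolower(pb$street),
    NA_real_,
    abs(pa$number - pb$number) + suffix_penalty * (pa$suffix != pb$suffix)
  )
}

house_number_dist("1204 Roma Street 8", "1204 Roma Street 81")  # 73
house_number_dist("1204 Roma Street 8", "1204 Roma Street 8A")  # 0.1
```

With a measure like this, 8A ends up very close to 8 while 81 ends up far away, which is the ordering I need in the dist column.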
How is it possible to do that?