Edit: It looks like this is a known issue with the "cascade" method. When the first method returns only NA values for a group, the lat column ends up typed as logical, and the lat/lons returned by subsequent methods can't be assigned into it as doubles.
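A minimal sketch of that type mismatch, independent of tidygeocoder. It assumes the vctrs package is installed (it appears in the backtrace below), and the coordinate value is made up for illustration:

```r
library(vctrs)

# A bare NA is logical, so a column that holds nothing but NA is logical too.
class(NA)        # "logical"
class(NA_real_)  # "numeric"

# vctrs only casts a double to logical when nothing is lost (0, 1 or NA).
ok <- vec_cast(1, logical())

# A latitude is not 0 or 1, so the cast is refused as lossy.
err <- tryCatch(
  vec_cast(40.7128, logical()),
  error = function(e) e
)
conditionMessage(err)
```

This seems to be the same cast that fails at step 15 of the backtrace, where a double lat is written into a column that only held NA.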
Data: I have a list of addresses that I need to geocode. I'm using lapply() to split-apply-combine, which works, but very slowly. My attempt to split further (by year-month instead of by year) before the apply-combine step returns errors about dim names and sizes that are confusing to me.
    # example data
    library(dplyr)
    library(tidygeocoder)
    url <- "https://www.briandunning.com/sample-data/us-500.zip"
    download.file(url = url, destfile = basename(url))
    adds <- readr::read_csv(basename(url)) %>%
      select(address, city, county, state, zip) %>%
      mutate(date = seq.Date(as.Date('2015-01-01'), to = Sys.Date(), length.out = 500)) %>%
      mutate(year = lubridate::year(date)) %>%
      # to keep it small
      sample_n(20)
This works: split the addresses by year, apply the tidygeocoder function to return lat/lons, and recombine.
    adds_by_year <- adds %>% split(.$year)
    geo_list <- lapply(adds_by_year, function(x) {
      geo <- geocode(.tbl = x,
                     street = address,
                     city = city,
                     county = county,
                     state = state,
                     postalcode = zip,
                     # cascade method uses all options (census, osm, etc)
                     # takes longer but may be more accurate
                     method = "cascade", timeout = 500) %>%
        filter(!is.na(lat))
      return(geo)
    })
    out <- bind_rows(geo_list)
The version below, split by year-month, does not:
    adds <- adds %>%
      mutate(yrmn = zoo::as.yearmon(date))
    adds_by_yrm <- adds %>% split(.$yrmn)
    geo_list <- lapply(adds_by_yrm, function(x) {
      geo <- geocode(.tbl = x,
                     street = address,
                     city = city,
                     county = county,
                     state = state,
                     postalcode = zip,
                     # cascade method uses all options (census, osm, etc)
                     # takes longer but may be more accurate
                     method = "cascade", timeout = 500) %>%
        filter(!is.na(lat))
      return(geo)
    })
    out <- bind_rows(geo_list)
Returns this error:
Error: Assigned data `retry_results` must be compatible with existing data.
ℹ Error occurred for column `lat`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
I did some searching and found this, but the proposed solution, wrapping x in as.data.frame(), resulted in the same error.
Any insight is appreciated. I've looked into using purrr, but I'm not sure I grok it completely.
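For what it's worth, here is a rough sketch of how the lapply() call might look with purrr, using possibly() so that one failing group does not stop the whole run. It does not fix the cascade problem itself; it only isolates the groups that fail. To keep it runnable offline, fake_geocode() and the toy table are hypothetical stand-ins for geocode() and the address data, and all values are invented:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for geocode(): adds made-up lat/long columns and
# fails for one group, mimicking the cascade error without network access.
fake_geocode <- function(x) {
  if (any(x$grp == "b")) stop("simulated geocoding failure")
  mutate(x, lat = 40 + seq_len(n()), long = -75 - seq_len(n()))
}

# Toy data in place of the address table (all values invented).
toy <- tibble(grp = c("a", "a", "b", "c"), address = paste("addr", 1:4))

# possibly() wraps a function so that an error returns `otherwise`
# instead of aborting the surrounding map() call.
safe_geocode <- possibly(fake_geocode, otherwise = NULL)

geo_list <- toy %>%
  split(.$grp) %>%
  map(safe_geocode)

# Names of the groups that failed, to inspect or retry separately.
failed <- names(keep(geo_list, is.null))

# bind_rows() ignores NULL elements, so only successful groups remain.
out <- bind_rows(geo_list)
```

With the real data, the idea would be to swap fake_geocode() for a function that calls geocode() on each group, then look at the groups listed in failed on their own.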
Here is the full backtrace, which I'm not familiar enough with to parse completely:
Backtrace:
█
1. ├─base::lapply(...)
2. │ └─global::FUN(X[[i]], ...)
3. │ └─tidygeocoder::geocode(...)
4. │ ├─base::do.call(geo, geo_args)
5. │ └─(function (address = NULL, street = NULL, city = NULL, county = NULL, ...
6. │ ├─base::do.call(geo_cascade, all_args[!names(all_args) %in% c("method")])
7. │ └─(function (..., cascade_order = c("census", "osm")) ...
8. │ ├─base::`[<-`(...)
9. │ └─tibble:::`[<-.tbl_df`(...)
10. │ └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
11. │ └─tibble:::tbl_subassign_row(x, i, value, value_arg)
12. │ ├─base::withCallingHandlers(...)
13. │ └─vctrs::`vec_slice<-`(`*tmp*`, i, value = value[[j]])
14. │ └─(function () ...
15. │ └─vctrs:::vec_cast.logical.double(...)
16. │ └─vctrs::maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg)
17. │ ├─base::withRestarts(...)
18. │ │ └─base:::withOneRestart(expr, restarts[[1L]])
19. │ │ └─base:::doWithOneRestart(return(expr), restart)
20. │ └─vctrs:::stop_lossy_cast(...)
21. │ └─vctrs:::stop_vctrs(...)
22. │ └─rlang::abort(message, class = c(class, "vctrs_error"), ...)
23. │ └─rlang:::signal_abort(cnd)
24. │ └─base::signalCondition(cnd)
25. └─(function (cnd) ...