I am trying to retrieve coordinates for around 500k addresses. To do so, I am using the code from this blogpost, with a few changes to adapt it to my data. I have divided my dataset into a few batches to make sure I don't hit the daily limit in OpenStreetMap. The first two batches ran perfectly fine, but now the same code gives me an error. The code is the following:
# LOAD THE PACKAGES USED BELOW
library(rvest)     # read_html, html_node, html_text
library(jsonlite)  # fromJSON
library(dplyr)     # %>%, bind_rows
library(tibble)    # tibble
library(stringr)   # str_replace_all
library(stringi)   # stri_replace
library(readr)     # read_csv

geocode <- function(name, address, city){
  # NOMINATIM SEARCH API URL
  src_url <- "https://nominatim.openstreetmap.org/search?q="
  # CREATE A FULL ADDRESS (SPACES AND COMMAS IN THE CITY BECOME "+")
  addr <- paste(address, str_replace_all(city, "\\s|,", "+"), sep = "%2C")
  # CREATE A SEARCH URL BASED ON NOMINATIM API TO RETURN GEOJSON
  # (USES addr, NOT THE GLOBAL query, SO THE CITY IS PART OF THE REQUEST)
  requests <- paste0(src_url, addr, "&format=geojson")
  # ITERATE OVER THE URLS AND MAKE REQUEST TO THE SEARCH API
  for (i in seq_along(requests)) {
    # QUERY THE API TRANSFORM RESPONSE FROM JSON TO R LIST
    #read_page <- function(i) {
    #tryCatch(
    #{
    response <- read_html(requests[i]) %>%
      html_node("p") %>%
      html_text() %>%
      fromJSON()
    # },
    # error = function(cond) return(NULL),
    # finally = print(i)
    # )
    # }
    # FROM THE RESPONSE EXTRACT LATITUDE AND LONGITUDE COORDINATES
    lon <- response$features$geometry$coordinates[[1]][1]
    lat <- response$features$geometry$coordinates[[1]][2]
    # CREATE A COORDINATES DATAFRAME
    if(i == 1) {
      loc <- tibble(name = name[i],
                    address = str_replace_all(addr[i], "%2C", ","),
                    latitude = lat, longitude = lon)
    }else{
      df <- tibble(name = name[i],
                   address = str_replace_all(addr[i], "%2C", ","),
                   latitude = lat, longitude = lon)
      loc <- bind_rows(loc, df)
    }
  }
  return(loc)
}
# READ THE DATA
data <- read_csv("projects_addresses_no_coordinates_third_batch.csv")
# REMOVE SPACE FROM COLUMNS
colnames(data) <- str_replace_all(colnames(data), " ", "_")
# EXTRACT THE ADDRESS
address <- data$address
# CLEAN SPECIAL CASES (e.g. 1 N MAY BLDG)
query <- str_replace_all(string = address,
                         pattern = "BLDG",
                         replacement = " ")
# CLEAN SPECIAL CASES (e.g. 3333-3339 N CLARK)
query <- stri_replace(str = query,
                      replacement = " ",
                      regex = "(-[0-9]+\\s)")
# REPLACE SPACES (\\s) OR COMMAS (,) WITH PLUS SIGN (+)
query <- str_replace_all(string = query,
                         pattern = "\\s|,",
                         replacement = "+")
# THE ADDRESS VARIABLE SHOULD NOW BE READY FOR THE API
# USE THE GEOCODE FUNCTION TO FIND THE COORDINATES
df <- geocode(name = data$identifier,
              address = query,
              city = data$comune)
When I run the geocode command at the end, I get the following error:
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
<!DOCTYPE html> <html lang="en"
(right here) ------^
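The `<!DOCTYPE html>` in the message suggests that the text handed to fromJSON() is an HTML page rather than GeoJSON. As a rough sketch for checking this before parsing (the helper name safe_parse_geojson is hypothetical and not part of the blog's code; it only assumes the jsonlite package), something like the following returns NULL for a non-JSON body instead of stopping the whole loop:

```r
library(jsonlite)

# Hypothetical helper: parse `body` as JSON only when it looks like JSON;
# otherwise return NULL so the caller can log or skip that request.
safe_parse_geojson <- function(body) {
  trimmed <- trimws(body)
  # A JSON object or array starts with "{" or "["
  if (!(substr(trimmed, 1, 1) %in% c("{", "["))) {
    return(NULL)
  }
  # Malformed JSON also yields NULL instead of an error
  tryCatch(fromJSON(trimmed), error = function(cond) NULL)
}

# An HTML page is rejected instead of raising a lexical error
html_result <- safe_parse_geojson("<!DOCTYPE html> <html lang=\"en\"></html>")
# A small GeoJSON-like string is parsed into an R list
json_result <- safe_parse_geojson('{"type": "FeatureCollection", "features": []}')
```

Inside the loop, the text returned by html_text() could be passed through such a helper, and the requests that come back as NULL could be printed to see which URLs return HTML.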
Any idea how to solve this? I tried running the code on different computers because I thought I had hit the query threshold limit (see this post), but that doesn't seem to be the problem.