I am trying to retrieve coordinates for around 500k addresses. To do so, I am using the code from this blog post, with a few changes to adapt it to my data. I have divided my dataset into several batches to make sure I don't hit the daily limit of the OpenStreetMap Nominatim API. The first two batches ran perfectly fine, but now the same code gives me an error. The code is the following:

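# PACKAGES USED BELOW
library(rvest)     # read_html, html_node, html_text
library(jsonlite)  # fromJSON
library(dplyr)     # tibble, bind_rows, %>%
library(stringr)   # str_replace_all
library(stringi)   # stri_replace
library(readr)     # read_csv

geocode <- function(name, address, city){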
  
  # NOMINATIM SEARCH API URL
  src_url <- "https://nominatim.openstreetmap.org/search?q="
  
  # CREATE A FULL ADDRESS
  addr <- paste(address, city, sep = "%2C")

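  # CREATE A SEARCH URL BASED ON NOMINATIM API TO RETURN GEOJSON
  # NOTE: `query` IS READ FROM THE GLOBAL ENVIRONMENT HERE, SO `addr`
  # (WHICH INCLUDES THE CITY) IS NOT PART OF THE REQUEST
  requests <- paste0(src_url, query, "&format=geojson")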
  
  # ITERATE OVER THE URLS AND MAKE REQUEST TO THE SEARCH API
  for (i in seq_along(requests)) {
    
    # QUERY THE API AND TRANSFORM THE RESPONSE FROM JSON TO AN R LIST
    #read_page <- function(i) {
    #tryCatch(
    #{
    response <- read_html(requests[i]) %>%
      html_node("p") %>%
      html_text() %>%
      fromJSON()
    # },
    #      error = function(cond) return(NULL),
    #    finally = print(i)
    # )
    # }
    
    # FROM THE RESPONSE EXTRACT LATITUDE AND LONGITUDE COORDINATES
    lon <- response$features$geometry$coordinates[[1]][1]
    lat <- response$features$geometry$coordinates[[1]][2]
    
    # CREATE A COORDINATES DATAFRAME
    if(i == 1) {
      loc <- tibble(name = name[i], 
                    address = str_replace_all(addr[i], "%2C", ","),
                    latitude = lat, longitude = lon)
    }else{
      df <- tibble(name = name[i], 
                   address = str_replace_all(addr[i], "%2C", ","),
                   latitude = lat, longitude = lon)
      loc <- bind_rows(loc, df)
    }
  }
  return(loc)
}

# READ THE DATA
data <- read_csv("projects_addresses_no_coordinates_third_batch.csv")

# REMOVE SPACE FROM COLUMNS
colnames(data) <- str_replace_all(colnames(data)," ", "_")

# EXTRACT THE ADDRESS
address <- data$address

# CLEAN SPECIAL CASES (e.g. 1 N MAY BLDG)
query <- str_replace_all(string = address, 
                         pattern = "BLDG", 
                         replacement = " ")

# CLEAN SPECIAL CASES (e.g. 3333-3339 N CLARK)
query <- stri_replace(str = query, 
                      replacement = " ", 
                      regex = "(-[0-9]+\\s)")

# REPLACE SPACES (\\s) OR COMMAS (,) WITH PLUS SIGN (+)
query <- str_replace_all(string = query, 
                         pattern = "\\s|,", 
                         replacement = "+")

# THE ADDRESS VARIABLE SHOULD NOW BE READY FOR THE API
# USE THE GEOCODE FUNCTION TO FIND THE COORDINATES

df <- geocode(name = data$identifier,
              address = query,
              city = data$comune)

When I run the geocode() call at the end, I get the following error:

Error in parse_con(txt, bigint_as_char) : 
lexical error: invalid char in json text.
<!DOCTYPE html> <html lang="en"
(right here) ------^

Any idea how to solve this? I tried on different computers because I thought I had hit the query limit (see this post), but that doesn't seem to be the problem.
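
For reference, the error message shows that the input fromJSON() ended up parsing starts with <!DOCTYPE html>, which is HTML rather than JSON. Below is a minimal sketch of a guard that checks the text before parsing it, assuming only the jsonlite package. The helper name parse_json_or_null and the two sample strings are hypothetical and made up for illustration; this does not explain why HTML came back, it only makes the failing request visible instead of stopping the whole loop.

```r
library(jsonlite)

# Hypothetical helper (not part of the original code): parse a response body
# only when it is valid JSON; otherwise report how it starts and return NULL.
parse_json_or_null <- function(txt) {
  if (!isTRUE(as.logical(jsonlite::validate(txt)))) {
    message("Not JSON, body starts with: ", substr(txt, 1, 60))
    return(NULL)
  }
  fromJSON(txt)
}

# Made-up sample bodies: one HTML page, one empty GeoJSON collection
html_body <- "<!DOCTYPE html> <html lang=\"en\"><body>blocked</body></html>"
json_body <- "{\"type\": \"FeatureCollection\", \"features\": []}"

is.null(parse_json_or_null(html_body))   # TRUE
parse_json_or_null(json_body)$type       # "FeatureCollection"
```

Inside the loop, the text extracted from each request could be passed through such a helper, and a NULL result recorded with NA coordinates so the index of the offending address is known.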
