I have a similar question. I am trying to fetch the coordinates (latitude and longitude) of an address from the US Census geocoder. I followed the approach mentioned here, but I am not getting the required result. These are the steps I followed in three attempts:

Attempt #1 (using RCurl):

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
td.html <- getForm(url_geo,
submit = "Find",
street  = "3211 Providence Dr",
city = "Anchorage",
state   = "AK",
zip = "99508",
benchmark = "Public_AR_Current",
.opts = curlOptions(ssl.verifypeer = FALSE))

When I inspect td.html, it is the same as what you get from "View Page Source" on the webpage above. Instead, td.html should contain the results page that appears after submitting the form on that webpage.

Attempt #2 (Using httr):

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
fd1 <- list(
submit = "Find",
street  = "3211 Providence Dr",
city = "Anchorage",
state   = "AK",
zip = "99508",
benchmark = "Public_AR_Current"
)
resp1<-GET(url_geo, body=fd1, encode="form")
content(resp1)

The content of resp1 is very different from what one would expect.

Attempt #3 (Using rvest):

url_geo <- "http://geocoding.geo.census.gov/geocoder/locations/address?form"
s <- html_session(url_geo)
f0 <- html_form(s)

Here, I get an error:

Error: Current page doesn't appear to be html.

Please help me understand what I am doing wrong. If you need any clarification from me, please let me know.

skumar

2 Answers

The Census site is nice enough to send back JSON (an unexpected bonus of making this call):

library(httr)
library(jsonlite)

URL <- "http://geocoding.geo.census.gov/geocoder/locations/address"

res <- GET(URL,
           query=list(street="3211 Providence Dr",
                      city="Anchorage",
                      state="AK",
                      zip="99508",
                      benchmark=4))

dat <- fromJSON(content(res, as="text"))

str(dat$result$addressMatches)
## 'data.frame': 1 obs. of  4 variables:
##  $ matchedAddress   : chr "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508"
##  $ coordinates      :'data.frame':  1 obs. of  2 variables:
##   ..$ x: num -150
##   ..$ y: num 61.2
##  $ tigerLine        :'data.frame':  1 obs. of  2 variables:
##   ..$ tigerLineId: chr "638504877"
##   ..$ side       : chr "L"
##  $ addressComponents:'data.frame':  1 obs. of  12 variables:
##   ..$ fromAddress    : chr "3001"
##   ..$ toAddress      : chr "3399"
##   ..$ preQualifier   : chr ""
##   ..$ preDirection   : chr ""
##   ..$ preType        : chr ""
##   ..$ streetName     : chr "PROVIDENCE"
##   ..$ suffixType     : chr "DR"
##   ..$ suffixDirection: chr ""
##   ..$ suffixQualifier: chr ""
##   ..$ city           : chr "ANCHORAGE"
##   ..$ state          : chr "AK"
##   ..$ zip            : chr "99508"

You can use the flatten parameter of fromJSON to deal with that awkward structure of data frames nested within a data frame:

dat <- fromJSON(content(res, as="text"), flatten=TRUE)
dplyr::glimpse(dat$result$addressMatches)

## Observations: 1
## Variables: 17
## $ matchedAddress                    (chr) "3211 PROVIDENCE DR, ANCHORAGE, AK, 99508"
## $ coordinates.x                     (dbl) -149.8188
## $ coordinates.y                     (dbl) 61.18985
## $ tigerLine.tigerLineId             (chr) "638504877"
## $ tigerLine.side                    (chr) "L"
## $ addressComponents.fromAddress     (chr) "3001"
## $ addressComponents.toAddress       (chr) "3399"
## $ addressComponents.preQualifier    (chr) ""
## $ addressComponents.preDirection    (chr) ""
## $ addressComponents.preType         (chr) ""
## $ addressComponents.streetName      (chr) "PROVIDENCE"
## $ addressComponents.suffixType      (chr) "DR"
## $ addressComponents.suffixDirection (chr) ""
## $ addressComponents.suffixQualifier (chr) ""
## $ addressComponents.city            (chr) "ANCHORAGE"
## $ addressComponents.state           (chr) "AK"
## $ addressComponents.zip             (chr) "99508"

This wraps it into a function for easier calling:

#' Geocode address using the Census API
#'
#' @param street Street
#' @param city City
#' @param state State
#' @param zip Zip code
#' @param benchmark "\code{current}" for the most current information,
#'        "\code{2014}" for data from the 2014 U.S. ACS survey,
#'        "\code{2010}" for data from the 2010 U.S. Census. This defaults
#'        to "\code{current}".
#' @return \code{list} of query params and response values. If successful,
#'         the geocoded values will be in \code{var$result$addressMatches}
census_geocode <- function(street, city, state, zip, benchmark="current") {

  URL <- "http://geocoding.geo.census.gov/geocoder/locations/address"

  # Map the benchmark label to the geocoder's numeric benchmark ID
  bench <- c(`current`=4, `2014`=8, `2010`=9)[[benchmark]]

  res <- GET(URL,
             query=list(street=street, city=city, state=state,
                        zip=zip, benchmark=bench))

  warn_for_status(res)

  fromJSON(content(res, as="text"), flatten=TRUE)

}

census_geocode("3211 Providence Dr", "Anchorage", "AK", "99508")
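
Once the response is flattened, the latitude and longitude can be pulled out of the coordinates.x and coordinates.y columns shown above (x is the longitude, y the latitude). The following is only a minimal sketch: extract_coords is a hypothetical helper that is not part of the answer, it runs on a trimmed sample response instead of a live call, and it assumes an unmatched address comes back as an empty addressMatches array.

```r
library(jsonlite)

# Hypothetical helper (not from the original answer): pull longitude and
# latitude out of a parsed, flattened geocoder response.
extract_coords <- function(dat) {
  m <- dat$result$addressMatches
  # Assumption: no match yields an empty JSON array, which jsonlite
  # parses as an empty list rather than a data frame.
  if (!is.data.frame(m) || nrow(m) == 0) {
    return(data.frame(address = character(0),
                      lon = numeric(0),
                      lat = numeric(0)))
  }
  data.frame(address = m$matchedAddress,
             lon = m$coordinates.x,  # x is the longitude
             lat = m$coordinates.y,  # y is the latitude
             stringsAsFactors = FALSE)
}

# Trimmed sample response with the same shape as the output shown above
sample_json <- '{"result":{"addressMatches":[{"matchedAddress":"3211 PROVIDENCE DR, ANCHORAGE, AK, 99508","coordinates":{"x":-149.8188,"y":61.18985}}]}}'

coords <- extract_coords(fromJSON(sample_json, flatten = TRUE))
print(coords)
```

With a live call, the same helper could be applied to the return value of census_geocode(), since that function already flattens the response.
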
hrbrmstr
  • Hi hrbrmstr - Thank you so much for answering in detail. I really appreciate it. One thing I am still trying to understand: my URL is slightly different from yours, though both land on the same web page. Yours gives the required result, but mine does not. Can you please comment on this? Wishing you a Merry Christmas and a Happy New Year in advance. – skumar Dec 27 '15 at 16:34
  • Um, mine works? :-) I used it in the example as it was the one I saw in Dev Tools & Burp Suite when I was trying to ensure I had all the parameters needed. Not sure what else to say. – hrbrmstr Dec 27 '15 at 16:52
  • :-). Thank you again. Is there a way to identify which form you want to fill in using httr, the way we can in rvest (see my post and the Stack Overflow link that I followed)? I feel this would be useful when a webpage has multiple forms. If you know anything about this, please shed some light. Thanks. – skumar Dec 27 '15 at 17:28

Build your URL and submit the resulting URL directly, bypassing any form! For instance, with the parameters you selected, you obtain the following URL:

urlgeo<-"http://geocoding.geo.census.gov/geocoder/locations/address?street=3211+Providence+Dr&city=Anchorage&state=AK&zip=99508&benchmark=4"

Then you can simply retrieve the content with getURL from RCurl:

getURL(urlgeo)

The result will contain all the needed info. To build the URL, just paste its arguments together, replacing any blank space with a +.
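
The pasting step described above could be sketched roughly as follows. build_geocode_url is a hypothetical helper name, not something from RCurl or the Census API. Beyond swapping spaces for +, it also percent-encodes other reserved characters with utils::URLencode, which the original description does not mention.

```r
# Hypothetical helper: assemble the geocoder URL from named parameters.
build_geocode_url <- function(street, city, state, zip, benchmark = 4) {
  base <- "http://geocoding.geo.census.gov/geocoder/locations/address"
  # c() coerces everything to character, so benchmark becomes "4"
  params <- c(street = street, city = city, state = state,
              zip = zip, benchmark = benchmark)
  # Percent-encode each value, then turn the encoded spaces (%20) into +
  encoded <- vapply(params, function(p) {
    gsub("%20", "+", utils::URLencode(p, reserved = TRUE), fixed = TRUE)
  }, character(1))
  # Join as name=value pairs separated by &
  paste0(base, "?", paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

urlgeo <- build_geocode_url("3211 Providence Dr", "Anchorage", "AK", "99508")
print(urlgeo)
```

For the example address this reproduces the URL given earlier in this answer, which can then be passed to getURL as before.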

nicola
  • Hi nicola - Thank you so much for answering my query. This is really helpful. Wish you merry Christmas and happy New Year 2016. – skumar Dec 27 '15 at 17:21
  • Actually, I didn't see your comment while posting as the page didn't update. – akrun Dec 29 '15 at 16:12