
I'm trying to clean up user-entered addresses, so I thought using ggmap to extract the longitude/latitude and the formatted address would be a way to standardize everything. However, the address it returns sometimes contains colloquial names, which makes it hard to parse out the individual location components.

Here's the code I'm using:

library(ggmap)

for (i in 1:nrow(Raw_Address)) {
  result <- try(geocode(Raw_Address$Address_Total[i], output = "more", source = "google"))
  if (!inherits(result, "try-error")) {
    # geocode() returns a one-row data frame; index columns by name,
    # since their positions vary with the output type
    Raw_Address$lon[i]        <- as.numeric(result$lon)
    Raw_Address$lat[i]        <- as.numeric(result$lat)
    Raw_Address$geoAddress[i] <- as.character(result$address)
  }
}

I tried changing the output from "latlona" to "more" and going through the result columns, but only got back different longitudes/latitudes. I didn't see anywhere in the documentation that describes the structure of the returned result.

Basically, I want Street Name, City, State, Zip, Longitude, and Latitude.

Edit: Here's an example of the data

User Input: 1651 SE TIFFANY AVE. PORT ST. LUCIE FL

GGMAP Output: martin health systems - tiffany ave., 1651 se tiffany ave, port st. lucie, fl 34952, usa

This is hard to parse because of the colloquial name. I could use the stringr package to try to parse it, but that probably wouldn't cover every case. Still, geocoding returns a consistent address even when users misspell "Tiffany" or spell out "Saint" instead of "St."

user3304359
    Please provide some [example data](https://stackoverflow.com/a/5963610/2359523) for others to help. If you could also provide what you are seeing, and what your desired outcome is. – Anonymous coward Aug 31 '18 at 15:24
  • Thanks! I added an example – user3304359 Aug 31 '18 at 15:40
  • I may not be clear. When I geocode that address it returns a dataframe with fields split, like `street_number`, etc., in addition to that `address` field. You could just select those fields and assemble later as needed. – Anonymous coward Aug 31 '18 at 16:02

1 Answer


Rather than using a for loop, purrr::map_dfr will iterate over a vector and rbind the resulting data frames into a single one, which is handy here. For example,

library(tidyverse)

libraries <- tribble(
    ~library,                      ~address,
    "Library of Congress",         "101 Independence Ave SE, Washington, DC 20540",
    "British Library",             "96 Euston Rd, London NW1 2DB, UK",
    "New York Public Library",     "476 5th Ave, New York, NY 10018", 
    "Library and Archives Canada", "395 Wellington St, Ottawa, ON K1A 0N4, Canada"
)

library_locations <- map_dfr(libraries$address, ggmap::geocode, 
                             output = "more", source = "dsk")

This will output a lot of messages, some telling you what geocode is calling, e.g.

#> Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=101%20Independence%20Ave%20SE,%20Washington,%20DC%2020540&sensor=false

and some warning that factors are being coerced to character:

#> Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
#> Warning in bind_rows_(x, .id): binding character and factor vector,
#> coercing into character vector

which they should be, so you can ignore them all. (If you really want you can write more code to make them go away, but you'll end up with the same thing.)

Combine the resulting data frames, and you get all the location data linked to your original dataset:

full_join(libraries, library_locations)
#> Joining, by = "address"
#> # A tibble: 4 x 15
#>   library address      lon   lat type  loctype north south    east     west
#>   <chr>   <chr>      <dbl> <dbl> <chr> <chr>   <dbl> <dbl>   <dbl>    <dbl>
#> 1 Librar… 101 In…  -77.0    38.9 stre… rooftop  38.9  38.9 -77.0    -77.0  
#> 2 Britis… 96 Eus…   -0.125  51.5 stre… rooftop  51.5  51.5  -0.124   -0.126
#> 3 New Yo… 476 5t…  -74.0    40.8 stre… rooftop  40.8  40.8 -74.0    -74.0  
#> 4 Librar… 395 We… -114.     60.1 coun… approx…  83.1  41.7 -52.3   -141.   
#> # … with 5 more variables: street_number <chr>, route <chr>,
#> #   locality <chr>, administrative_area_level_1 <chr>, country <chr>

You may notice that Data Science Toolkit has utterly failed to geocode Library and Archives Canada, for whatever reason; it's marked as a country instead of an address. Geocoders are sometimes unreliable. From here, subset out whatever you don't need.
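For the fields the question asks for (street, city, state, zip, longitude, latitude), you can select and rename the relevant columns from the joined result with dplyr. A minimal sketch, using a hand-built one-row tibble in the shape of the output above (a real `library_locations` join may or may not include a `postal_code` column, depending on what the geocoder returns):

```r
library(dplyr)

# Hypothetical stand-in for one row of the joined geocoding result
joined <- tibble(
  library = "Library of Congress",
  street_number = "101",
  route = "Independence Ave SE",
  locality = "Washington",
  administrative_area_level_1 = "DC",
  lon = -77.0047,
  lat = 38.8887
)

# Keep only the fields of interest, assembling the street address
cleaned <- joined %>%
  transmute(
    street = paste(street_number, route),
    city   = locality,
    state  = administrative_area_level_1,
    lon,
    lat
  )
```

If your geocoder returns `postal_code`, add it to the `transmute()` call the same way.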

If you want even more information, you can use geocode's output = "all" method, but that returns a list you'll need to parse, which takes more work.
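With `output = "all"` you get the raw API response as a nested list, where each address component carries a `types` field you can match on. A sketch of that parsing, using a hand-built list in the shape Google's geocoding JSON takes (a real response nests these under `result$results[[1]]$address_components` and contains more fields):

```r
library(purrr)

# Hypothetical address_components list mimicking the "all" output
components <- list(
  list(long_name = "1651",                        types = list("street_number")),
  list(long_name = "Southeast Tiffany Avenue",    types = list("route")),
  list(long_name = "Port St. Lucie",              types = list("locality", "political")),
  list(long_name = "Florida",                     types = list("administrative_area_level_1", "political")),
  list(long_name = "34952",                       types = list("postal_code"))
)

# Return the long_name of the first component matching a given type,
# or NA if the response doesn't include that component
get_component <- function(components, type) {
  hit <- purrr::detect(components, ~ type %in% .x$types)
  if (is.null(hit)) NA_character_ else hit$long_name
}

get_component(components, "postal_code")  # "34952"
```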

alistaire