
I have about 9 million records with latitude and longitude, in addition to a timestamp in EST. I want to use the latitude and longitude to generate the appropriate regional time zone, and then adjust all of the timestamps to that time zone.

I have tried using the geonames package:

library(geonames)
data_raw$tz <- mapply(GNtimezone, data$lat, data$lon)

However, this returns the following error:

Error in getJson("timezoneJSON", list(lat = lat, lng = lng, radius = 0)) : 
error code 13 from server: ERROR: canceling statement due to statement timeout

I have tried to use a method described in this post.

data$tz_url <- sprintf("https://maps.googleapis.com/maps/api/timezone/%s?location=%s,%s&timestamp=%d&sensor=%s", 
                       "xml", 
                       data$lat, 
                       data$lon, 
                       as.numeric(data$time), 
                       "false")

library(XML)

for(i in 1:100){
  data$tz[i] <- xmlParse(readLines(data$tz_url[i]), isURL = TRUE)[["string(//time_zone_name)"]]
}

With this method, I am able to build the URLs for the XML data. But when I try to pull the XML data in the for loop and append the time zone to the data frame, it doesn't work for all the records... (in fact, only 10 records at a time, intermittently).

Does anyone know of any alternate methods or packages to get the three-character time zone abbreviation (e.g. EST) for about 9 million records relatively quickly? Your help is much appreciated. Or better yet, if you have ideas on why the code above isn't working, I'd appreciate that too.

A Trask
  • If you are looking for a generalized time zone, have you considered calculating it from longitude? 360 degrees divided by 24 hours. It could be close enough depending on your needs. – Dave2e May 26 '16 at 02:09
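
As an illustration of that comment (not code from the thread), the naive offset-from-longitude idea might look like the sketch below, assuming data$time is a POSIXct in EST (UTC-5) and the column names match the question:

# 360 degrees / 24 hours = 15 degrees of longitude per hour of offset;
# this ignores political time zone boundaries and DST entirely.
data$utc_offset_hours <- round(data$lon / 15)

# Shift the EST (UTC-5) timestamps to the estimated local offset.
data$local_time_naive <- data$time + (data$utc_offset_hours - (-5)) * 3600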

2 Answers


For a list of methods of converting latitude and longitude to time zone, see this post. These mechanisms will return the IANA/Olson time zone identifier, such as America/Los_Angeles.

However, you certainly don't want to make 9 million individual HTTP calls. You should attempt to group the records to distinct locations to minimize the number of lookups. If they are truly random, then you will still have a large number of locations, so you should consider the offline mechanisms described in the previous post (i.e. using the tz_world shapefile with some sort of geospatial lookup mechanism).
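
Not from the answer itself, but a rough sketch of what the grouping plus offline tz_world lookup could look like with sp/rgdal; the shapefile path, layer name, and the TZID attribute column are assumptions based on the efele.net tz_world distribution:

library(sp)
library(rgdal)

# Reduce ~9 million rows to the set of distinct coordinates first.
locs <- unique(data[, c("lat", "lon")])

# Read the tz_world polygons (one polygon per IANA/Olson zone).
tz_world <- readOGR(dsn = "tz_world", layer = "tz_world")

# Build points in the same CRS as the shapefile; note the (lon, lat) order.
pts <- SpatialPoints(locs[, c("lon", "lat")],
                     proj4string = CRS(proj4string(tz_world)))

# Point-in-polygon lookup: over() returns the attributes of the containing
# polygon, and the zone identifier lives in the TZID column.
locs$tz <- as.character(over(pts, tz_world)$TZID)

# Join the zone identifiers back onto the full data set.
data <- merge(data, locs, by = c("lat", "lon"), all.x = TRUE)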

Once you have the IANA/Olson time zone identifier for the location, you can then use R's time zone functionality (as.POSIXct, format, etc.) with each corresponding timestamp to obtain the abbreviation.
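
A minimal sketch of that step, assuming the IANA/Olson identifier has landed in a data$tz column and data$time is POSIXct (the column names are assumptions):

data$tz_abbr    <- NA_character_
data$local_time <- NA_character_

# A POSIXct vector can only carry one tz attribute, so work zone by zone.
for (zone in unique(na.omit(data$tz))) {
  idx <- which(data$tz == zone)
  # "%Z" gives the abbreviation in force at that instant (e.g. EST vs EDT).
  data$tz_abbr[idx]    <- format(data$time[idx], format = "%Z", tz = zone)
  # The local wall-clock time as text, if that is what "adjusting" means here.
  data$local_time[idx] <- format(data$time[idx], tz = zone)
}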

However, do recognize that time zone abbreviations themselves can be somewhat ambiguous. They are useful for human readability, but not much else.

Matt Johnson-Pint
  • So I guess there aren't any ideas for tz_world in R? I saw your other post, but there's nothing there regarding using the shapefile in R. – A Trask May 27 '16 at 10:36
  • I haven't seen one. Doesn't mean it isn't out there somewhere. You can still use that technique, you'll just have to figure out how to use one of the R geospatial libraries, such as rgdal. – Matt Johnson-Pint May 27 '16 at 14:13

I've written the package googleway to access the Google Maps API. You'll need a valid API key (and, for Google to handle 9 million calls, you'll have to pay for it, as the free tier only covers 2,500).

library(googleway)

key <- "your_api_key"

google_timezone(location = c(-37, 144),
                key = key)


$dstOffset
[1] 0

$rawOffset
[1] 36000

$status
[1] "OK"

$timeZoneId
[1] "Australia/Hobart"

$timeZoneName
[1] "Australian Eastern Standard Time"
SymbolixAU