
I have a question: what would be the best way to figure out which timezone a particular user is in, based on the location field data? It seems that a considerable number of users have this field populated with some data; the format, however, is far from normalized.

While I am figuring out ways to normalize users' locations and infer timezones, I wonder whether someone has done this before and could share their experience, or maybe (ideally) there is some magic web service that I can query for the timezone of a given location?

So far I am following a fairly simple process: tokenizing the field, sorting the tokens, grouping them by frequency, and assigning timezones manually to the best of my knowledge.
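To make that process concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the regex tokenizer is a simple stand-in (not the tokenizer actually used), and `TOKEN_TO_TZ`, `tokenize`, `token_frequencies` and `infer_timezone` are hypothetical names with made-up sample entries.

```python
import re
from collections import Counter

# Hypothetical, hand-maintained mapping from a location token to an IANA
# timezone name. The entries are illustrative samples only.
TOKEN_TO_TZ = {
    "london": "Europe/London",
    "berlin": "Europe/Berlin",
    "tokyo": "Asia/Tokyo",
}


def tokenize(location):
    """Lowercase a free-form location string and split it into letter-only tokens."""
    return re.findall(r"[^\W\d_]+", location.lower())


def token_frequencies(locations):
    """Count how often each token occurs across all location strings."""
    counts = Counter()
    for location in locations:
        counts.update(tokenize(location))
    return counts


def infer_timezone(location):
    """Return the timezone of the first token found in the manual mapping, else None."""
    for token in tokenize(location):
        tz = TOKEN_TO_TZ.get(token)
        if tz is not None:
            return tz
    return None
```

Sorting `token_frequencies(...).most_common()` shows which tokens are worth mapping by hand first; strings that match nothing come back as `None` and can be passed to a geocoding service instead.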

seninp
  • Your question isn't specific to the SO data dump. It's a question about interpreting natural language location strings. Thus I think it fits SO better than meta.SO. – CodesInChaos Oct 20 '12 at 22:17
  • I think you are right. Should I ask a similar question there, or can I move my question over? I posted on meta because I was targeting users who have experience with the public data dump. – seninp Oct 21 '12 at 07:08
  • "considerable amount" is debatable [it's 22%](http://data.stackexchange.com/stackoverflow/query/82875) as at the last refresh of data.se. I also seem to remember that from a specific point it's a look up on Yahoo Placefinder API (that caused a few problems) but I can't find the reference. – Ben Oct 21 '12 at 10:26
  • Thank you for the hints. I found quite a few good posts on SO. You are right about the number; I wish there was more data. In fact, it's about 20K different locations among 260K users (out of ~1.3M) as of the August 2012 dump. I use Lucene's tokenizer and the [Google Geocoding API](https://developers.google.com/maps/documentation/geocoding/). It works well so far. I'll try Yahoo for unknown ones, like "on the Server Farm" ;) – seninp Oct 21 '12 at 12:05

0 Answers