2

I am trying to combine Google's geocoding API and the GitHub API to parse users' locations and build a list from them.

The array (list) I want to create is like this:

location, lat, lon, count
San Francisco, x, y, 4
Mumbai, x1, y1, 5

Here location, lat and lon are parsed from Google's geocoder, and count is the number of occurrences of that location. Every time a new location comes in: if it already exists in the list, its count is incremented; otherwise it is appended to the array (list) with its location, lat and lon and a count of 1.

Another example:

location, lat, lon, count
Miami, x2, y2, 1 #first occurrence
San Francisco, x, y, 4 #occurred 4 times already
Mumbai, x1, y1, 5 #occurred 5 times already
Cairo, x3, y3, 1 #first occurrence

I can already get the user's location from GitHub and the geocoded data from Google. I just need to create this array in Python, which I'm struggling with.

Can anyone help me? Thanks.

chatu
  • I'd suggest using a dictionary (`dict`) instead. – Amber Apr 23 '13 at 14:42
  • If you want a list for printing with the csv module, check [this](http://stackoverflow.com/a/8685873/264775) answer for a way to do that with a dict. – thegrinner Apr 23 '13 at 14:50
  • Is lat/long directly correlated to the location, e.g., will all San Francisco locations have the same lat/long? If not, you're going to need additional structures to keep that data intact as well. – hexparrot Apr 23 '13 at 15:02

5 Answers

4

With collections.Counter, keyed by (location, lat, lon) tuples, you could do:

from collections import Counter

# initial values: each key is a (location, lat, lon) tuple, each value its count
c = Counter({("Mumbai", 1, 2): 5, ("San Francisco", 3, 4): 4})

# adding entries: update() takes an iterable of keys
c.update([("Mumbai", 1, 2)])
print(c)  # Counter({('Mumbai', 1, 2): 6, ('San Francisco', 3, 4): 4})

c.update([("Mumbai", 1, 2), ("San Diego", 5, 6)])
print(c)  # Counter({('Mumbai', 1, 2): 7, ('San Francisco', 3, 4): 4, ('San Diego', 5, 6): 1})
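
To get from such a Counter back to the location, lat, lon, count layout asked for in the question, one option is to flatten each key/count pair into a row. This is only a sketch: the coordinates are made-up placeholders and `rows` is a name chosen for illustration.

```python
from collections import Counter

# Keys are (location, lat, lon) tuples; the coordinates are placeholders.
c = Counter({("Mumbai", 1, 2): 5, ("San Francisco", 3, 4): 4})
c.update([("Miami", 7, 8)])

# Flatten each (key, count) pair into one [location, lat, lon, count] row,
# most frequent location first.
rows = [[location, lat, lon, count]
        for (location, lat, lon), count in c.most_common()]

for row in rows:
    print(row)
```

Rows in this shape can be handed straight to `csv.writer(...).writerows(rows)` if a file is needed.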
Thierry Lathuille
2

This would be better stored as a dictionary, indexed by city name. You could store it as two dictionaries, one dictionary of tuples for latitude/longitude (since lat/long never changes):

lat_long_dict = {}
lat_long_dict["San Francisco"] = (x, y)
lat_long_dict["Mumbai"] = (x1, y1)

And a collections.defaultdict for the count, so that it always starts at 0:

import collections
city_counts = collections.defaultdict(int)

city_counts["San Francisco"] += 1
city_counts["Mumbai"] += 1
city_counts["San Francisco"] += 1
# city counts would be
# defaultdict(<type 'int'>, {'San Francisco': 2, 'Mumbai': 1})
David Robinson
  • And how would I add lat and lon to this dict? – chatu Apr 23 '13 at 14:56
  • Perhaps I'm doing something wrong. My output is test: {u'San Francisco, CA, USA': '-122.4194155, 37.7749295'} - defaultdict(<type 'int'>, {u'San Francisco, CA, USA': 1}); the count should be 4. – chatu Apr 23 '13 at 15:04
  • It is better to include lat and long in the key (to differentiate Paris, France and Paris, Texas...), so one should rather use a tuple (city, lat, long) as key. – Thierry Lathuille Apr 23 '13 at 15:06
  • A collections.Counter would be better than a defaultdict, it's specifically designed for... counting! :-) Also, what @ThierryLathuille said. – Endophage Apr 23 '13 at 15:06
  • @ThierryLathuille I like what you are saying but I just don't know how to do it in python. Could you direct me to a url or update your answer? Thanks. – chatu Apr 23 '13 at 15:10
  • @DavidRobinson I ended up using Thierry Lathuille's suggestion but thank you. – chatu Apr 23 '13 at 15:24
1

Python has a pre-baked class specifically for counting occurrences of things: it's called collections.Counter. If you can generate an iterator that gives successive tuples (city, lat, lon) from your input data (perhaps with a generator expression), simply passing that into Counter will directly give you what you're looking for, e.g.:

>>> from collections import Counter
>>> locations = [('Miami', 1, 1), ('San Francisco', 2, 2), ('Mumbai', 3, 3), ('Miami', 1, 1), ('Miami', 1, 1)]
>>> Counter(locations)
Counter({('Miami', 1, 1): 3, ('San Francisco', 2, 2): 1, ('Mumbai', 3, 3): 1})

If you need to be able to add more locations as the program runs instead of batching them, pass the relevant tuples (wrapped in a list or other iterable) to that Counter's update method.
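
Both approaches might look roughly like the following. The `geocoded` list and its field names are assumptions standing in for whatever the geocoding step actually returns, and the coordinates are placeholders.

```python
from collections import Counter

# Hypothetical geocoder results, one dict per GitHub user (field names assumed).
geocoded = [
    {"location": "Miami", "lat": 1, "lon": 1},
    {"location": "San Francisco", "lat": 2, "lon": 2},
    {"location": "Miami", "lat": 1, "lon": 1},
]

# Batch: the generator expression yields one (city, lat, lon) tuple per record.
counts = Counter((g["location"], g["lat"], g["lon"]) for g in geocoded)

# Incremental: update() expects an iterable of keys, so a single tuple
# has to be wrapped in a list.
counts.update([("Mumbai", 3, 3)])
counts.update([("Miami", 1, 1)])

print(counts)
```

Passing a bare tuple such as `counts.update(("Mumbai", 3, 3))` would count each element of the tuple as its own key, which is why the wrapping list matters.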

lvc
1

This is sort of an amalgamation of all the other recommended ideas:

from collections import defaultdict

inputdata = [('Miami', 'x2', 'y2'),
             ('San Francisco', 'x', 'y'),
             ('San Francisco', 'x4', 'y4'),
             ('Mumbai', 'x1', 'y1'),
             ('Cairo', 'x3', 'y3')]

counts, coords = defaultdict(int), defaultdict(list)

for location, lat, lon in inputdata:
    coords[location].append((lat,lon))
    counts[location] += 1

print counts, coords

This uses defaultdict, which, as you can see, gives an easy way to both:

  1. count the number of occurrences by city
  2. keep lat/lon pairs intact

RETURNS:

defaultdict(<type 'int'>, {'Miami': 1, 'San Francisco': 2, 'Cairo': 1, 'Mumbai': 1}) 
defaultdict(<type 'list'>, {'Miami': [('x2', 'y2')], 'San Francisco': [('x', 'y'), ('x4', 'y4')], 'Cairo': [('x3', 'y3')], 'Mumbai': [('x1', 'y1')]})

This answer makes an (unverified) assumption that your lat/lon pairs are granular enough that they are unlikely to repeat, and that you are really only interested in counts by city.

hexparrot
0

How about using a Python dict? You can read about them here:

http://docs.python.org/2/tutorial/datastructures.html#dictionaries

Here is a sample implementation:

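# Create an empty dictionary.
dat = {}

# `location` is assumed to hold the current user's location string.
# Increment its count if it is already known; otherwise start at 1.
if location in dat:
    dat[location] = dat[location] + 1
else:
    dat[location] = 1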
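
The same plain-dict idea could be stretched to carry lat and lon next to the count by storing a small list per location. This is a sketch only: the `geocoded` input and its placeholder coordinates are assumptions, not data from the question.

```python
# Hypothetical input: (location, lat, lon) tuples with placeholder coordinates.
geocoded = [("Mumbai", 1, 2), ("Cairo", 5, 6), ("Mumbai", 1, 2)]

dat = {}
for location, lat, lon in geocoded:
    if location in dat:
        dat[location][2] += 1          # seen before: bump the count
    else:
        dat[location] = [lat, lon, 1]  # first occurrence: count starts at 1

# One [location, lat, lon, count] row per city.
rows = [[location] + values for location, values in dat.items()]
print(rows)
```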
Rishabh Sagar