
Objective: I am trying to compute the distance between every pair of zip codes, keeping only the pairs that are 25 miles or less apart.

Problem: The way I expect the code to work is that an outer loop pulls one zip code and an inner loop pulls a second zip code. The problem is that if zip1 = XXX, it only produces pairs that start with XXX; once it has iterated over all of those combinations, it never moves on to YYY for zip1.

import haversine

for line in f_in:
    # Outer loop: one row (zip code) from f_in
    zip1 = line["ZIP"]
    lat = line["LAT"]
    lon = line["LNG"]
    loc = (float(lat), float(lon))
    # Inner loop: every row of b_in, a second copy of the same CSV
    for entry in b_in:
        zip2 = entry["ZIP"]
        lat2 = entry["LAT"]
        lon2 = entry["LNG"]
        loc2 = (float(lat2), float(lon2))
        combzip = str(zip1) + str(zip2)
        print(combzip)
        if combzip not in ziplist:
            dist = haversine.haversine(loc, loc2, miles=True)
            # Skip pairs more than 20 miles apart
            if dist > 20:
                continue
            ziplist[combzip] = [zip1, zip2, dist]
leeum
  • If it's supposed to be a nested loop, you are missing the indent for the second "for" section. – offeltoffel Sep 19 '17 at 14:21
  • @offeltoffel thanks, that was a copy-and-paste error; the code is already indented and the error is still there – leeum Sep 19 '17 at 14:23
  • What is the type of the `b_in` object? Some objects can only be iterated over once. Is `b_in` a file? You can't iterate over a file multiple times unless you manually `seek` back to the beginning. – Kevin Sep 19 '17 at 14:24
  • @Kevin interesting, b_in is a duplicate of the f_in file; both are CSVs (note: they are two separate files) – leeum Sep 19 '17 at 14:25
  • @leeum: There is no need to duplicate a file. Read in the content of your file once and only access your dataframe (or list, dictionary, NumPy array, ...). It's not only more efficient, but will prevent errors like yours (Kevin is right, you'd have to rewind your file to read it several times) – offeltoffel Sep 19 '17 at 14:28
  • @offeltoffel ok, so I read in f_in, and then what? Sorry, my Python knowledge is very minimal. I am familiar with R, if that would help to explain it. – leeum Sep 19 '17 at 14:33
  • If you know R, you could first learn how to use the `Pandas` module which allows some commands and concepts that are very similar to R (e.g. dataframes). If you edit your question to explain the structure of your `f_in`, we might be able to help you with reading its content. Basically you create lists or dictionaries that contain the content of your file and you iterate over all items in separate loops. – offeltoffel Sep 19 '17 at 14:36
  • @offeltoffel ok I think a dictionary makes sense here since I have unique keys (zip codes) that I can pair with a list of their respective lat and lon. Will report back shortly, but intuitively this seems to make more sense. – leeum Sep 19 '17 at 14:38
  • This won't affect the loop, but it would be better to structure your last "if" statement like this: `if dist < 20: ziplist[combzip] = [zip1,zip2,dist]`. Also, as offeltoffel said, it would be beneficial for us to see the structure of f_in and b_in. – Evan Nowak Sep 19 '17 at 14:42
  • @offeltoffel so your solution worked. My question now is that the dictionary has 30,000 keys, so when I pair them up to compute the distances, it's over 1B pairs. Is there a way to make this faster, or is it too big for Python to handle? I was able to find a database where someone already did this, so I am really just asking in order to learn. – leeum Sep 19 '17 at 19:15
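On the follow-up about 30,000 keys and roughly a billion pairs: most of those pairs are hundreds of miles apart, so a common trick is to bucket the points into a coarse latitude/longitude grid and only compute distances between points in the same or adjacent cells. Below is a minimal pure-Python sketch of that idea; it was not posted in the thread. `grid_pairs_within` and `haversine_miles` are hypothetical names, the 3958.8-mile Earth radius is an assumption, and it assumes no pair straddles the ±180° meridian (harmless for contiguous-US data). The cell sizes are chosen so that any two points within `max_miles` always land in neighboring cells: their latitude difference is at most the central angle `max_miles / R`, and the haversine formula bounds their longitude difference using the largest |latitude| in the data.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(loc1, loc2):
    """Great-circle distance in miles between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, loc1)
    lat2, lon2 = map(math.radians, loc2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def grid_pairs_within(zips, max_miles):
    """Hypothetical helper: {zip1+zip2: [zip1, zip2, dist]} for pairs within max_miles.

    zips maps zip code -> (lat, lon) in degrees. Only points in the same or
    adjacent grid cells are compared. Assumes no pair crosses +/-180 longitude.
    """
    if not zips:
        return {}
    angle = max_miles / EARTH_RADIUS_MILES  # largest allowed central angle (radians)
    # Two points within max_miles differ in latitude by at most `angle`.
    lat_step = math.degrees(angle)
    # hav(theta) >= cos(lat1) * cos(lat2) * hav(dlon); use the worst-case latitude.
    min_cos = math.cos(math.radians(max(abs(lat) for lat, _ in zips.values())))
    ratio = math.sin(angle / 2) / min_cos if min_cos > 0 else math.inf
    lon_step = 360.0 if ratio >= 1 else math.degrees(2 * math.asin(ratio))

    # Bucket every zip code into its grid cell.
    cells = defaultdict(list)
    for z, (lat, lon) in zips.items():
        cells[(math.floor(lat / lat_step), math.floor(lon / lon_step))].append(z)

    result = {}
    for (ci, cj), members in cells.items():
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for zip2 in cells.get((ci + di, cj + dj), ()):
                    for zip1 in members:
                        if zip1 < zip2:  # keep each unordered pair once
                            dist = haversine_miles(zips[zip1], zips[zip2])
                            if dist <= max_miles:
                                result[zip1 + zip2] = [zip1, zip2, dist]
    return result


# Illustrative points on the equator (not real zip-code coordinates).
zips = {"00001": (0.0, 0.0), "00002": (0.0, 0.1), "00003": (0.0, 1.0)}
print(sorted(grid_pairs_within(zips, 25)))  # → ['0000100002']
```

In the worst case (every point in one cell) this is still quadratic, but for zip codes spread across the country each point is only compared with its neighbors. A spatial index is the other usual option; for example, scikit-learn's `BallTree` supports a haversine metric and radius queries.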

1 Answer


Don't know if you noticed, but be careful with indentation:

for line in f_in:
    zip1 = line["ZIP"]
    lat = line["LAT"]
    lon = line["LNG"]
    loc = (float(lat),float(lon))
    for entry in b_in:
        zip2 = entry["ZIP"]
        lat2 = entry["LAT"]
        lon2 = entry["LNG"]
        loc2 = (float(lat2),float(lon2))
        combzip = str(zip1)+str(zip2)
        print(combzip)
        if combzip not in ziplist:
            dist = haversine.haversine(loc,loc2,miles=True)
            if dist > 20:
                continue
            ziplist[combzip] = [zip1,zip2,dist]
Floaterz
  • thanks, but that was actually a copy-and-paste error on my part. The code I have reads the same as yours, and I'm unsure why it fails to work.
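As Kevin pointed out in the comments, the indentation is not the cause. Assuming `b_in` is a `csv.DictReader` over an open file (the `entry["ZIP"]` lookups suggest that), it can only be iterated once, so after the first outer iteration the inner loop has nothing left to read. Below is a minimal Python 3 sketch of offeltoffel's suggestion: read the file once into a dictionary keyed by zip code, then loop over that. It assumes the `ZIP`, `LAT`, and `LNG` columns from the question, uses a small made-up CSV in an `io.StringIO` instead of the real file, and replaces the third-party `haversine` package with a hand-written haversine (mean Earth radius assumed to be 3958.8 miles) so it runs on the standard library alone. `load_zips`, `haversine_miles`, and `pairs_within` are hypothetical helper names.

```python
import csv
import io
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(loc1, loc2):
    """Great-circle distance in miles between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, loc1)
    lat2, lon2 = map(math.radians, loc2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def load_zips(f):
    """Hypothetical helper: read the CSV once into {zip: (lat, lon)}."""
    return {row["ZIP"]: (float(row["LAT"]), float(row["LNG"]))
            for row in csv.DictReader(f)}


def pairs_within(zips, max_miles):
    """Hypothetical helper: {zip1+zip2: [zip1, zip2, dist]} for unordered pairs within max_miles."""
    result = {}
    for (zip1, loc1), (zip2, loc2) in combinations(zips.items(), 2):
        dist = haversine_miles(loc1, loc2)
        if dist <= max_miles:
            result[zip1 + zip2] = [zip1, zip2, dist]
    return result


# Illustrative sample rows (not real zip-code coordinates).
SAMPLE = "ZIP,LAT,LNG\n00001,0.0,0.0\n00002,0.0,0.1\n00003,0.0,1.0\n"

# Why the original inner loop runs only once: the reader is exhausted after one pass.
reader = csv.DictReader(io.StringIO(SAMPLE))
first_pass = [row["ZIP"] for row in reader]
second_pass = [row["ZIP"] for row in reader]  # empty, nothing left to read

# The fix: load once, then pair up the in-memory entries.
zips = load_zips(io.StringIO(SAMPLE))
ziplist = pairs_within(zips, max_miles=25)
print(sorted(ziplist))  # → ['0000100002']
```

`itertools.combinations` yields each unordered pair once, which replaces the `has_key` check; unlike the original nested loop, it also skips pairing a zip code with itself. `max_miles` is a parameter because the question states 25 miles while the posted code uses 20. Kevin's alternative, calling `seek(0)` on the underlying file before each inner loop, also works, but the `DictReader` then has to be re-created so the header row is not read as data, and the whole file is reread once per outer row.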