
I have two CSV files containing coordinate information for points. One file (call it File 1) has a list of eleven unique IDs, each corresponding to a lat/lon coordinate. The other file (call it File 2) has around 300 lat/lon points, and each point matches one of those eleven unique IDs. What I want to do is check whether a point in File 2 has the same ID as a point in File 1, and if so, calculate the distance between those two points.

I have tried to do so using the following code:

for index, row in res_df:
    for windex, wrow in will_df: 
        if res_df['Residence_ID'] == will_df['Residence_ID']:
            print(res_df.distance(will_df))

However, when I try this, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-73-b4268cef3e71> in <module>()
----> 1 for index, row in res_df:
      2   for windex, wrow in will_df:
      3     if res_df['Residence_ID'] == will_df['Residence_ID']:
      4       print(res_df.distance(will_df))
      5 

ValueError: too many values to unpack (expected 2)

I also tried using iterrows at one point, but that did not fix my problem.

Additionally, I wanted to try to count the number of matching records, and I ran into a problem here too:

counter = 0
for id1 in res_df['Residence ID']:
  for id2 in will_df['Residence_ID']:
    if id1 == id2:
      print("match")
      counter += 1
print(counter)

When I run the code above, my counter returns a value of 52; however, this doesn't make sense, because all of my 300 records in File 2 match with some record in File 1. So, I think I am missing some fundamental logic here.

EDIT:

I also just tried:

for index, row in res_df.items():
  for windex, wrow in will_df.items(): 
    if res_df['Residence_ID'] == will_df['Residence_ID']:
      print(res_df.distance(will_df))

and the error message is:

ValueError                                Traceback (most recent call last)
<ipython-input-80-65e9f78ad2a5> in <module>()
      1 for index, row in res_df.items():
      2   for windex, wrow in will_df.items():
----> 3     if res_df['Residence_ID'] == will_df['Residence_ID']:
      4       print(res_df.distance(will_df))

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __nonzero__(self)
   1553             "The truth value of a {0} is ambiguous. "
   1554             "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
-> 1555                 self.__class__.__name__
   1556             )
   1557         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Not sure what this means.

Brenda Thompson
  • I asked a very similar question [here](https://stackoverflow.com/questions/59489368/pandas-return-separate-dataframe-values-based-on-function). I then took the lat/longs with the combined data set and used a distance function from the `geopy` library. Maybe this puts you on the right directions – MattR Jan 03 '20 at 20:22
  • for counting use this: len(will_df[will_df['Residence_ID'].isin(pd.unique(res_df['Residence ID'])]) It will give u the count of records in will_df that have same Residence_ID in res_df. For your question, I would suggest you to break it into two steps first step find the common records using the above mentioned way and in next step find the distance between them. – No_body Jan 03 '20 at 20:37
  • @No_body I'm getting a syntax error with the code you wrote so not sure what the correct method to use is – Brenda Thompson Jan 03 '20 at 21:33
  • @No_body also the issue with your code is that it only returns unique values -- I don't need unique values, I need an output of matched pairs. – Brenda Thompson Jan 03 '20 at 21:34
  • @MattR so the thing is, your question/problem excludes the very part of my project that is raising a problem, which is that I need to match pairs of values based on a certain criteria (ie same ID names), which you don't do in your problem. – Brenda Thompson Jan 03 '20 at 21:36
  • Start with a pandas join something like (untested) `res_df.merge(will_df, how="inner", on="Residence_ID").` If you have mismatches you may need to use one of the outer joins. That will yield a new dataframe matched up, that you can then filter based on a distance calculation – Andrew Lavers Jan 03 '20 at 22:47

1 Answer


For the first part of your question,

for index, row in res_df:

raises that error because iterating over a DataFrame directly yields its column labels, one string at a time. Python then tries to unpack each multi-character label into the two names `index` and `row`, and fails with "too many values to unpack".

You can loop over the rows (although you shouldn't) with

for index, row in res_df.iterrows():
    print(index, "\n", row, "\n\n")

For your second attempt,

      1 for index, row in res_df.items():
      2   for windex, wrow in will_df.items():
----> 3     if res_df['Residence_ID'] == will_df['Residence_ID']:

This causes an error because you are comparing the entire Series `res_df['Residence_ID']` against the entire Series `will_df['Residence_ID']`. The result of that comparison is itself a boolean Series, which has no single truth value for `if` to test, hence the "ambiguous" message. If you're dedicated to looping (which, as I'll return to later, you should not be), compare the scalars from the current rows instead: `row["Residence_ID"] == wrow["Residence_ID"]`.
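Putting those two fixes together, a minimal corrected version of your loop might look like the sketch below. The toy frames and the `lat`/`lon` column names are assumptions (substitute your real columns), and `haversine_km` is just a stand-in for whatever distance function you end up using (e.g. one from `geopy`):

```python
import pandas as pd
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Toy frames standing in for File 1 and File 2.
res_df = pd.DataFrame({"Residence_ID": [1, 2],
                       "lat": [40.0, 41.0], "lon": [-75.0, -74.0]})
will_df = pd.DataFrame({"Residence_ID": [1, 1, 2],
                        "lat": [40.1, 40.2, 41.3], "lon": [-75.1, -75.2, -74.2]})

for _, row in res_df.iterrows():
    for _, wrow in will_df.iterrows():
        if row["Residence_ID"] == wrow["Residence_ID"]:  # scalar vs scalar: unambiguous
            print(haversine_km(row["lat"], row["lon"],
                               wrow["lat"], wrow["lon"]))
```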

Now, you appear to be trying to calculate a distance (not sure if you have a custom function for that or if it's part of GeoPandas), but what you probably want to do is join these two dataframes together, then calculate a new column distance_between (or something) based on the joined dataframe.

If you can post some sample data, we can help you with that, or you can search the docs/SO for it.

More on joins here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
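As a rough sketch of that join-based approach (again assuming hypothetical `lat`/`lon` column names and a haversine stand-in for your real distance function):

```python
import pandas as pd
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

res_df = pd.DataFrame({"Residence_ID": [1, 2],
                       "lat": [40.0, 41.0], "lon": [-75.0, -74.0]})
will_df = pd.DataFrame({"Residence_ID": [1, 1, 2],
                        "lat": [40.1, 40.2, 41.3], "lon": [-75.1, -75.2, -74.2]})

# Inner join: each row of File 2 is paired with its File 1 match,
# and the overlapping lat/lon columns get distinguishing suffixes.
merged = will_df.merge(res_df, on="Residence_ID", suffixes=("_will", "_res"))

merged["distance_km"] = merged.apply(
    lambda r: haversine_km(r["lat_will"], r["lon_will"],
                           r["lat_res"], r["lon_res"]),
    axis=1)

print(merged[["Residence_ID", "distance_km"]])
print(len(merged))  # number of matched pairs
```

As a side effect, `len(merged)` also answers your counting question: it is the number of File 2 rows that found a matching ID in File 1.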

Evan