I want to make a new column of type boolean based on the value of 4 other columns. I have a function is_proximal that takes two 2-tuples (the 4 values) and returns a boolean.

I am passing columns from a pandas DataFrame to this function. The is_proximal function in turn calls geopy.distance.distance with the arguments.

from geopy.distance import distance


def is_proximal(p1, p2, exact=True):
    # p1 and p2 are (latitude, longitude) pairs
    dist = distance(p1, p2)

    if exact:
        return dist.miles < 0.75  # mile threshold

    return dist.m < 100  # meter threshold



airbnb_coords = (df.loc[:, "lat_airbnb"], df.loc[:, "long_airbnb"])
homeaway_coords = (df.loc[:, "lat_homeaway"], df.loc[:, "long_homeaway"])
df.loc[:, "proximal"] = is_proximal(airbnb_coords, homeaway_coords)

This results in the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I'm having trouble understanding why this error is occurring. What changes would I need to make to accomplish what I'm trying to do?

The expected output is an additional column of type boolean. The input dataframe df contains integer values in all columns.

The full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-663-435de26b3cfa> in <module>
----> 1 m = filter_geographic_proximity(beds)

~/src/exemption_project/src/match.py in filter_geographic_proximity(df)
     53     airbnb_coords = (exacts.loc[:, "lat_airbnb"], exacts.loc[:, "long_airbnb"])
     54     homeaway_coords = (exacts.loc[:, "lat_homeaway"], exacts.loc[:, "long_homeaway"])
---> 55     exacts.loc[:, "proximal"] = is_proximal(airbnb_coords, homeaway_coords)
     56 
     57     airbnb_coords = (inexacts.loc[:, "lat_airbnb"], inexacts.loc[:, "long_airbnb"])

~/src/exemption_project/src/match.py in is_proximal(p1, p2, exact)
     29 def filter_geographic_proximity(df):
     30     def is_proximal(p1, p2, exact=True):
---> 31         dist = distance(p1, p2)
     32 
     33         if exact:

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/distance.py in __init__(self, *args, **kwargs)
    387         kwargs.pop('iterations', 0)
    388         major, minor, f = self.ELLIPSOID
--> 389         super(geodesic, self).__init__(*args, **kwargs)
    390 
    391     def set_ellipsoid(self, ellipsoid):

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/distance.py in __init__(self, *args, **kwargs)
    162         elif len(args) > 1:
    163             for a, b in util.pairwise(args):
--> 164                 kilometers += self.measure(a, b)
    165 
    166         kilometers += units.kilometers(**kwargs)

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/distance.py in measure(self, a, b)
    408     # Call geographiclib routines for measure and destination
    409     def measure(self, a, b):
--> 410         a, b = Point(a), Point(b)
    411         lat1, lon1 = a.latitude, a.longitude
    412         lat2, lon2 = b.latitude, b.longitude

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/point.py in __new__(cls, latitude, longitude, altitude)
    163                     )
    164                 else:
--> 165                     return cls.from_sequence(seq)
    166 
    167         if single_arg:

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/point.py in from_sequence(cls, seq)
    403             raise ValueError('When creating a Point from sequence, it '
    404                              'must not have more than 3 items.')
--> 405         return cls(*args)
    406 
    407     @classmethod

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/point.py in __new__(cls, latitude, longitude, altitude)
    176 
    177         latitude, longitude, altitude = \
--> 178             _normalize_coordinates(latitude, longitude, altitude)
    179 
    180         self = super(Point, cls).__new__(cls)

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/geopy/point.py in _normalize_coordinates(latitude, longitude, altitude)
     57 
     58 def _normalize_coordinates(latitude, longitude, altitude):
---> 59     latitude = float(latitude or 0.0)
     60     longitude = float(longitude or 0.0)
     61     altitude = float(altitude or 0.0)

~/.local/share/virtualenvs/exemption_project-xI6bzvA1/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
vaer-k
  • This is not a runnable code example, as such it isn't clear what the expected output is. – cs95 Jun 15 '19 at 02:16
  • @cs95 Can you please point me to where SO asks exclusively for runnable code? Should I provide you with my data sources as well? – vaer-k Jun 15 '19 at 02:17
  • Here is the link: [mcve]. Also see [ask]. You've probably taken the [tour] as well, but adding it anyway for good measure. – cs95 Jun 15 '19 at 02:18
  • What more could I provide you other than the actual data and the `pd.read_csv` line. Your request is overreaching. This is minimum reproducibility. – vaer-k Jun 15 '19 at 02:19
  • make dummy variables that mimic your data on a smaller scale – Ben Jones Jun 15 '19 at 02:19
  • 5-10 rows of your data and expected output, like I mentioned . . . – cs95 Jun 15 '19 at 02:19
  • No, this is not what is required by SO standards. – vaer-k Jun 15 '19 at 02:20
  • You haven't even provided the complete traceback, so it isn't clear what line of code throws the error. I'm going to go out on a limb and say the problem is caused inside the distance function (**code you haven't even included here**) because `The truth value of a Series is ambiguous` is usually symptomatic of code being fed vectors when it is designed to work with scalars. If you really want help, please stop being stubborn and follow our advice. Thanks :-) – cs95 Jun 15 '19 at 02:22
  • The distance function is supplied by a third-party library called `geopy`. The specific function in question is called `geopy.distance.distance`. You're welcome to view the documentation, but I won't start providing the source code of 3rd party libraries in SO questions. – vaer-k Jun 15 '19 at 02:27
  • Right, so more context that was not provided in the OP. So you can solve this using a list comprehension. – cs95 Jun 15 '19 at 02:31
  • `df['proximal'] = [is_proximal((a, b), (c, d)) for a, b, c, d in df[['lat_x', 'long_x', 'lat_y', 'long_y']].values]` where `*_x` and `*_y` are your columns. – cs95 Jun 15 '19 at 02:33
  • @cs95 why did you mark this question as a duplicate when neither the question nor any of the provided solutions in the linked question are the same or even similar? – vaer-k Jun 15 '19 at 02:41
  • I would've closed it as off-topic lacking reproducibility but this works too. The linked duplicate does answer your question in the sense that it explains you cannot pass vectors to code that is meant to work with scalars. With that in mind, the next port of call would've been to restructure your code to loop through your data instead of shoving all your data down the throat of your function. I've answered your question though, did the list comprehension work? – cs95 Jun 15 '19 at 02:45
  • It did, and I think your solution would serve to help others with this problem if you would let this question live. – vaer-k Jun 15 '19 at 02:46
  • I am obviously far from expert with this tool, but I wouldn't expect many people to be able to find a solution in the question you linked. Furthermore, I don't see any lack of reproducibility in my question as I have catered to your every demand. – vaer-k Jun 15 '19 at 02:47
  • I'll hand it to you that the nature of your problem makes it hard to actually provide a runnable example. But you've at least provided the traceback which makes it more obvious why the error is happening. Thanks for that! I've re-opened your question. – cs95 Jun 15 '19 at 02:49

1 Answer

From the traceback, the error is raised inside the geopy distance call that is_proximal makes: _normalize_coordinates evaluates float(latitude or 0.0), and the "or" has to take the truth value of latitude. That only fails this way when latitude is a Series, so you are passing Series objects to a function that is meant to work with scalar data.

See the discussion in "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()", where using Python logical operators on a pandas Series causes the same error.
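The failing step can be reproduced without geopy. The sketch below is only an illustration: the Series `lat` and its values are made up, and the expression copies the shape of the `float(latitude or 0.0)` line shown in the traceback.

```python
import pandas as pd

# Hypothetical latitude column; the values are invented for illustration.
lat = pd.Series([40, 41])

try:
    # `or` first asks for bool(lat). pandas refuses to reduce a
    # multi-element Series to a single True/False and raises ValueError.
    float(lat or 0.0)
except ValueError as err:
    message = str(err)

print(message)

# A single element is an ordinary number, so the same expression works.
scalar_result = float(lat.iloc[0] or 0.0)
print(scalar_result)
```

The same expression succeeds once it is given one value at a time, which is why the fix is to feed the function scalars rather than whole columns.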

In your case, the solution is to iterate over your data, and pass each group of co-ordinates to your function one at a time.

# Each row of .values is one (lat, long, lat, long) group of scalars.
df['proximal'] = [
    is_proximal((a, b), (c, d))
    for a, b, c, d in df[['lat_airbnb', 'long_airbnb', 'lat_homeaway', 'long_homeaway']].values
]
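For a version that runs without geopy or the original data, here is a minimal sketch of the same row-by-row pattern. It rests on assumptions: `haversine_miles` is a hypothetical stand-in for `geopy.distance.distance` (a spherical approximation, not geopy's geodesic calculation), the `threshold_miles` parameter replaces the `exact` flag, and the four rows of coordinates are invented.

```python
import math

import pandas as pd


def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees.

    Hypothetical stand-in for geopy.distance.distance; assumes a spherical
    Earth with a mean radius of 3958.8 miles.
    """
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))


def is_proximal(p1, p2, threshold_miles=0.75):
    # Works on scalars only: p1 and p2 are single (lat, lon) pairs.
    return haversine_miles(p1, p2) < threshold_miles


# Invented sample data using the column names from the question.
df = pd.DataFrame({
    "lat_airbnb": [40.0, 40.0],
    "long_airbnb": [-75.0, -75.0],
    "lat_homeaway": [40.001, 41.0],
    "long_homeaway": [-75.0, -75.0],
})

# zip walks the four columns in parallel, yielding one scalar from each per row.
df["proximal"] = [
    is_proximal((a, b), (c, d))
    for a, b, c, d in zip(
        df["lat_airbnb"], df["long_airbnb"],
        df["lat_homeaway"], df["long_homeaway"],
    )
]

print(df["proximal"].tolist())  # [True, False]
```

Using `zip` over the columns instead of `.values` keeps each column's own dtype, which matters if the frame mixes integer and float columns; either form hands the function one row of scalars at a time.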
cs95