
I need to add a field to my DataFrame with the calculated distance between Location A and Location B. I have this code, which works fine for rows with non-empty coordinates:

```python
df['Distance_AB'] = df.apply(lambda x: great_circle((x['latitude_A'],x['longitude_A']), (x['latitude_B'], x['longitude_B'])).meters, axis=1).round()
```

But when it encounters an empty field, it throws an error:

```
ValueError: ('Point coordinates must be finite. (nan, nan, 0.0) has been passed as coordinates.', u'occurred at index 2881')
```

How can I ensure that the great-circle distance formula never receives a NULL value, so that the distance calculation is skipped when no coordinates are available? I am aware of the pd.notnull() function, but it only returns True or False.
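One way to use the True/False output of pd.notnull() is as a boolean row mask, so the distance function only ever sees complete rows. This is a sketch that assumes the four column names from the question; `haversine_m` and `add_distance_masked` are hypothetical helpers, with `haversine_m` standing in for geopy's `great_circle(...).meters` (spherical Earth, assumed radius 6371009 m) only so the example runs without geopy.

```python
import math

import numpy as np
import pandas as pd

COLS = ['latitude_A', 'longitude_A', 'latitude_B', 'longitude_B']


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371009.0):
    """Hypothetical stand-in for great_circle(...).meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(min(1.0, a)))


def add_distance_masked(df):
    """Return a copy of df with Distance_AB computed only for rows with all four coordinates."""
    out = df.copy()
    # pd.notnull gives one True/False per cell; .all(axis=1) collapses that to one flag per row
    mask = pd.notnull(out[COLS]).all(axis=1)
    out['Distance_AB'] = np.nan  # rows with missing coordinates keep NaN
    if mask.any():
        out.loc[mask, 'Distance_AB'] = out.loc[mask, COLS].apply(
            lambda r: haversine_m(r['latitude_A'], r['longitude_A'],
                                  r['latitude_B'], r['longitude_B']),
            axis=1,
        ).round()
    return out


df = pd.DataFrame({
    'latitude_A': [0.0, np.nan],
    'longitude_A': [0.0, 10.0],
    'latitude_B': [0.0, 5.0],
    'longitude_B': [1.0, 5.0],
})
result = add_distance_masked(df)
```

The `if mask.any()` guard simply skips the apply when no row is complete, so the new column is then all NaN.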

jpp
zwornik
    Modify your `great_circle` function to be robust to bad inputs, OR use a [ternary conditional operator](https://stackoverflow.com/questions/394809/does-python-have-a-ternary-conditional-operator) within your `lambda` – pault Dec 17 '18 at 21:04
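Following the ternary-operator suggestion, the check can also live inside the lambda: `pd.notnull(...).all()` is evaluated per row, and `np.nan` is returned instead of calling the distance function. A plain `val is not None` test would not work here, because pandas stores missing numbers as NaN, which is a float and therefore `is not None`. As above, `haversine_m` is a hypothetical stand-in for geopy's `great_circle(...).meters`, and the data is made up.

```python
import math

import numpy as np
import pandas as pd

COLS = ['latitude_A', 'longitude_A', 'latitude_B', 'longitude_B']


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371009.0):
    """Hypothetical stand-in for great_circle(...).meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(min(1.0, a)))


df = pd.DataFrame({
    'latitude_A': [0.0, np.nan],
    'longitude_A': [0.0, 10.0],
    'latitude_B': [0.0, 5.0],
    'longitude_B': [1.0, 5.0],
})

# Ternary conditional: only call the distance function when all four values are present
df['Distance_AB'] = df.apply(
    lambda x: haversine_m(x['latitude_A'], x['longitude_A'],
                          x['latitude_B'], x['longitude_B'])
    if pd.notnull(x[COLS]).all() else np.nan,
    axis=1,
).round()

# NaN is a float, not None, so an `is not None` test cannot detect it; pd.isnull can
missing = df.loc[1, 'latitude_A']
```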

1 Answer


I assume either that your great_circle function is not vectorisable or that vectorisation is out of scope for your question. Since pd.DataFrame.apply is already a Python-level loop, you can use an explicit function with try / except without significant additional overhead:

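```python
import numpy as np
from geopy.distance import great_circle


def calculator(row):
    lat_A, long_A = row['latitude_A'], row['longitude_A']
    lat_B, long_B = row['latitude_B'], row['longitude_B']
    try:
        return great_circle((lat_A, long_A), (lat_B, long_B)).meters
    except ValueError:
        # geopy raises ValueError for NaN coordinates; leave the distance empty instead
        return np.nan

df['Distance_AB'] = df.apply(calculator, axis=1).round()
```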
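If vectorisation is in scope after all, a NumPy haversine formula avoids both the Python-level loop and the exception, because NaN inputs simply propagate to NaN outputs. This is a swapped-in technique, not geopy's implementation: it assumes a spherical Earth of radius 6371009 m, so results should be close to `great_circle` only if geopy uses a similar radius in your version. `haversine_vec` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371009.0  # assumed mean Earth radius in metres


def haversine_vec(lat1, lon1, lat2, lon2, radius_m=EARTH_RADIUS_M):
    """Great-circle distance in metres for whole columns at once; NaN in -> NaN out."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # clip guards against tiny floating-point overshoot above 1; NaN passes through unchanged
    return 2 * radius_m * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


df = pd.DataFrame({
    'latitude_A': [0.0, np.nan],
    'longitude_A': [0.0, 10.0],
    'latitude_B': [0.0, 5.0],
    'longitude_B': [1.0, 5.0],
})
df['Distance_AB'] = np.round(haversine_vec(df['latitude_A'], df['longitude_A'],
                                           df['latitude_B'], df['longitude_B']))
```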
jpp
  • Thanks a lot. try/except worked fine. I only got this warning, but the output looks OK (though it is hard to validate given the data size): `AppData\Local\Continuum\anaconda2\lib\site-packages\pandas\core\series.py:1828: RuntimeWarning: invalid value encountered in rint result = com._values_from_object(self).round(decimals)` – zwornik Dec 17 '18 at 21:34
  • I have stubbornly tried to filter out NULL values inside my apply/lambda, without much luck. Would it be possible to achieve what was done with try/except using an if statement inside apply/lambda? How can one select a DataFrame cell value that is not NULL? Will `if val is not None` work on a DataFrame? – zwornik Dec 17 '18 at 21:57
  • Perhaps your series isn't numeric: you can try `df['Distance_AB'] = pd.to_numeric(df.apply(calculator, axis=1), errors='coerce').round()` – jpp Dec 17 '18 at 23:12
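To illustrate that last suggestion: if a row-wise apply ends up producing an object-dtype Series (for example, a mix of floats and None), `pd.to_numeric` with `errors='coerce'` turns anything non-numeric into NaN and yields a float64 Series that rounds cleanly. The input values below are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical apply() output with object dtype: a float, a None and a stray string
raw = pd.Series([111195.08, None, 'n/a'], dtype=object)

# errors='coerce' maps None and unparseable values to NaN instead of raising
clean = pd.to_numeric(raw, errors='coerce').round()
```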