I've got this function from a separate question: How to apply euclidean distance function to a groupby object in pandas dataframe?. The following function measures the mean pairwise distance between objects grouped by time and ids. The problem arises when a group doesn't have enough coordinates, i.e. fewer than two points. In that case a runtime warning is raised:

RuntimeWarning: Mean of empty slice.
  (np.array(list(zip(x['x'], x['y']))))

I'd like to return 0 for the group when this occurs.

import pandas as pd
from scipy import spatial
import numpy as np

time = [0, 0, 0, 0, 1, 1, 1]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
ids = ['a', 'b', 'a', 'a', 'b', 'b', 'a']
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car, 'ids': ids})

df = (df.groupby(['time','ids'])
        .apply(lambda x: spatial.distance.pdist
        (np.array(list(zip(x['x'], x['y']))))
        .mean())
        .reset_index()
        )
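
For reference, the warning can be reproduced in isolation. This is a minimal sketch, assuming a group that holds a single point (the values are taken from the `time=0, ids='b'` group above): `pdist` has no pairs to compare, so it returns an empty array, and the mean of an empty array is `nan` plus the warning.

```python
import warnings

import numpy as np
from scipy import spatial

# A group with a single (x, y) point has no pairs, so pdist returns an empty array.
single_point = np.array([[218, 12]])
distances = spatial.distance.pdist(single_point)
print(distances.shape)  # (0,)

# Taking the mean of an empty array emits a RuntimeWarning and returns nan.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = distances.mean()

print(result)  # nan
print([str(w.message) for w in caught])
```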

Intended Output:

   time ids           0
0     0   a   78.042816
1     0   b           0
2     1   a           0
3     1   b  191.927069
Grebtsew
jonboy

1 Answer

One option is to add a condition inside the lambda, so that groups with fewer than two rows return 0 instead of taking the mean of an empty pdist result:

(df.groupby(['time', 'ids'])
   .apply(lambda g: spatial.distance.pdist(g[['x', 'y']].to_numpy()).mean()
          if len(g) > 1 else 0)
   .reset_index()
)

   time ids           0
0     0   a   78.042816
1     0   b    0.000000
2     1   a    0.000000
3     1   b  191.927069
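
The same guard can also be written as a named function, which may be easier to read and test than a long lambda. This is only a sketch, assuming the column names and data from the question. The helper name `mean_pairwise_distance` and the output column name `distance` are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import spatial


def mean_pairwise_distance(group):
    """Mean Euclidean distance over all point pairs in a group (hypothetical helper)."""
    # Fewer than two points means there are no pairs, so return 0 as requested.
    if len(group) < 2:
        return 0.0
    points = group[['x', 'y']].to_numpy()
    return spatial.distance.pdist(points).mean()


df = pd.DataFrame({
    'time': [0, 0, 0, 0, 1, 1, 1],
    'x': [216, 218, 217, 280, 290, 130, 132],
    'y': [13, 12, 12, 110, 109, 3, 56],
    'car': [1, 2, 3, 1, 3, 4, 5],
    'ids': ['a', 'b', 'a', 'a', 'b', 'b', 'a'],
})

# Select only the coordinate columns before applying the helper to each group.
result = (df.groupby(['time', 'ids'])[['x', 'y']]
            .apply(mean_pairwise_distance)
            .reset_index(name='distance'))
print(result)
```

Passing `name='distance'` to `reset_index` gives the result column a label instead of the default `0`. If a missing value fits better than 0 for single-point groups, the helper could return `np.nan` instead.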

FBruzzesi