I have the following columns:

Id PlayId  X     Y
0  0       2.3   3.4
1  0       5.4   3.2

2  1       3.2   5.1
3  1       4.2   1.7

If I have two rows grouped under one PlayId, I want to add two Distance columns and two Angle columns:

Id PlayId  X     Y   Distance_0  Distance_1 Angle_0 Angle_1
0  0       2.3   3.4 0.0         ?          0.0     ?
1  0       5.4   3.2 ?           0.0        ?       0.0

2  1       3.2   5.1
3  1       4.2   1.7

Each Distance column holds the Euclidean distance between the i-th and j-th elements in a group:

dist(x0, y0, x1, y1) = sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)

The angle between the i-th and j-th elements is calculated in a similar way.

So, how can I perform this efficiently, without processing elements one-by-one?

  • hm.. what is x0 and x1 in your example? – Alex Oct 22 '19 at 11:29
  • The corresponding coordinates of the elements: x0 for the i-th element and x1 for the j-th element. –  Oct 22 '19 at 11:33
  • I don't quite get the grouping. Could you give an example for values in `Distance_0` and `Distance_1`? Which values of X and Y are used to compute these distances? – buboh Oct 22 '19 at 12:08
  • I just mean that `Distance_{i}` means the distance between i-th element and current one. For the 1st element, for example, `Distance_0 = dist(x0, y0, x0, y0) = 0.0`, `Distance_1 = dist(x0, y0, x1, y1)`. For the 2nd element, `Distance_0 = dist(x0, y0, x1, y1) `, `Distance_1 = dist(x1, y1, x1, y1) = 0.0` –  Oct 22 '19 at 12:24
  • Actually, we get a distance matrix with zeros on the main diagonal. An angle matrix is constructed the same way. –  Oct 22 '19 at 12:38
  • Could this be of help: https://www.drawingfromdata.com/making-a-pairwise-distance-matrix-with-pandas? It uses the `pdist` function from SciPy to compute the "pairwise distances between observations in n-dimensional space". – buboh Oct 22 '19 at 12:41
  • It's similar to what I need... but how do I deal with two coordinates instead of the single one used there? –  Oct 22 '19 at 12:59

1 Answer

You can compute the pairwise distances by using the pdist function from SciPy:

import pandas as pd

df = pd.DataFrame({'X': [5, 6, 7], 'Y': [3, 4, 5]})

# df
#    X  Y
# 0  5  3
# 1  6  4
# 2  7  5

from scipy.spatial.distance import pdist, squareform

cols = [f'Distance_{i}' for i in range(len(df))]
pd.DataFrame(squareform(pdist(df.values)), columns=cols)

which produces the following DataFrame:

   Distance_0  Distance_1  Distance_2
0    0.000000    1.414214    2.828427
1    1.414214    0.000000    1.414214
2    2.828427    1.414214    0.000000

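This works because pdist takes an array of shape (m, n), where m is the number of observations (rows) and n is the dimension of each observation (here two: X and Y). It returns the distances in condensed form, and squareform expands them into the full symmetric m x m matrix.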
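As a sanity check, the same matrix can be reproduced from the formula in the question with NumPy broadcasting. This is only a sketch for comparison; the names `points`, `manual` and `via_pdist` are made up for the illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Same three points as in the example above, one row per observation.
points = np.array([[5, 3], [6, 4], [7, 5]], dtype=float)

# diff[i, j] is the coordinate difference between point i and point j.
diff = points[:, None, :] - points[None, :, :]   # shape (3, 3, 2)

# Apply sqrt(dx**2 + dy**2) to every pair at once.
manual = np.sqrt((diff ** 2).sum(axis=-1))

# pdist gives the condensed distances; squareform expands them to 3 x 3.
via_pdist = squareform(pdist(points))
```

Both arrays should agree up to floating-point error, with zeros on the main diagonal.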

You could subsequently concat the original DataFrame with the newly created one if needed (using pd.concat).
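For the grouped data in the question, one possible approach is to build the distance matrix per PlayId and then concat everything back onto the original frame. The sketch below assumes every group should get columns named `Distance_0`, `Distance_1`, ...; `add_distances` is a hypothetical helper, not a pandas or SciPy function.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Data from the question.
df = pd.DataFrame({
    'Id': [0, 1, 2, 3],
    'PlayId': [0, 0, 1, 1],
    'X': [2.3, 5.4, 3.2, 4.2],
    'Y': [3.4, 3.2, 5.1, 1.7],
})

def add_distances(group):
    # Hypothetical helper: pairwise distances within a single PlayId group.
    dist = squareform(pdist(group[['X', 'Y']].to_numpy()))
    cols = [f'Distance_{i}' for i in range(len(group))]
    # Keep the group's original index so the rows line up again later.
    return pd.DataFrame(dist, columns=cols, index=group.index)

# One distance matrix per group, stacked row-wise.
distances = pd.concat([add_distances(g) for _, g in df.groupby('PlayId')])

# Column-wise concat aligns on the index of the original frame.
result = pd.concat([df, distances], axis=1)
```

If groups differ in size, the smaller groups end up with NaN in the columns they do not use.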

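For the angle, you could use pdist as well, with metric='cosine' to compute the cosine distance. Note that this is 1 - cos(θ), where θ is the angle between the two points treated as vectors from the origin, so the angle itself would be arccos(1 - d). See this post for more information.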
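The question does not say exactly which angle is meant, so here is a rough sketch of two possible readings, using the first group from the question. Both interpretations, and the names `angle_between` and `direction`, are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# First PlayId group from the question: rows are (X, Y).
points = np.array([[2.3, 3.4], [5.4, 3.2]])

# Reading 1 (assumption): angle at the origin between the position vectors.
# The cosine distance is 1 - cos(theta), so invert it with arccos.
cos_dist = squareform(pdist(points, metric='cosine'))
angle_between = np.arccos(np.clip(1.0 - cos_dist, -1.0, 1.0))

# Reading 2 (assumption): direction of the vector from point i to point j,
# measured from the positive X axis, in radians.
dx = points[None, :, 0] - points[:, None, 0]   # dx[i, j] = x_j - x_i
dy = points[None, :, 1] - points[:, None, 1]   # dy[i, j] = y_j - y_i
direction = np.arctan2(dy, dx)
```

Both matrices have zeros on the main diagonal. `angle_between` is symmetric, while `direction` is not, since the direction from i to j is opposite to the direction from j to i.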

buboh