26

I have a pandas DataFrame with N columns representing the coordinates of a vector (for example X, Y, Z, but it could be more than 3D).

I would like to aggregate the DataFrame along the rows with an arbitrary function that combines the columns, for example the Euclidean norm: sqrt(X^2 + Y^2 + Z^2).

I want to do something similar to what is done here and here and here, but I want to keep it general enough that the number of columns can change, so that it behaves like

DataFrame.mean(axis = 1)

or

DataFrame.sum(axis = 1)
Community
Fra

5 Answers

32

I found a faster solution than what @elyase suggested:

np.sqrt(np.square(df).sum(axis=1))
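For completeness, here is a minimal runnable sketch of the one-liner above. The sample values and the variable names `norms` and `norms_array` are made up for illustration; the second variant follows the `df.values` suggestion from the comments, using `to_numpy()` instead, and returns a plain array rather than a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; any number of numeric columns works the same way.
df = pd.DataFrame({
    "X": [1.0, 1.0, 3.0],
    "Y": [1.0, 0.0, 4.0],
    "Z": [0.0, 0.0, 12.0],
})

# Square every element, sum across the columns of each row, take the root.
# The result is a Series that keeps the index of df.
norms = np.sqrt(np.square(df).sum(axis=1))

# Same computation on the underlying NumPy array (returns an ndarray,
# so the index is lost).
norms_array = np.sqrt(np.square(df.to_numpy()).sum(axis=1))

print(norms)
```

Since nothing here refers to column names, the same line should work unchanged when the number of columns changes.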
Fra
  • there is also np.linalg.norm, but for some reason the "manual version" you supplied above is faster – Wizard Jan 10 '16 at 15:58
  • 1
    At least in my case, this could be sped up by using df.values – 00__00__00 Nov 24 '17 at 09:08
  • @Wizard I've discussed why the "manual version" is faster than `np.linalg.norm()` in [this SO post](https://stackoverflow.com/questions/64948677/). Note that if views are involved or `df` has a lot of columns, `np.linalg.norm()` eventually wins. – normanius Apr 19 '21 at 02:12
13

NumPy provides a norm function. Use:

np.linalg.norm(df[['X','Y','Z']].values,axis=1)
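Note that `np.linalg.norm` returns a plain NumPy array, so the DataFrame index is dropped. A small sketch of wrapping the result back into a Series, assuming you want to keep the index; the sample data, the extra `label` column and the name `norms` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index and one non-numeric column.
df = pd.DataFrame(
    {"X": [3.0, 0.0], "Y": [4.0, 0.0], "Z": [12.0, 2.0], "label": ["a", "b"]},
    index=[10, 20],
)

cols = ["X", "Y", "Z"]  # only the coordinate columns

# Row-wise Euclidean norm on the underlying array.
norm_values = np.linalg.norm(df[cols].to_numpy(), axis=1)

# Re-attach the original index so the result lines up with df.
norms = pd.Series(norm_values, index=df.index, name="norm")

print(norms)
```

Selecting `cols` explicitly keeps non-numeric columns such as `label` out of the computation.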
ntg
9

One line, using whatever function you desire (including lambda functions), e.g.

df.apply(np.linalg.norm, axis=1)

or

df.apply(lambda x: (x**2).sum()**.5, axis=1)

PeterFoster
3

Filter the columns by name:

cols = ['X','Y','Z']
df[cols].mean(axis=1)
df[cols].sum(axis=1)
df[cols].apply(lambda values: sum([v**2 for v in values]), axis=1)
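If you want this to behave like `mean(axis=1)` or `sum(axis=1)` for an arbitrary function, one option is a small wrapper. This is only a sketch: `row_aggregate` is a hypothetical helper name, not a pandas function, and it assumes the function you pass accepts a 2D array and an `axis` keyword (as `np.sum`, `np.mean` and `np.linalg.norm` do).

```python
import numpy as np
import pandas as pd


def row_aggregate(df, func, cols=None):
    """Apply func(array_2d, axis=1) to the selected columns of df.

    Hypothetical helper: func must accept an `axis` keyword.
    Returns a Series aligned with the index of df.
    """
    data = df if cols is None else df[cols]
    result = func(data.to_numpy(), axis=1)
    return pd.Series(result, index=df.index)


# Hypothetical sample data.
df = pd.DataFrame({"X": [1.0, 3.0], "Y": [2.0, 4.0], "Z": [2.0, 12.0]})
cols = ["X", "Y", "Z"]

norms = row_aggregate(df, np.linalg.norm, cols)  # Euclidean norm per row
means = row_aggregate(df, np.mean, cols)         # same as df[cols].mean(axis=1)

# A custom function works too, as long as it takes (array, axis).
sum_sq = row_aggregate(df, lambda a, axis: (a ** 2).sum(axis=axis), cols)
```

Because the helper operates on whatever columns it is given, the number of dimensions can change without touching the code.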
mattexx
2

You are looking for apply. Your example would look like this:

>>> df = pd.DataFrame([[1, 1, 0], [1, 0, 0]], columns=['X', 'Y', 'Z'])
>>> df
   X  Y  Z
0  1  1  0
1  1  0  0

>>> df.apply(lambda x: np.sqrt(x.dot(x)), axis=1)
0    1.414214
1    1.000000
dtype: float64

This works for any number of dimensions.

elyase
  • 1
    Thanks! I just stumbled upon a faster solution: `np.sqrt(np.square(df).sum(axis=1))` – Fra Feb 05 '14 at 22:11
  • Always prefer column-wise functions to `apply` - for common operations, the former are orders of magnitude faster than a hand-written apply. – Axel May 31 '18 at 07:02
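To check the speed claims on your own data, something like the following sketch may help. It uses random, made-up data, confirms that the three approaches from this thread agree, and times two of them with `timeit`. The variable names are illustrative, and the actual timings will depend on your machine and on the shape of the DataFrame.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random data: 1000 rows, 3 coordinate columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["X", "Y", "Z"])

# The three approaches suggested in the answers.
via_apply = df.apply(np.linalg.norm, axis=1)
via_columns = np.sqrt(np.square(df).sum(axis=1))
via_numpy = np.linalg.norm(df.to_numpy(), axis=1)

# They should agree up to floating-point rounding.
same = np.allclose(via_apply, via_columns) and np.allclose(via_columns, via_numpy)

# Rough timing comparison; run it yourself rather than trusting any fixed number.
t_apply = timeit.timeit(lambda: df.apply(np.linalg.norm, axis=1), number=3)
t_columns = timeit.timeit(lambda: np.sqrt(np.square(df).sum(axis=1)), number=3)
print(f"apply: {t_apply:.4f}s, column-wise: {t_columns:.4f}s, same result: {same}")
```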