I have a pandas DataFrame like this:

          x         y         z           time
0  0.730110  4.091428  7.833503  1618237788537
1  0.691825  4.024428  7.998608  1618237788537
2  0.658325  3.998107  8.195119  1618237788537
3  0.658325  4.002893  8.408080  1618237788537
4  0.677468  4.017250  8.561220  1618237788537

I want to add a column called computed to this DataFrame. Each value is the sum of the squared differences between a row and the previous row, with the first row compared against zero:

row 0: (0.730110 - 0)^2 + (4.091428 - 0)^2 + (7.833503 - 0)^2
row 1: (0.691825 - 0.730110)^2 + (4.024428 - 4.091428)^2 + (7.998608 - 7.833503)^2
etc.

How can I do that?

Romero_91
bib

1 Answer

TL;DR:

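# squared row-to-row differences, summed per row (row 0 has no predecessor, so it sums to 0)
df['computed'] = df.diff().pow(2).sum(axis=1)
# row 0 is compared against zeros: overwrite it with its own sum of squares
df.at[0, 'computed'] = df.loc[0].pow(2).sum()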

Step by step:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [1, 1, 2, 3, 5, 8], 'c': [1, 4, 9, 16, 25, 36]})

df
#    a  b   c
# 0  1  1   1
# 1  2  1   4
# 2  3  2   9
# 3  4  3  16
# 4  5  5  25
# 5  6  8  36

df.diff()
#      a    b     c
# 0  NaN  NaN   NaN
# 1  1.0  0.0   3.0
# 2  1.0  1.0   5.0
# 3  1.0  1.0   7.0
# 4  1.0  2.0   9.0
# 5  1.0  3.0  11.0

df.diff().pow(2)
#      a    b      c
# 0  NaN  NaN    NaN
# 1  1.0  0.0    9.0
# 2  1.0  1.0   25.0
# 3  1.0  1.0   49.0
# 4  1.0  4.0   81.0
# 5  1.0  9.0  121.0

df.diff().pow(2).sum(axis=1)  # sum() skips NaN, so row 0 comes out as 0.0
# 0      0.0
# 1     10.0
# 2     27.0
# 3     51.0
# 4     86.0
# 5    131.0

df['computed'] = df.diff().pow(2).sum(axis=1)

df
#    a  b   c  computed
# 0  1  1   1       0.0
# 1  2  1   4      10.0
# 2  3  2   9      27.0
# 3  4  3  16      51.0
# 4  5  5  25      86.0
# 5  6  8  36     131.0

df.at[0, 'computed'] = df.loc[0].pow(2).sum()

df
#    a  b   c  computed
# 0  1  1   1       3.0
# 1  2  1   4      10.0
# 2  3  2   9      27.0
# 3  4  3  16      51.0
# 4  5  5  25      86.0
# 5  6  8  36     131.0
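
If the frame has columns that should stay out of the sum (such as `time` in the question), one possible variant is to subtract a shifted copy with `fill_value=0`. The first row is then compared against zeros directly, so no separate fix-up of row 0 is needed. This is a minimal sketch rather than the method used above; it assumes the question's column names `x`, `y`, `z`, and the names `cols` and `delta` are introduced here only for illustration:

```python
import pandas as pd

# The rows shown in the question
df = pd.DataFrame({
    'x': [0.730110, 0.691825, 0.658325, 0.658325, 0.677468],
    'y': [4.091428, 4.024428, 3.998107, 4.002893, 4.017250],
    'z': [7.833503, 7.998608, 8.195119, 8.408080, 8.561220],
    'time': [1618237788537] * 5,
})

cols = ['x', 'y', 'z']  # illustrative name: the columns that enter the sum

# shift(fill_value=0) moves every row down by one and fills the new
# first row with zeros, so row 0 is differenced against 0 instead of NaN
delta = df[cols] - df[cols].shift(fill_value=0)

# square each difference and add them up per row
df['computed'] = delta.pow(2).sum(axis=1)
```

Because only `cols` is selected, the `time` column is left untouched and does not contribute to `computed`.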

Stef
  • The first computed value in row 0 must be 3 – bib Oct 19 '21 at 15:21
  • @bib Yes. Because `.diff` doesn't treat the first row the way you want. You can do the first row separately. I edited my answer. – Stef Oct 19 '21 at 15:53
  • Can we apply the same thing to a subset of columns instead of all columns, say only a and b? – bib Oct 19 '21 at 19:53
  • @bib Yes. See this related question: [Pandas: sum DataFrame rows for given columns](https://stackoverflow.com/questions/25748683/pandas-sum-dataframe-rows-for-given-columns). Just index by the list of column names before performing the operations: `df['computed'] = df[['a', 'b']].diff().pow(2).sum(axis=1)` – Stef Oct 20 '21 at 11:44
  • @bib And for the first row you can either do it the same way, or write it explicitly: `df.at[0, 'computed'] = df.at[0, 'a']**2 + df.at[0, 'b']**2` – Stef Oct 20 '21 at 11:47
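
The two comments above (restrict to a subset of columns, then handle the first row separately) can be combined into one small function. The sketch below is only one way to package it: `add_computed` and its parameters are hypothetical names, not part of pandas, and the first row is addressed by position so that it does not depend on the index starting at 0.

```python
import pandas as pd


def add_computed(df, columns, name='computed'):
    """Hypothetical helper: return a copy of df with a column holding the
    sum of squared row-to-row differences over the given columns."""
    sub = df[columns]
    out = df.copy()

    # diff() leaves NaN in the first row; sum() skips NaN, so row 0 is 0.0 here
    out[name] = sub.diff().pow(2).sum(axis=1)

    # first row: compare against zeros, i.e. its own sum of squares
    out.iloc[0, out.columns.get_loc(name)] = sub.iloc[0].pow(2).sum()
    return out


df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

result = add_computed(df, ['a', 'b'])
print(result['computed'].tolist())
# [2.0, 1.0, 2.0, 2.0, 5.0, 10.0]
```

Returning a copy keeps the original frame unchanged; assigning the result back (`df = add_computed(df, ['a', 'b'])`) gives the in-place behaviour of the answer.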