
I have a dataframe with two columns, `speed` and `timestamp`, where `timestamp` is a simple range between 0 and 100. For each timestamp, I would like to compute the cumulative integral of `speed` over time up to that timestamp and store it in a new column, `distance`.

What I did:

import numpy as np
import pandas as pd

#some code ... 

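# Cumulative trapezoidal integral of speed over time, one value per row.
# The slice ends at t + 1 so that row t is included in its own integral.
# Note: np.trapz is named np.trapezoid in NumPy 2.0 and later.
dataframe["distance"] = [
    np.trapz(
        y=dataframe["speed"].iloc[: t + 1],
        x=dataframe["timestamp"].iloc[: t + 1],
    )
    for t in range(dataframe.shape[0])
]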

However, I suspect there is a more Pythonic and efficient way to compute this integral, one that makes better use of the power and syntax of pandas.

Does anyone have an idea?

Trenton McKinney
Mistapopo

1 Answer


You are probably looking for

dataframe["distance"] = dataframe["speed"].cumsum() / freq

where freq is the sampling frequency of your time series (for example freq=10 if you have 10 records per second), so each record covers an interval of 1 / freq seconds.
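When the timestamps are already in the dataframe, the interval width can be taken from them directly instead of from a separate `freq` value. The following is a minimal sketch of the rectangle method, assuming `timestamp` is numeric and sorted in increasing order; the sample data and the name `dt` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: constant speed of 2, sampled every 0.5 time units.
dataframe = pd.DataFrame({
    "timestamp": [0.0, 0.5, 1.0, 1.5, 2.0],
    "speed": [2.0, 2.0, 2.0, 2.0, 2.0],
})

# Width of each interval; the first row has no predecessor, so its width is 0.
dt = dataframe["timestamp"].diff().fillna(0)

# Right-endpoint rectangle rule: each speed sample times the interval before it,
# accumulated row by row.
dataframe["distance"] = (dataframe["speed"] * dt).cumsum()
```

Because the widths come from `diff()`, this also works when the samples are not evenly spaced.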

Simone
  • This computes the integral using the rectangle method, but it is a good starting point. – Mistapopo Nov 19 '21 at 14:26
  • If I'm not mistaken, the trapezoidal method should be given by `dataframe["distance"] = (dataframe["speed"] + dataframe["speed"].shift()).fillna(0).cumsum() / (2 * freq)`. Do you agree? – Simone Nov 19 '21 at 14:38
  • I accept your answer, but I would talk about `width` rather than `freq`. Typically, `width` is computed using `width = (dataframe["timestamp"] - dataframe["timestamp"].min()) / (freq * dataframe["timestamp"] + 1)`, where `freq` is the frequency of the `timestamp`, i.e. `freq = dataframe["timestamp"].size / (dataframe["timestamp"].max() - dataframe["timestamp"].min())`. – Mistapopo Nov 20 '21 at 17:15
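
The trapezoidal variant discussed in these comments can be written without a Python loop. This is only a sketch, assuming `timestamp` is numeric and sorted in increasing order; the sample data and the names `dt` and `mean_speed` are illustrative and not taken from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: speed grows linearly with time,
# so the exact distance at time t is t**2 / 2.
dataframe = pd.DataFrame({"timestamp": np.arange(0.0, 5.0)})
dataframe["speed"] = dataframe["timestamp"]

# Width of each interval (NaN for the first row, which has no predecessor).
dt = dataframe["timestamp"].diff()

# Average of each speed sample and the one before it (also NaN for the first row).
mean_speed = (dataframe["speed"] + dataframe["speed"].shift()) / 2

# Area of each trapezoid, with the first row set to 0, accumulated row by row.
dataframe["distance"] = (mean_speed * dt).fillna(0).cumsum()
```

Each row then holds the integral from the first timestamp up to and including that row, which is what slicing with `iloc[: t + 1]` and integrating each slice would give, but in a single pass over the data.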