2

First of all, apologies if this has been solved somewhere. I have gone through these posts a, b and they do not seem to address my problem.

For several objects, I want to calculate parameter differences in time, and sum them up.

synthetic data:

import pandas as pd
import numpy as np

np.random.seed(23)

dict_1 = {
    'time': range(100), 
    'object': [1]*100, 
    'parameter_1':np.random.randint(-10,10,100),
    'parameter_2':np.random.randint(-10,10,100)}

dict_2 = {
    'time': range(100), 
    'object': [2]*100, 
    'parameter_1':np.random.randint(-120,-110,100),
    'parameter_2':np.random.randint(-10,10,100)}

df1 = pd.DataFrame(dict_1)
df2 = pd.DataFrame(dict_2)

df = df1.append(df2)

approach:

using groupby, agg, and diff:

for df1:

diffs = df1.groupby(['object', 'time']).agg({'parameter_1': np.mean, 'parameter_2': np.mean}).diff()
diffs['total_variation'] = abs(diffs['parameter_1']) + abs(diffs['parameter_2'])
np.max(diffs['total_variation'])

output: 35

for df2:

diffs = df2.groupby(['object', 'time']).agg({'parameter_1': np.mean, 'parameter_2': np.mean}).diff()
diffs['total_variation'] = abs(diffs['parameter_1']) + abs(diffs['parameter_2'])
np.max(diffs['total_variation'])

output: 24

for the combined data:

diffs = df.groupby(['object', 'time']).agg({'parameter_1': np.mean, 'parameter_2': np.mean}).diff()
diffs['total_variation'] = abs(diffs['parameter_1']) + abs(diffs['parameter_2'])
np.max(diffs['total_variation'])

expected result:

35

actual result:

129

So, it seems that diff() ignores the grouping. I could write a solution for this that involves for-loops, but since my data is on the large side, I would rather avoid that. Is there a way to fix this, or am I misusing the functions?

Help is greatly appreciated!

warped
  • 8,947
  • 3
  • 22
  • 49
  • when using a grouped diff I use [`np.diff()`](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.diff.html) if i do not want a transformed output, rather a grouped difference. :) – anky Apr 30 '19 at 12:34
  • looks like expected behaviour tbh – avloss Apr 30 '19 at 12:38

0 Answers0