I noticed strange behaviour after upgrading pandas from 0.23.4 to 0.24.2. The following snippet demonstrates it.
CSV file (filename: my_data_new.csv):
date,name,id,roll,sub_1,sub_2,sub_3
2016-11-30 08:00:00,AAA,A,1,123.456,123.456,123.456
2016-11-30 09:00:00,AAA,A,1,123.457,123.457,123.457
2016-11-30 10:00:00,AAA,A,1,123.458,123.458,123.458
2016-11-30 11:00:00,AAA,A,1,123.459,123.459,123.459
2016-11-30 12:00:00,BBB,B,2,123.451,123.456,123.456
2016-11-30 13:00:00,BBB,B,2,123.452,123.457,123.457
2016-11-30 14:00:00,BBB,B,2,123.453,123.458,123.458
2016-11-30 15:00:00,BBB,B,2,123.454,123.459,123.459
Snippet:
import pandas as pd
print("PANDAS-VERSION:", pd.__version__)
def my_func(d):
    d_copy = d.copy(deep=True)
    return d_copy
data = pd.read_csv("~/my_data_new.csv", parse_dates=['date'], index_col=['date']).sort_index()
result = data.groupby('name').apply(my_func)
print(result)
Output:
In pandas version 0.23.4:
PANDAS-VERSION: 0.23.4
                          name  id  roll    sub_1    sub_2    sub_3
name date
AAA  2016-11-30 08:00:00   AAA   A     1  123.456  123.456  123.456
     2016-11-30 09:00:00   AAA   A     1  123.457  123.457  123.457
     2016-11-30 10:00:00   AAA   A     1  123.458  123.458  123.458
     2016-11-30 11:00:00   AAA   A     1  123.459  123.459  123.459
BBB  2016-11-30 12:00:00   BBB   B     2  123.451  123.456  123.456
     2016-11-30 13:00:00   BBB   B     2  123.452  123.457  123.457
     2016-11-30 14:00:00   BBB   B     2  123.453  123.458  123.458
     2016-11-30 15:00:00   BBB   B     2  123.454  123.459  123.459
In pandas version 0.24.2:
PANDAS-VERSION: 0.24.2
                          name  id  roll    sub_1    sub_2    sub_3
name date
AAA  2016-11-30 12:00:00   AAA   A     1  123.456  123.456  123.456
     2016-11-30 13:00:00   AAA   A     1  123.457  123.457  123.457
     2016-11-30 14:00:00   AAA   A     1  123.458  123.458  123.458
     2016-11-30 15:00:00   AAA   A     1  123.459  123.459  123.459
BBB  2016-11-30 12:00:00   BBB   B     2  123.451  123.456  123.456
     2016-11-30 13:00:00   BBB   B     2  123.452  123.457  123.457
     2016-11-30 14:00:00   BBB   B     2  123.453  123.458  123.458
     2016-11-30 15:00:00   BBB   B     2  123.454  123.459  123.459
My observation is as follows:
In pandas 0.24.2, the index of the last group's DataFrame (here 'BBB') is applied to all previous groups' DataFrames (here 'AAA'), while in pandas 0.23.4 each group's original index is preserved.
Is this documented behaviour? If so, kindly point me to the relevant change in the release notes or in the repo's code.
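For anyone who wants to reproduce the comparison without a file on disk, here is a minimal sketch of how the check could be written. It assumes the same CSV content as above, loaded through io.StringIO. The helper names (load_data, copy_groups_explicitly, index_preserved, apply_preserves_index) are hypothetical and only for illustration. The explicit loop plus pd.concat is a stand-in for what I expected groupby().apply() to return; it is not a claim about how pandas implements apply internally.

```python
import io

import pandas as pd

# Same rows as my_data_new.csv above, inlined so the check needs no file.
CSV_TEXT = """date,name,id,roll,sub_1,sub_2,sub_3
2016-11-30 08:00:00,AAA,A,1,123.456,123.456,123.456
2016-11-30 09:00:00,AAA,A,1,123.457,123.457,123.457
2016-11-30 10:00:00,AAA,A,1,123.458,123.458,123.458
2016-11-30 11:00:00,AAA,A,1,123.459,123.459,123.459
2016-11-30 12:00:00,BBB,B,2,123.451,123.456,123.456
2016-11-30 13:00:00,BBB,B,2,123.452,123.457,123.457
2016-11-30 14:00:00,BBB,B,2,123.453,123.458,123.458
2016-11-30 15:00:00,BBB,B,2,123.454,123.459,123.459
"""


def load_data():
    """Read the inlined CSV with 'date' as a sorted DatetimeIndex."""
    return pd.read_csv(
        io.StringIO(CSV_TEXT), parse_dates=["date"], index_col=["date"]
    ).sort_index()


def copy_groups_explicitly(data):
    """Copy each group by hand and stitch the copies back together.

    Hypothetical helper: it builds the result I expected from apply(),
    with the group key as the outer index level.
    """
    pieces = {name: group.copy(deep=True) for name, group in data.groupby("name")}
    return pd.concat(pieces, names=["name"])


def index_preserved(data, result):
    """True if the innermost index level of result matches data's index."""
    return list(result.index.get_level_values(-1)) == list(data.index)


def apply_preserves_index(data):
    """Run the original snippet's apply() and check the date level."""
    result = data.groupby("name").apply(lambda d: d.copy(deep=True))
    return index_preserved(data, result)


data = load_data()
expected = copy_groups_explicitly(data)
print("explicit copy keeps index:", index_preserved(data, expected))
```

Calling apply_preserves_index(data) should then return True on a version where each group's index is kept, and False for output like the 0.24.2 result shown above, where the 'AAA' rows carry the 'BBB' timestamps.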