2

I am retrieving data from a SQL database using django, and then convert it to a pandas dataframe. The goal is to calculate some metrics for each person well as metrics overall across all persons.

After calculating my metrics, I want to compare the values to the previous month and calculate the changes as percentage.

This works roughly like this:

member_frame = member_frame.truncate(before=(previous_start - timedelta(days=1)))

which gives me all entries from the last month up to including the end of this month.

I then calculate the sum of some values:

aggregate_avgs = member_frame.groupby(member_frame.index.month).mean().replace(np.inf, 1).fillna(0)

And finally calculate the percentage changes:

avgs_change = aggregate_avgs.pct_change().replace(np.inf, 1).fillna(-1)

Which, if we look at the changes from january to february, like that:

target_date
1           -1.000000 -1.000000 -1.000000           -1.0 -1.000000 -1.000000  -1.000000
2            0.142857  0.076923 -0.828571            0.0 -0.587774 -0.699994  -0.507199

This works... fine? Since I am only interested in the changes from the first to the second date, I simply take the values from the second row.

But if I do the same for december and january, I get this:

target_date
1           -1.00000 -1.000000 -1.000000           -1.0 -1.000000 -1.000000  -1.000000
12           0.02682  0.273444  0.449811            0.0  0.259059  0.424178   0.223225

As you can see, this is somewhat reversed, and the changes are actually for the wrong month. The target_date index is a datetime index.

I suspect that the same will happen if I do that with weekly changes, when crossing years.

What am I doing wrong? I am quite new to pandas, so it might as well be a very easy mistake.

Hafnernuss
  • 2,659
  • 2
  • 29
  • 41

2 Answers2

2

IIUC, you might want to use resample instead, which is akin to grouping by month and year:

aggregate_avgs = member_frame.resample('m').mean().replace(np.inf, 1).fillna(0)
fsl
  • 3,250
  • 1
  • 10
  • 20
1

I think you could write aggregate_avgs = member_frame.groupby(pd.Grouper(freq="M")).mean() as is described here.

vonludi
  • 419
  • 2
  • 20