I am retrieving data from a SQL database using django, and then convert it to a pandas dataframe. The goal is to calculate some metrics for each person well as metrics overall across all persons.
After calculating my metrics, I want to compare the values to the previous month and calculate the changes as percentage.
This works roughly like this:
member_frame = member_frame.truncate(before=(previous_start - timedelta(days=1)))
which gives me all entries from the last month up to including the end of this month.
I then calculate the sum of some values:
aggregate_avgs = member_frame.groupby(member_frame.index.month).mean().replace(np.inf, 1).fillna(0)
And finally calculate the percentage changes:
avgs_change = aggregate_avgs.pct_change().replace(np.inf, 1).fillna(-1)
Which, if we look at the changes from january to february, like that:
target_date
1 -1.000000 -1.000000 -1.000000 -1.0 -1.000000 -1.000000 -1.000000
2 0.142857 0.076923 -0.828571 0.0 -0.587774 -0.699994 -0.507199
This works... fine? Since I am only interested in the changes from the first to the second date, I simply take the values from the second row.
But if I do the same for december and january, I get this:
target_date
1 -1.00000 -1.000000 -1.000000 -1.0 -1.000000 -1.000000 -1.000000
12 0.02682 0.273444 0.449811 0.0 0.259059 0.424178 0.223225
As you can see, this is somewhat reversed, and the changes are actually for the wrong month.
The target_date
index is a datetime index.
I suspect that the same will happen if I do that with weekly changes, when crossing years.
What am I doing wrong? I am quite new to pandas, so it might as well be a very easy mistake.