2

I have a dataframe with a bunch of columns labelled in 'YYYY-MM' format, along with several other columns. I need to collapse the date columns into calendar quarters and take the mean; I was able to do it manually, but there are a few hundred date columns in my real data and I'd like to not have to map every single one of them by hand. I'm generating the initial df from a CSV; I didn't see anything in read_csv that seemed like it would help, but if there's anything I can leverage there that would be great. I found dataframe.dt.to_period("Q") that will convert a datetime object to quarter, but I'm not quite sure how to apply that here, if I can at all.

Here's a sample df (code below):

   foo  bar  2016-04  2016-05  2016-06  2016-07  2016-08
0    6    5        3        3        5        8        1
1    9    3        6        9        9        7        8
2    8    5        8        1        9        9        4
3    5    8        1        2        3        5        6
4    4    5        1        2        7        2        6

This code will do what I'm looking for, but I had to generate mapping by hand:

mapping = {'2016-04':'2016q2', '2016-05':'2016q2', '2016-06':'2016q2', '2016-07':'2016q3', '2016-08':'2016q3'}
df = df.set_index(['foo', 'bar']).groupby(mapping, axis=1).mean().reset_index()

New df:

    foo bar 2016q2  2016q3
0   6   5   3.666667    4.5
1   9   3   8.000000    7.5
2   8   5   6.000000    6.5
3   5   8   2.000000    5.5
4   4   5   3.333333    4.0

Code to generate the initial df:

df = pd.DataFrame(np.random.randint(1, 11, size=(5, 7)), columns=('foo', 'bar', '2016-04', '2016-05', '2016-06', '2016-07', '2016-08'))                                                        '2016-07', '2016-08'))
jml
  • 137
  • 1
  • 10
  • Possible duplicate: https://stackoverflow.com/questions/40497199/how-to-convert-monthly-data-to-quarterly-in-pandas And rather than convert the column names by hand, you could try something like `pd.PeriodIndex(pd.to_datetime(df.columns[2:]), freq='Q').map(str)` – Alexander Oct 25 '19 at 20:46

3 Answers3

4

Use a callable that gets applied to the index values. Use axis=1 to apply it to the column values instead.

(df.set_index(['foo', 'bar'])
   .groupby(lambda x: pd.Period(x, 'Q'), axis=1)
   .mean().reset_index())

   foo  bar    2016Q2  2016Q3
0    6    5  3.666667     4.5
1    9    3  8.000000     7.5
2    8    5  6.000000     6.5
3    5    8  2.000000     5.5
4    4    5  3.333333     4.0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
0

The solution is quite short:

Start from copying "monthly" columns to another DataFrame and converting column names to PeriodIndex:

df2 = df.iloc[:, 2:]
df2.columns = pd.PeriodIndex(df2.columns, freq='M')

Then, to get the result, resample columns by quarter, compute the mean (for each quarter) and join with 2 "initial" columns:

df.iloc[:, :2].join(df2.resample('Q', axis=1).agg('mean'))
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
0
data = [[2,2,2,3,3,3],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5],[1,2,2,3,4,5]]
df = pd.DataFrame(data, columns = ['A','1996-04','1996-05','2000-07','2000-08','2010-10'])
# separate year columns and other columns
# separate year columns
df3 = df.iloc[:, 1:]
# separate other columns
df2 = df.iloc[:,0]
#apply groupby using period index
df3=df3.groupby(pd.PeriodIndex(df3.columns, freq='Q'), axis=1).mean()
final_df = pd.concat([df3,df2], axis=1)
print(final_df)

output is attached in image:

xpertdev
  • 1,293
  • 2
  • 6
  • 12
  • @jml first separate your year columns and other columns then apply groupby with period index function with only passing year columns after that concat this data frame with your other fields of dataframe. output is below : – xpertdev Oct 25 '19 at 21:48