Extracting and Grouping Sets of Columns in a Pandas DataFrame

Question

I have a DataFrame structure derived from a CSV file on population statistics over a number of years. Namely, the columns in the file are monthly time intervals (1999-01, 1999-02 ... 2016-12) and the rows are different population centers in the world (e.g. London, Toronto, Boston etc.):

df = pd.DataFrame({'1999-01' : [100, 5000, 8000], '1999-02' : [200, 6000, 9000], '1999-03' : [300, 7000, 10000], ..., 'cities' : ['CityA', 'CityB', 'CityC' ...]})

I'd like to group these columns on a quarterly basis. So I would take the average of the populations in 1999-01, 1999-02 and 1999-03 for every row and create a new column '1999Q1' for the result, repeating this for every three months:

df_quarter = pd.DataFrame({'1999Q1' : [200, 6000, 9000], '1999Q2' : ..., 'cities' : ['CityA', 'CityB', 'CityC' ...]})

#Q1 corresponds to months 01-03, Q2 to months 04-06, Q3 to months 07-09, and Q4 to months 10-12, all inclusive

However, I'm having difficulty conceptualizing the query to accomplish this. I have half a mind to use .groupby() and then .agg(), but I'm not sure how I could efficiently specify a three-column grouping and iterate across the columns. Could someone please point me in the right direction?

EDIT: Suppose the columns were not dates, but rather something more abstract, and a simple resampling of time periods could not be employed. For example:

#Prices of different foods from different vendors
df = pd.DataFrame({'oranges' : [2, 3, 7], 'apples' : [6, 3, 9], 'cheese' : [13, 9, 11], 'milk' : [6, 5, 12], 'vendors' : ['VendorA', 'VendorB', 'VendorC']})

Now if I wanted to create two columns, combining the fruits and the milk products, is there some way I could specify the indices to aggregate on?
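
One way such an explicit grouping could be sketched is with a dict that maps each column to a group label, applied to the transposed frame. This is only an illustration: the labels 'fruit' and 'dairy' and the names `column_groups`, `prices` and `grouped` are made up here, and the mean is assumed as the aggregation because the question does not say how the columns should be combined.

```python
import pandas as pd

# Prices of different foods from different vendors (data from the question)
df = pd.DataFrame({'oranges': [2, 3, 7], 'apples': [6, 3, 9],
                   'cheese': [13, 9, 11], 'milk': [6, 5, 12],
                   'vendors': ['VendorA', 'VendorB', 'VendorC']})

# Hypothetical mapping from column name to group label
column_groups = {'oranges': 'fruit', 'apples': 'fruit',
                 'cheese': 'dairy', 'milk': 'dairy'}

# Keep only numeric columns by moving the vendor names into the index
prices = df.set_index('vendors')

# Transpose so the columns become rows, group those rows by the mapping,
# average each group (an assumed choice), and transpose back
grouped = prices.T.groupby(column_groups).mean().T
print(grouped)
```

Swapping `.mean()` for `.sum()` or another aggregation would change only the combining step; the mapping itself stays the same.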

Please read [this](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and learn how to ask a good pandas question. No one is going to produce examples and solutions for you out of thin air. — cs95, Sep 09 '17 at 04:33

score 0 · Accepted Answer · answered Sep 09 '17 at 04:38

You can first convert the column labels with to_datetime and then to monthly periods with to_period. After that, resample along the columns (axis=1) at quarterly frequency and aggregate with mean:

import pandas as pd

df = pd.DataFrame({'1999-01':[4,5,4,5,5,4],
                   '1999-02':[7,8,9,4,2,3],
                   '1999-03':[1,3,5,7,1,0],
                   '1999-04':[1,3,5,7,1,0],
                   '1999-05':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df)
   1999-01  1999-02  1999-03  1999-04  1999-05
a        4        7        1        1        5
b        5        8        3        3        3
c        4        9        5        5        6
d        5        4        7        7        9
e        5        2        1        1        2
f        4        3        0        0        4

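# Turn the 'YYYY-MM' column labels into monthly periods
df.columns = pd.to_datetime(df.columns).to_period('M')
# Resample along the columns into quarters and take the mean of each quarter
df = df.resample('Q', axis=1).mean()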

print (df)
     1999Q1  1999Q2
a  4.000000     3.0
b  5.333333     3.0
c  6.000000     5.5
d  5.333333     8.0
e  2.666667     1.5
f  2.333333     2.0
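
As a hedged alternative, the same quarterly means can be computed without passing axis=1 to resample, which newer pandas releases are understood to deprecate. The sketch below maps each monthly column label to a quarter label and groups the transposed frame by that mapping. The names `column_to_quarter` and `df_quarter` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'1999-01': [4, 5, 4, 5, 5, 4],
                   '1999-02': [7, 8, 9, 4, 2, 3],
                   '1999-03': [1, 3, 5, 7, 1, 0],
                   '1999-04': [1, 3, 5, 7, 1, 0],
                   '1999-05': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))

# Map each monthly column label to a quarter label, e.g. '1999-02' -> '1999Q1'
column_to_quarter = {}
for col in df.columns:
    period = pd.Period(col, freq='M')
    column_to_quarter[col] = f"{period.year}Q{period.quarter}"

# Transpose so the months become rows, group them by quarter label,
# average each group, and transpose back to the original orientation
df_quarter = df.T.groupby(column_to_quarter).mean().T
print(df_quarter)
```

For the sample data this should reproduce the values shown above, with plain string column labels instead of a PeriodIndex.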

jezrael


Please stop encouraging low quality questions. If it was anyone else answering I would have downvoted right away. By answering such questions, you encourage more such questions, that you know will help no one in the future except OP. – cs95 Sep 09 '17 at 04:44

@cᴏʟᴅsᴘᴇᴇᴅ - thank you. Hmm, I agree it is better to answer when the question has good input data, desired output and code. That is ideal. But sometimes it is possible to understand from the text what the OP needs. It seemed I understood it, so I wrote an answer. – jezrael Sep 09 '17 at 04:51
I agree with @cᴏʟᴅsᴘᴇᴇᴅ here. If we still want to go ahead and answer, I think, OP or the answerer should reformat the question/add details for better future usage. – Zero Sep 09 '17 at 05:28
Ah I should have leveraged the datetime functions in pandas better. Thank you. – rw435 Sep 09 '17 at 05:30
