I have a DataFrame structure derived from a CSV file on population statistics over a number of years. Namely, the columns in the file are monthly time intervals (1999-01, 1999-02 ... 2016-12) and the rows are different population centers in the world (e.g. London, Toronto, Boston etc.):
df = pd.DataFrame({'1999-01' : [100, 5000, 8000], '1999-02' : [200, 6000, 9000], '1999-03' : [300, 7000, 10000], ..., 'cities' : ['CityA', 'CityB', 'CityC', ...]})
I'd like to group these columns on a quarterly basis. So I would take the average of the populations in 1999-01, 1999-02, and 1999-03 for every row and create a new column '1999Q1' holding that value, repeating this for every block of three months:
df_quarter = pd.DataFrame({'1999Q1' : [200, 6000, 9000], '1999Q2' : ..., 'cities' : ['CityA', 'CityB', 'CityC', ...]})
#Q1 corresponds to months 01-03, Q2 to months 04-06, Q3 to months 07-09, Q4 to months 10-12, all inclusive
However, I'm having difficulty conceptualizing the query to accomplish this. I have half a mind to use .groupby() and then .agg(), but I'm not sure how I could efficiently specify a three-column grouping and iterate across the columns. Could someone please point me in the right direction?
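For reference, here is a minimal sketch of one way the quarterly averaging described above might be done. It assumes every column other than 'cities' is a 'YYYY-MM' label, and the 1999-04 to 1999-06 values are made up for illustration. The names monthly and quarter_labels are hypothetical. It groups the transposed frame, since grouping rows by an array of labels is supported across pandas versions:

```python
import pandas as pd

# Sample data; the 1999-04..1999-06 values are invented for illustration.
df = pd.DataFrame({
    '1999-01': [100, 5000, 8000],
    '1999-02': [200, 6000, 9000],
    '1999-03': [300, 7000, 10000],
    '1999-04': [400, 8000, 11000],
    '1999-05': [500, 9000, 12000],
    '1999-06': [600, 10000, 13000],
    'cities': ['CityA', 'CityB', 'CityC'],
})

# Move the label column into the index so only monthly columns remain.
monthly = df.set_index('cities')

# Map each 'YYYY-MM' column label to its quarter, e.g. '1999-02' -> '1999Q1'.
quarter_labels = pd.PeriodIndex(monthly.columns, freq='M').asfreq('Q').astype(str)

# Transpose so months become rows, average per quarter, transpose back.
df_quarter = monthly.T.groupby(quarter_labels).mean().T.reset_index()

print(df_quarter)
```

With this sample data, the '1999Q1' column holds 200, 6000, and 9000, matching the desired output above.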
EDIT: Suppose the columns were not dates, but rather something more abstract, and a simple resampling of time periods could not be employed. For example:
#Prices of different foods from different vendors
df = pd.DataFrame({'oranges' : [2, 3, 7], 'apples' : [6, 3, 9], 'cheese' : [13, 9, 11], 'milk' : [6, 5, 12], 'vendors' : ['VendorA', 'VendorB', 'VendorC']})
Now, if I wanted to create two columns, one combining the fruits and one combining the dairy products, is there some way I could specify which columns to aggregate together?
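A hedged sketch of how an explicit column-to-group mapping might look for this case. The group names 'fruit' and 'dairy' and the dict column_groups are hypothetical, and the mean is assumed as the aggregation to mirror the quarterly example (a sum or another function could be substituted):

```python
import pandas as pd

df = pd.DataFrame({
    'oranges': [2, 3, 7],
    'apples': [6, 3, 9],
    'cheese': [13, 9, 11],
    'milk': [6, 5, 12],
    'vendors': ['VendorA', 'VendorB', 'VendorC'],
})

# Hypothetical mapping from each original column to the group it belongs to.
column_groups = {
    'oranges': 'fruit',
    'apples': 'fruit',
    'cheese': 'dairy',
    'milk': 'dairy',
}

# Keep only the price columns, transpose so columns become rows,
# group those rows by the mapping, aggregate, and transpose back.
prices = df.set_index('vendors')
df_grouped = prices.T.groupby(column_groups).mean().T.reset_index()

print(df_grouped)
```

Passing a dict to groupby maps each index label to its group, so arbitrary, non-contiguous columns can be combined without relying on their order.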