I have a Pandas DataFrame with monthly data points about the climate for about 30 years, and am trying to reformat the DataFrame into a format that another model that will use the data can accept. I am wondering if there is a simple way to reformat a time series data all within a single column into several columns that each represent a month. I'll give example formats below for clarity:
Current Format:
Date | HDD | CDD |
---|---|---|
2010-01-01 | 3000 | 0 |
2010-02-01 | 2500 | 500 |
2010-03-01 | 2000 | 1000 |
2010-04-01 | 1500 | 1500 |
2010-05-01 | 1000 | 2000 |
2010-06-01 | 500 | 2500 |
2010-07-01 | 0 | 3000 |
2010-08-01 | 500 | 2500 |
2010-09-01 | 1000 | 2000 |
2010-10-01 | 1500 | 1500 |
2010-11-01 | 2000 | 1000 |
2010-12-01 | 2500 | 500 |
2011-01-01 | 3000 | 0 |
2011-02-01 | 2500 | 500 |
2011-02-01 | 2000 | 1000 |
... | ... | ... |
Desired Format:
Date | Jan-HDD | Jan-CDD | Feb-HDD | Feb-CDD | Mar-HDD | Mar-CDD | ... |
---|---|---|---|---|---|---|---|
2010 | 3000 | 0 | 2500 | 500 | 2000 | 1000 | ... |
2011 | 3000 | 0 | 1500 | 500 | 2000 | 1000 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
Let me know if you have any suggestions, thank you!
Below is a code snippet of what I've tried, which leads to errors or a blank DataFrame depending on the parameters passed to the merge function. The code below is just a test, but I was intending to put it in a for loop across each month if it had worked.
month = 'January'
ddf = dd[dd['Date'].dt.month_name() == month].rename(columns={'HDD': '{}-HDD'.format(month), 'CDD': '{}-CDD'.format(month)})
month = 'February'
ddf_temp = dd[dd['Date'].dt.month_name() == month]
ddf_temp = ddf_temp.rename(columns={'HDD': '{}-HDD'.format(month), 'CDD': '{}-CDD'.format(month)})
##ddf = ddf.merge(ddf_temp, on='Date', how='outer')
#ddf
pd.merge(ddf, ddf_temp, on='Date')