1

I have a Pandas DataFrame with monthly data points about the climate for about 30 years, and am trying to reformat the DataFrame into a format that another model that will use the data can accept. I am wondering if there is a simple way to reformat a time series data all within a single column into several columns that each represent a month. I'll give example formats below for clarity:

Current Format:

Date HDD CDD
2010-01-01 3000 0
2010-02-01 2500 500
2010-03-01 2000 1000
2010-04-01 1500 1500
2010-05-01 1000 2000
2010-06-01 500 2500
2010-07-01 0 3000
2010-08-01 500 2500
2010-09-01 1000 2000
2010-10-01 1500 1500
2010-11-01 2000 1000
2010-12-01 2500 500
2011-01-01 3000 0
2011-02-01 2500 500
2011-02-01 2000 1000
... ... ...

Desired Format:

Date Jan-HDD Jan-CDD Feb-HDD Feb-CDD Mar-HDD Mar-CDD ...
2010 3000 0 2500 500 2000 1000 ...
2011 3000 0 1500 500 2000 1000 ...
... ... ... ... ... ... ... ...

Let me know if you have any suggestions, thank you!

Below is a code snippet of what I've tried, which leads to errors or a blank DataFrame depending on the parameters passed to the merge function. The code below is just a test, but I was intending to put it in a for loop across each month if it had worked.

month = 'January'
ddf = dd[dd['Date'].dt.month_name() == month].rename(columns={'HDD': '{}-HDD'.format(month), 'CDD': '{}-CDD'.format(month)})

month = 'February'
ddf_temp = dd[dd['Date'].dt.month_name() == month]
ddf_temp = ddf_temp.rename(columns={'HDD': '{}-HDD'.format(month), 'CDD': '{}-CDD'.format(month)})
##ddf = ddf.merge(ddf_temp, on='Date', how='outer')
#ddf

pd.merge(ddf, ddf_temp, on='Date')

1 Answers1

1

I'm assuming that the last row in your current format table is a typo and should be 2011-03-01.

import pandas as pd

df = pd.DataFrame(
    [
        ("2010-01-01", "3000", "0"),
        ("2010-02-01", "2500", "500"),
        ("2010-03-01", "2000", "1000"),
        ("2010-04-01", "1500", "1500"),
        ("2010-05-01", "1000", "2000"),
        ("2010-06-01", "500", "2500"),
        ("2010-07-01", "0", "3000"),
        ("2010-08-01", "500", "2500"),
        ("2010-09-01", "1000", "2000"),
        ("2010-10-01", "1500", "1500"),
        ("2010-11-01", "2000", "1000"),
        ("2010-12-01", "2500", "500"),
        ("2011-01-01", "3000", "0"),
        ("2011-02-01", "2500", "500"),
        ("2011-03-01", "2000", "1000"),
    ],
    columns=["Date", "HDD", "CDD"],
)
df["Date"] = pd.to_datetime(df["Date"])
df = pd.pivot_table(df, values=["HDD", "CDD"], index=df["Date"].dt.year, columns=df["Date"].dt.month)
print(df)

Output:

      CDD                         HDD
Date   1      2       3            1       2       3
2010  0.0  500.0  1000.0  ...  3000.0  2500.0  2000.0  ...
2011  0.0  500.0  1000.0  ...  3000.0  2500.0  2000.0  ...
Student
  • 522
  • 1
  • 6
  • 18