I'm trying to build a dataframe that will be used for linear regression. I would like to include 11 independent "dummy" variables that are set to either 1 or 0 based on the month of the year. Without getting too far off topic, I'm using 11 variables instead of 12, as the 12th month is captured by the intercept.
I know many things can be done with pandas without looping through the entire dataframe, and doing things that way is typically faster than using a loop.
So, is it possible to grab the month from my date column and dynamically set a separate column to either 1 or 0 based on that month? Or am I asking a stupid question?
Edit: I should have included more information. The dataframe is structured like this:
Date | sku | units ordered | sessions | conversion rate |
---|---|---|---|---|
2020/01/30 | abc123 | 20 | 200 | 0.1 |
2020/01/31 | abc123 | 10 | 100 | 0.1 |
2020/02/01 | abc123 | 15 | 60 | 0.25 |
I would like to make it look like this:
Date | sku | units ordered | sessions | conversion rate | january | february |
---|---|---|---|---|---|---|
2020/01/30 | abc123 | 20 | 200 | 0.1 | 1 | 0 |
2020/01/31 | abc123 | 10 | 100 | 0.1 | 1 | 0 |
2020/02/01 | abc123 | 15 | 60 | 0.25 | 0 | 1 |
The code I'm currently using to accomplish this is:
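
    import calendar

    # Create one column per month, January through November, initialized to 0.
    x = 1
    while x < 12:
        month = calendar.month_name[x]
        df[month] = 0
        x += 1

    # Set the matching month column to 1 for each row (December stays all zeros).
    for index, row in df.iterrows():
        d = row["Date"]  # assumes the Date column holds datetime values
        month = d.strftime("%B")
        if month != "December":
            df.at[index, month] = 1

    df.fillna(0, inplace=True)
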
I'm just not sure whether this is the best way to accomplish this.
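
For comparison, here is a minimal sketch of a loop-free version of the same idea. It assumes the `Date` column can be parsed with `pd.to_datetime` and rebuilds the three sample rows from the table above purely for illustration. The variable name `month_number` is made up for this example. The only loop left runs over the 11 month names, not over the rows.

```python
import calendar

import pandas as pd

# Sample data copied from the table in the question (illustrative only).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020/01/30", "2020/01/31", "2020/02/01"]),
    "sku": ["abc123"] * 3,
    "units ordered": [20, 10, 15],
    "sessions": [200, 100, 60],
    "conversion rate": [0.1, 0.1, 0.25],
})

# Month number (1-12) for every row, computed in one vectorized step.
month_number = df["Date"].dt.month

# One 0/1 column per month, January through November.
# December gets no column, so it is absorbed by the regression intercept.
for number in range(1, 12):
    df[calendar.month_name[number]] = (month_number == number).astype(int)
```

Each comparison `month_number == number` is evaluated for the whole column at once, so no `iterrows()` call is needed. `pd.get_dummies(df["Date"].dt.month_name())` is another possible route, but it only creates columns for months that actually appear in the data, so the columns would still need to be aligned to the full January through November list.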