I am trying to run an ML model, but my independent variables are structured differently from my dependent variable.

The independent variables are structured like this:

id  month/year  var_a  var_b
0   01/2016     1      2
0   02/2016     2      1
1   01/2016     2      3

So the ids are not unique on their own, but each id appears at most once per month/year, so every (id, month/year) pair is unique.

The dependent variable looks like this:

id  01/2016  02/2016  ...
0   0        1
1   1        0
2   0        0

So this DataFrame has a column for every month, with a 0 or 1 representing a yes or no for my classification. Ideally, I would like the dependent table to be shaped like the independent DataFrame:

Desired output of the dependent variable:

id  month/year  y
0   01/2016     0
0   02/2016     1
1   01/2016     1
1   02/2016     0
2   01/2016     0
2   02/2016     0

I can't wrap my head around how to do this.

Thank you in advance.

1 Answer

Maybe try pivot_table:

import pandas as pd
df_pivot = pd.pivot_table(df, index=['id'], columns=['month/year'])

giving you

           var_a           var_b
month/year 01/2016 02/2016 01/2016 02/2016
id
0              1.0     2.0     2.0     1.0
1              2.0     NaN     3.0     NaN
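To reproduce that output, here is a self-contained sketch that rebuilds the sample independent data from the question. It assumes the data is loaded as `df` with `month/year` stored as strings (both are assumptions, not stated in the question). Note that `pivot_table` aggregates with the mean by default (`aggfunc='mean'`), which is why the integer inputs come back as floats. Because each (id, month/year) pair is unique here, the mean simply returns the single value.

```python
import pandas as pd

# Sample independent variables from the question (month/year kept as strings: an assumption)
df = pd.DataFrame({
    "id": [0, 0, 1],
    "month/year": ["01/2016", "02/2016", "01/2016"],
    "var_a": [1, 2, 2],
    "var_b": [2, 1, 3],
})

# One row per id, one column per (variable, month/year) pair;
# pairs missing from the input (id 1 in 02/2016) become NaN
df_pivot = pd.pivot_table(df, index=["id"], columns=["month/year"])
print(df_pivot)
```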

Then, if you want to flatten the MultiIndex columns:

df_pivot.columns = [' '.join(col).strip() for col in df_pivot.columns.values]

which gives you:

    var_a 01/2016  var_a 02/2016  var_b 01/2016  var_b 02/2016
id
0             1.0            2.0            2.0            1.0
1             2.0            NaN            3.0            NaN
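The pivot above widens the independent variables to one row per id. If you would rather keep one row per (id, month/year), as in the desired output in the question, an alternative is the reverse reshape: apply `DataFrame.melt` to the dependent table, then join the two tables on both keys. The following is a minimal sketch. It assumes the wide dependent table is called `y_wide` and the independent table `X` (hypothetical names), with months stored as strings.

```python
import pandas as pd

# Sample dependent variable from the question (wide: one column per month)
y_wide = pd.DataFrame({
    "id": [0, 1, 2],
    "01/2016": [0, 1, 0],
    "02/2016": [1, 0, 0],
})

# Wide -> long: every month column becomes rows of (id, month/year, y)
y_long = (
    y_wide.melt(id_vars="id", var_name="month/year", value_name="y")
    .sort_values(["id", "month/year"])
    .reset_index(drop=True)
)
print(y_long)

# Sample independent variables from the question
X = pd.DataFrame({
    "id": [0, 0, 1],
    "month/year": ["01/2016", "02/2016", "01/2016"],
    "var_a": [1, 2, 2],
    "var_b": [2, 1, 3],
})

# Inner join on both keys keeps only (id, month/year) pairs present in both tables
data = X.merge(y_long, on=["id", "month/year"], how="inner")
print(data)
```

Sorting the `MM/YYYY` strings works for this sample, but across several years it would put `01/2017` before `02/2016`. Converting with `pd.to_datetime(..., format="%m/%Y")` before sorting avoids that. The inner join also drops pairs that have no independent variables in the sample, such as id 1 in 02/2016 and all of id 2.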
Dan