Python pandas: rearranging data from 'dummy' date columns to rows

Question

I am trying to run an ML model, but my independent variables are structured differently from my dependent variable.

The independent variables are structured like this:

id  month/year  var_a  var_b
0   01/2016     1      2
0   02/2016     2      1
1   01/2016     2      3

So the ids are not unique on their own, but each combination of id and month/year is unique.

The dependent variable looks like this:

id  01/2016  02/2016  ...
0   0        1
1   1        0
2   0        0

So this dataframe has a column for every month, with a 0 or 1 encoding the yes/no label for my classification. Ideally, I would like the dependent table to be shaped like the independent dataframe:

Desired output of the dependent variable:

id  month/year  y
0   01/2016     0
0   02/2016     1
1   01/2016     1
1   02/2016     0
2   01/2016     0
2   02/2016     0

I can't wrap my head around how to do this.

Thank you in advance.

I think you need `melt`; check the duplicate answer. `df = df.melt('id')` should work nicely. — jezrael, Mar 27 '18 at 10:50
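For reference, here is a minimal sketch of the `melt` suggestion applied to the sample data from the question. The frame name `wide` and the output column names passed via `var_name` and `value_name` are my own choices to match the desired output; the final sort is only there to reproduce the row order shown in the question.

```python
import pandas as pd

# Dependent variable in wide form, as in the question: one column per month.
# (The name `wide` is hypothetical; the question does not name this frame.)
wide = pd.DataFrame({
    "id": [0, 1, 2],
    "01/2016": [0, 1, 0],
    "02/2016": [1, 0, 0],
})

# Turn every month column into rows: one row per (id, month/year) pair.
long = (
    wide.melt(id_vars="id", var_name="month/year", value_name="y")
        .sort_values(["id", "month/year"])
        .reset_index(drop=True)
)

print(long)
```

Note that the sort here is a plain string sort on `month/year`, which is fine for two months of the same year but would not order `01/2017` after `12/2016`. Converting the column with `pd.to_datetime(..., format="%m/%Y")` first would probably be safer for real data.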

score 0 · Answer 1 · answered Mar 27 '18 at 09:10


Maybe try pivot_table:

df_pivot = pd.pivot_table(df,index=['id'],columns=['month/year'])

giving you

             var_a           var_b
month/year 01/2016 02/2016 01/2016 02/2016
id
0              1.0     2.0     2.0     1.0
1              2.0     NaN     3.0     NaN

and then if you want to flatten the multi-index:

df_pivot.columns = [' '.join(col).strip() for col in df_pivot.columns.values]

which gives you:

    var_a 01/2016  var_a 02/2016  var_b 01/2016  var_b 02/2016
id
0             1.0            2.0            2.0            1.0
1             2.0            NaN            3.0            NaN
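Put together as a self-contained sketch, using the independent-variable sample from the question (the frame name `df` follows the snippet above; the data literals are copied from the question):

```python
import pandas as pd

# Independent variables in long form, as given in the question.
df = pd.DataFrame({
    "id": [0, 0, 1],
    "month/year": ["01/2016", "02/2016", "01/2016"],
    "var_a": [1, 2, 2],
    "var_b": [2, 1, 3],
})

# One row per id, one column per (variable, month) combination.
# pivot_table aggregates with the mean by default, which is a no-op here
# because every (id, month/year) pair occurs only once.
df_pivot = pd.pivot_table(df, index=["id"], columns=["month/year"])

# Flatten the two-level column index into single strings.
df_pivot.columns = [" ".join(col).strip() for col in df_pivot.columns.values]

print(df_pivot)
```

This goes from long to wide, so it reshapes the independent variables to match the dependent table rather than the other way around.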

answered Mar 27 '18 at 09:10

Dan


That could indeed work, but I would prefer it the other way around (see my edit with the desired output). Thanks a lot for this, though. – business_of_ferrets Mar 27 '18 at 10:46
@joeytang202 use pandas' `melt` function then – Dan Mar 27 '18 at 10:59
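Once the dependent table is in long form, lining it up with the independent variables is probably just a merge on the two key columns. A rough sketch, assuming the frames are called `X` and `y_wide` (both names are made up here) and that a left join from the independent side is what is wanted:

```python
import pandas as pd

# Independent variables (long form) and dependent variable (wide form),
# both copied from the question. The names X and y_wide are hypothetical.
X = pd.DataFrame({
    "id": [0, 0, 1],
    "month/year": ["01/2016", "02/2016", "01/2016"],
    "var_a": [1, 2, 2],
    "var_b": [2, 1, 3],
})
y_wide = pd.DataFrame({
    "id": [0, 1, 2],
    "01/2016": [0, 1, 0],
    "02/2016": [1, 0, 0],
})

# Reshape the labels to one row per (id, month/year) pair.
y_long = y_wide.melt(id_vars="id", var_name="month/year", value_name="y")

# Attach the label to each feature row; rows of y_long without matching
# features (e.g. id 2) are dropped by the left join.
data = X.merge(y_long, on=["id", "month/year"], how="left")

print(data)
```

Whether to keep label rows that have no features (`how="outer"` or `how="right"`) depends on the model, so treat the join type as an assumption.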
