I've searched quite a bit and I can't seem to find something along the line of pivot functionality for my particular problem. I'll convey a simple example of what I'm looking for:
Long Table
dependent_variable step a b
5.5 1 20 30
5.5 2 25 37
6.1 1 22 19
6.1 2 18 29
Desired Wide Table
dependent_variable a_step1 a_step2 b_step1 b_step2
5.5 20 25 30 37
6.1 22 18 19 29
Effectively I would like to pivot on the Step column, and to make the column name for the rest of the independent variables (in this case a and b) include the step number and the a/b value associated with it.
Once pivoted, then I will use the dependent variable column and as a numpy array and the newly pivoted dependent variables to feed into various machine learning algorithms.
When I attempted piRSquared's suggestion (thank you) I got the error: Index contains duplicate entries, cannot reshape.
I then tried (from Here)
d1 =data.set_index(['dependent_variable','step'], append=True).unstack()
d1.columns = d1.columns.map(lambda x: '{}_step{}'.format(*x))
d1.reset_index(inplace=True)
And (using the example table) got the following:
level_0 dependent_variable a_step1 a_step2 b_step1 b_step2
1 5.5 20 NaN 30 NaN
2 5.5 NaN 25 NaN 37
3 6.1 22 NaN 19 NaN
4 6.1 NaN 18 NaN 29
So, I'm still missing a step