Many have referred me to this solution Pandas Merging 101. However, unlike the issue raised by the Pandas Merging 101, I am handling a single dataframe that needs to convert a specific column (Variable in my case) as a header for the Values Columns, while at the same time maintaining the Class column.
The dataframe has a Values
column, a Variable
column consisting of repetitive: var1,var2,var3
and a Class
column of the repetitive classes:1,2,3,4,5
as shown below:
Values Variable Class
0 1.274143 var1 1
1 0.120940 var2 1
2 2.031863 var3 1
3 -0.437285 var1 2
4 -0.530028 var2 2
5 1.057256 var3 2
6 2.540227 var1 3
7 4.169430 var2 3
8 3.561490 var3 3
9 -3.153675 var1 4
10 -4.907512 var2 4
11 0.419970 var3 4
12 1.788764 var1 5
13 0.901584 var2 5
14 0.823103 var3 5
15 1.740886 var1 1
16 3.215278 var2 1
17 -1.235741 var3 1
18 2.167600 var1 2
19 0.322950 var2 2
20 2.086752 var3 2
21 -0.708104 var1 3
22 . . .
23 . . .
24 . . .
I want the output to have the structure as follows:
var1 var2 var3 Class
1.274143 0.120940 2.031863 1
1.740886 3.215278 -1.235741 1
value value value 1
value value value 1
value value value 1
. . . .
. . . .
. . . .
-0.437285 -0.530028 1.057256 2
2.167600 0.322950 2.086752 2
value value value 2
value value value 2
value value value 2
. . . .
. . . .
. . . .
value value value 3
value value value 3
value value value 3
. . . .
. . . .
. . . .
I tried to use pivot:
print(df.pivot(index = 'Class',columns ='Variable', values =['Values']))
#Note that I want the Class to be a column not as an index
but I got an error result:
ValueError: Index contains duplicate entries, cannot reshape
I also tried pivot table:
table = pd.pivot_table(df, values='Values', columns=['Variable'],index = ['Class'],aggfunc=None)
but my result is:
TypeError: Must provide 'func' or tuples of '(column, aggfunc).
EDIT:
This is a brute forced solution to my problem:
partial_result = pd.DataFrame(df,columns = ['Values','Variable','Class'])
column_headers = partial_result.Variable.unique().tolist()
Len_Var = int(len(column_headers))
result = pd.DataFrame(np.reshape(partial_result['Values'].values,(int(len(partial_result['Values'].values)/Len_Var),Len_Var)),columns = column_headers)
labels= pd.DataFrame(np.reshape(partial_result['Class'].values,(int(len(partial_result['Class'].values)/Len_Var),Len_Var)))
result['Class'] = labels[0]
print(result)
The code works but if you have a more elegant solution, would be great to look into it.