How can I use PySpark or pandas to achieve the transformation below? Thanks a lot. The source file is a CSV with the following info:
(screenshot of the source data)
expected:
(screenshot of the expected output)
I assume that your source CSV file is as follows:
 ,Group_1,Group_2
 ,Sub_Group_1,Sub_Group_1
Maturity,7/31/20,7/31/20
0.1,0.2,0.3
0.2,0.3,0.4
Note that the first field of each of the two top header rows contains a single space (not an empty string). I read the file with:
df = pd.read_csv('Input.csv', header=[0, 1, 2])
and its content is:
Group_1 Group_2
Sub_Group_1 Sub_Group_1
Maturity 7/31/20 7/31/20
0 0.1 0.2 0.3
1 0.2 0.3 0.4
(you should have included this text in your question instead of a picture).
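Before melting, it may help to inspect the resulting column MultiIndex: the first column appears as a tuple whose two top levels are single spaces (the output below assumes the sample file from this answer):
print(df.columns.tolist())
# [(' ', ' ', 'Maturity'),
#  ('Group_1', 'Sub_Group_1', '7/31/20'),
#  ('Group_2', 'Sub_Group_1', '7/31/20')]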
The first step is to melt the DataFrame:
result = pd.melt(df, id_vars=[(' ', ' ', 'Maturity')],
var_name=['Group_Name', 'Sub_Group_Name', 'Date'])
Note that in the code above I specified id_vars as a list containing a single tuple, whose two top-level elements are spaces (this is why I made the assumption about the initial spaces in the source file).
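A caveat in case your real file differs from my assumption: if those first two header cells are truly empty (nothing before the first comma), pandas fills the missing levels with placeholder names, so the tuple in id_vars has to match them instead. A sketch under that alternative assumption:
# With empty (not space) header cells, pandas auto-names the missing
# levels, so the first column must be referenced by those placeholders:
result = pd.melt(df, id_vars=[('Unnamed: 0_level_0', 'Unnamed: 0_level_1', 'Maturity')],
                 var_name=['Group_Name', 'Sub_Group_Name', 'Date'])
Either way, the rename step below gives the first column a clean name.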
The result of the melt is:
( , , Maturity) Group_Name Sub_Group_Name Date value
0 0.1 Group_1 Sub_Group_1 7/31/20 0.2
1 0.2 Group_1 Sub_Group_1 7/31/20 0.3
2 0.1 Group_2 Sub_Group_1 7/31/20 0.3
3 0.2 Group_2 Sub_Group_1 7/31/20 0.4
The only thing left to do is rename the first column:
result.rename(columns={result.columns[0]: 'Maturity'}, inplace=True)
Now, when you print the result, you will get:
Maturity Group_Name Sub_Group_Name Date value
0 0.1 Group_1 Sub_Group_1 7/31/20 0.2
1 0.2 Group_1 Sub_Group_1 7/31/20 0.3
2 0.1 Group_2 Sub_Group_1 7/31/20 0.3
3 0.2 Group_2 Sub_Group_1 7/31/20 0.4
The row ordering is a bit different, but this is just how melt works. If that detail matters to you, reorder the rows according to your needs.
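For instance, to group rows by Maturity first (a sketch only; the sort keys are my guess at the order you want, so adjust them as needed):
# Sort by Maturity, then by group name, and rebuild the index:
result = result.sort_values(['Maturity', 'Group_Name'], ignore_index=True)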