1

How can I use pyspark or pandas to achieve the below transformation? Thanks a lot. Source File is csv with following info:

enter image description here

expected:

enter image description here

KarthikNayak98
  • 363
  • 3
  • 13

1 Answers1

1

I assume that your source CSV file is as follows:

 ,Group_1,Group_2
 ,Sub_Group_1,Sub_Group_1
Maturity,7/31/20,7/31/20
0.1,0.2,0.3
0.2,0.3,0.4

Note that:

  • there are initial spaces in first 2 rows (names of 2 top MultiIndex levels for the first column), otherwise read_csv would give them Unnamed... default names,
  • there are actually 3 MultiIndex levels on columns (the third level contains dates).

I read them calling df = pd.read_csv('Input.csv', header=[0,1,2]) and the content is:

               Group_1     Group_2
           Sub_Group_1 Sub_Group_1
  Maturity     7/31/20     7/31/20
0      0.1         0.2         0.3
1      0.2         0.3         0.4

(you should have included it instead of a picture).

The first step is melt:

result = pd.melt(df, id_vars=[(' ', ' ', 'Maturity')],
    var_name=['Group_Name', 'Sub_Group_Name', 'Date'])

Note that in the above code I specified id_vars as a tuple, with 2 top elements as spaces (this is why I made the assumption about initial spaces).

The result is:

   ( ,  , Maturity) Group_Name Sub_Group_Name     Date  value
0               0.1    Group_1    Sub_Group_1  7/31/20    0.2
1               0.2    Group_1    Sub_Group_1  7/31/20    0.3
2               0.1    Group_2    Sub_Group_1  7/31/20    0.3
3               0.2    Group_2    Sub_Group_1  7/31/20    0.4

And the only thing to do is to rename the first column:

result.rename(columns={result.columns[0]: 'Maturity'}, inplace=True)

Now, when you print the result, you will get:

   Maturity Group_Name Sub_Group_Name     Date  value
0       0.1    Group_1    Sub_Group_1  7/31/20    0.2
1       0.2    Group_1    Sub_Group_1  7/31/20    0.3
2       0.1    Group_2    Sub_Group_1  7/31/20    0.3
3       0.2    Group_2    Sub_Group_1  7/31/20    0.4

The row ordering is a bit different, but this is the way melt works. If you are unhappy about this detail, reorder rows according to your needs.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41