Transpose few elements of a row in Pandas dataframe

Question

I have the dataframe of following type-

Year MN DT HR01 HR02 HR03 HR04 PRMEAN PRSMAX PRSMIN
2011 1  1    34   35   67   78     12    235    256
2011 1  2    12   56   78   56     23    256    267
2011 2  1    12   2    32   12     74    221    435

where MN:Months, DT:Date,HR:Hour .Here I want the HR data to be aligned column wise to its date in the following way.

Year MN DT HR PRMEAN PRSMAX PRSMIN
2011 1  1  34     12    235    256
2011 1  1  35     12    235    256
2011 1  1  67     12    235    256
2011 1  1  78     12    235    256
2011 1  2  12     23    256    267
2011 1  2  56     23    256    267
2011 1  2  78     23    256    267
2011 1  2  56     23    256    267
2011 2  1  12     74    221    435
2011 2  1  2     74     221     435
2011 2  1  32     74    221    435
2011 2  1  12     74    221    435

I am really not been able to figure out , how to do it?

You can also add `.sort_values(['PRSMIN'])` to the end of the above expression in order to have the rows appear in the order depicted in your desired output. — James Dellinger, Feb 01 '19 at 07:51
This is `Index(['index', 'YEAR ', 'MN ', 'DT ', '..HR01', '..HR02', '..HR03', '..HR04', '..HR05', '..HR06', '..HR07', '..HR08', '..HR09', '..HR10', '..HR11', '..HR12', '..HR13', '..HR14', '..HR15', '..HR16', '..HR17', '..HR18', '..HR19', '..HR20', '..HR21', '..HR22', '..HR23', '..HR24', 'PRMEAN ', 'PRSMAX ', 'PRSMIN'], dtype='object')` — Bing, Feb 01 '19 at 07:52

jezrael · Accepted Answer · 2019-02-01T09:13:20.910

Data:

print (df)
      MN   DT   HR01  HR02  HR03  HR04  PRMEAN  PRSMAX  PRSMIN
2011    1    1    34    35    67    78      12     235     256
2011    1    2    12    56    78    56      23     256     267
2011    2    1    12     2    32    12      74     221     435

If check columns thre is no column Year (becuse index) and columns MN and DT have traling whitespaces:

print (df.columns)
Index(['MN ', 'DT ', 'HR01', 'HR02', 'HR03', 'HR04', 'PRMEAN', 'PRSMAX',
       'PRSMIN'],
      dtype='object')

Solution is remove them by str.strip, convert index to column and rename:

df.columns = df.columns.str.strip()
df = df.reset_index().rename(columns={'index':'Year'})

print (df)
   Year  MN  DT  HR01  HR02  HR03  HR04  PRMEAN  PRSMAX  PRSMIN
0  2011   1   1    34    35    67    78      12     235     256
1  2011   1   2    12    56    78    56      23     256     267
2  2011   2   1    12     2    32    12      74     221     435

print (df.columns)
Index(['Year', 'MN', 'DT', 'HR01', 'HR02', 'HR03', 'HR04', 'PRMEAN', 'PRSMAX',
       'PRSMIN'],
      dtype='object')

Then use melt with sort_values and remove variable column:

df1 = (df.melt(id_vars=['Year','MN','DT','PRMEAN','PRSMAX','PRSMIN'],value_name='HR')
         .sort_values('variable')
         .drop('variable', axis=1))

Last change ordering of columns:

df1 = df1[['Year','MN','DT','HR','PRMEAN','PRSMAX','PRSMIN']]
print (df1)
    Year  MN  DT  HR  PRMEAN  PRSMAX  PRSMIN
0   2011   1   1  34      12     235     256
3   2011   1   1  35      12     235     256
6   2011   1   1  67      12     235     256
9   2011   1   1  78      12     235     256
1   2011   1   2  12      23     256     267
4   2011   1   2  56      23     256     267
7   2011   1   2  78      23     256     267
10  2011   1   2  56      23     256     267
2   2011   2   1  12      74     221     435
5   2011   2   1   2      74     221     435
8   2011   2   1  32      74     221     435
11  2011   2   1  12      74     221     435

EDIT:

So there is column YEAR, so need:

df = pd.read_csv('Pres.csv')

df.columns = df.columns.str.strip()

df1 = (df.melt(id_vars=['YEAR','MN','DT','PRMEAN','PRSMAX','PRSMIN'],
               value_name='HR',
               var_name='HOUR')
         .assign(HOUR = lambda x: x.HOUR.str.extract('(\d+)'))
         .sort_values(['YEAR','MN','DT','HOUR'])
         )

print (df1.head(30))
       YEAR  MN  DT PRMEAN PRSMAX PRSMIN HOUR     HR
0      2001   1   1  949.5  951.8  947.8   01  949.3
3227   2001   1   1  949.5  951.8  947.8   02  949.1
6454   2001   1   1  949.5  951.8  947.8   03  948.5
9681   2001   1   1  949.5  951.8  947.8   04  948.5
12908  2001   1   1  949.5  951.8  947.8   05  948.3
16135  2001   1   1  949.5  951.8  947.8   06  948.6
19362  2001   1   1  949.5  951.8  947.8   07  949.5
22589  2001   1   1  949.5  951.8  947.8   08  950.3
25816  2001   1   1  949.5  951.8  947.8   09  951.6
29043  2001   1   1  949.5  951.8  947.8   10  951.8
32270  2001   1   1  949.5  951.8  947.8   11  951.8
35497  2001   1   1  949.5  951.8  947.8   12  950.8
38724  2001   1   1  949.5  951.8  947.8   13  949.7
41951  2001   1   1  949.5  951.8  947.8   14  948.8
45178  2001   1   1  949.5  951.8  947.8   15  947.8
48405  2001   1   1  949.5  951.8  947.8   16  947.8
51632  2001   1   1  949.5  951.8  947.8   17  947.8
54859  2001   1   1  949.5  951.8  947.8   18  947.8
58086  2001   1   1  949.5  951.8  947.8   19  948.8
61313  2001   1   1  949.5  951.8  947.8   20  949.7
64540  2001   1   1  949.5  951.8  947.8   21  949.9
67767  2001   1   1  949.5  951.8  947.8   22  950.7
70994  2001   1   1  949.5  951.8  947.8   23  950.6
74221  2001   1   1  949.5  951.8  947.8   24    950
1      2001   1   2  951.2  953.4  948.8   01  949.8
3228   2001   1   2  951.2  953.4  948.8   02  949.5
6455   2001   1   2  951.2  953.4  948.8   03    949
9682   2001   1   2  951.2  953.4  948.8   04  948.8
12909  2001   1   2  951.2  953.4  948.8   05  948.9
16136  2001   1   2  951.2  953.4  948.8   06  949.7

There are 2 Year columns coming ,and I have no idea why-`Index(['Year', 'YEAR', 'MN', 'DT', '..HR01', '..HR02', '..HR03', '..HR04', '..HR05', '..HR06', '..HR07', '..HR08', '..HR09', '..HR10', '..HR11', '..HR12', '..HR13', '..HR14', '..HR15', '..HR16', '..HR17', '..HR18', '..HR19', '..HR20', '..HR21', '..HR22', '..HR23', '..HR24', 'PRMEAN', 'PRSMAX', 'PRSMIN'], dtype='object')` — Bing, Feb 01 '19 at 08:22
opps, one bug - nes sorting by `variable` column, not by `PRSMIN`. — jezrael, Feb 01 '19 at 08:35
So if use only `df.columns = df.columns.str.strip()` `df1 = (df.melt(id_vars=['YEAR','MN','DT','PRMEAN','PRSMAX','PRSMIN'],value_name='HR') .sort_values('variable') .drop('variable', axis=1))` it not working? Are data confidental? — jezrael, Feb 01 '19 at 08:43
Yes, It's not working and I have no clue why its not working. No, the data is not confidential, let me share a google drive link. — Bing, Feb 01 '19 at 08:44
@Bing - For me it working nice, added solution to answer. Maybe some typo I guess or similar. — jezrael, Feb 01 '19 at 08:51
Oh, it worked , must be taking some garbage values from the previous inputs.,but how to sort the Year, MT,DT values correspondingly — Bing, Feb 01 '19 at 08:57
So need change `.sort_values('variable')` to `.sort_values(['variable','Year','MT','DT'])` or `.sort_values(['Year','MT','DT'])` ? — jezrael, Feb 01 '19 at 08:59
No, wait a sec , the output is not correct , I need all the hour values corresponding to every date in the dataframe but it is giving only the first value — Bing, Feb 01 '19 at 09:05
So basically it will be 24 values(corresponding to each hour) to every day — Bing, Feb 01 '19 at 09:05
@Bing - I change solution with not removing HOURS and sorting, can you check now? — jezrael, Feb 01 '19 at 09:13
I need to ask you a question because I am just awestruck by what happened here- How to become this much good like you? — Bing, Feb 01 '19 at 09:23
@Bing - Many hours coding, many years coding... Not easy way, unfortunately. — jezrael, Feb 01 '19 at 09:25

Transpose few elements of a row in Pandas dataframe

1 Answers1