0

I am trying to stack multiple columns into two columns. my current dataframe looks in the following way with a timestamp and 4 other columns. I'd like to now stack the DIL and VOL column in the DRUG and VAL column.

current dataframe:

     TIME           DRUG     VAL      DIL     VOL          
2018-04-26 14:00:00   A      0.22     D5W     0.1
2018-04-26 19:00:00   B      0.38     D10W    0.22
2018-04-27 16:00:00   C      0.67     D5W     0.26
2018-05-02 16:00:00   A      0.22     N10W    0.1

expected dataframe:

     TIME           DRUG/DIL     VAL/VOL          
2018-04-26 14:00:00   A           0.22     
2018-04-26 14:00:00  D5W          0.1
2018-04-26 19:00:00   B           0.38     
2018-04-26 19:00:00  D10W         0.22
2018-04-27 16:00:00   C           0.67    
2018-04-27 16:00:00  D5W          0.26
2018-05-02 16:00:00   A           0.22     
2018-05-02 16:00:00  N10W         0.1

I tried the solution by using the following link as a reference but I couldn't what I want to achieve. I am pretty sure I am missing a small point and being dumb which I am unable to figure it out.

Pandas DataFrame stack multiple column values into single column

I'd really appreciate if I can get some help with it.

NAB0815
  • 441
  • 4
  • 24

2 Answers2

1

You can check with the DataFrame melt method.

s=df.melt(['TIME','VAL','VOL'])
s['VAL/VOL']=np.where(s.variable=='DRUG',s.VAL,s.VOL)
s
Out[188]: 
       TIME   VAL   VOL variable value  VAL/VOL
0  14:00:00  0.22  0.10     DRUG     A     0.22
1  19:00:00  0.38  0.22     DRUG     B     0.38
2  16:00:00  0.67  0.26     DRUG     C     0.67
3  16:00:00  0.22  0.10     DRUG     A     0.22
4  14:00:00  0.22  0.10      DIL   D5W     0.10
5  19:00:00  0.38  0.22      DIL  D10W     0.22
6  16:00:00  0.67  0.26      DIL   D5W     0.26
7  16:00:00  0.22  0.10      DIL  N10W     0.10
BENY
  • 317,841
  • 20
  • 164
  • 234
  • The above code is working for val, vol column but I am still getting drug and dil in 2 different columns but not in one single column. – NAB0815 Aug 08 '19 at 15:05
0

Try to split it in 2 dataframes, change the columns name and concat them. It would look like something like this:

In [1]:
# Setting the exemple dataset
import pandas as pd

columns = ['TIME', 'DRUG', 'VAL', 'DIL', 'VOL']          
data = [['2018-04-26 14:00:00', 'A', 0.22, 'D5W', 0.1],
['2018-04-26 19:00:00', 'B', 0.38, 'D10W', 0.22],
['2018-04-27 16:00:00', 'C', 0.67, 'D5W', 0.26],
['2018-05-02 16:00:00', 'A', 0.22, 'N10W', 0.1]]

df = pd.DataFrame(data, columns=columns)

# Create what you want

df_drug = df[['TIME', 'DRUG', 'VAL']].rename(columns={'DRUG':'DRUG/DIL', 'VAL':'VAL/VOL'})
df_dil = df[['TIME', 'DIL', 'VOL']].rename(columns={'DIL':'DRUG/DIL', 'VOL':'VAL/VOL'})
df_final = pd.concat([df_drug, df_dil])
df_final

Out [1]:
            TIME            DRUG/DIL    VAL/VOL
0   2018-04-26 14:00:00        A         0.22
1   2018-04-26 19:00:00        B         0.38
2   2018-04-27 16:00:00        C         0.67
3   2018-05-02 16:00:00        A         0.22
0   2018-04-26 14:00:00       D5W        0.10
1   2018-04-26 19:00:00       D10W       0.22
2   2018-04-27 16:00:00       D5W        0.26
3   2018-05-02 16:00:00       N10W       0.10

(PS: If you really want to have sorted by Time, you can add .sort_values(by='TIME') at the end of the concat)

Benoit Drogou
  • 969
  • 1
  • 5
  • 15