Background
I have the following dataframe
import pandas as pd
df = pd.DataFrame({'Result' : [['pos', '+', 'pos', 'positive'], ['neg', 'neg'], [], ['pos']],
'P_ID': [1,2,3,4],
'Gene' : [['kras', 'kras', 'kras', 'egfr'], ['brca', 'brca'], [], ['cd133']],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['P_ID', 'N_ID', 'Gene', 'Result']]
df
P_ID N_ID Gene Result
0 1 A1 [kras, kras, kras, egfr] [pos, +, pos, positive]
1 2 A2 [brca, brca] [neg, neg]
2 3 A3 [] []
3 4 A4 [cd133] [pos]
And I use the following code taken from unnest (explode) multiple list 2.0
df.set_index('P_ID').apply(lambda x: x.apply(pd.Series).stack()).ffill().reset_index().drop('level_1', 1)
Problem
Using the code above, I get close to what I want. However, because the Gene
and Result
columns in the third row 2
are empty lists []
, I get the output of the row above it as seen below:
P_ID N_ID Gene Result
0 1 A1 kras pos
1 1 A1 kras +
2 1 A1 kras pos
3 1 A1 egfr positive
4 2 A2 brca neg
5 2 A2 brca neg
6 3 A3 brca neg
7 4 A4 cd133 pos
Instead, I would like to get the output below, where the following row 6 3 A3 [] []
reflects the original dataframe df
, which contained empty lists
Desired Output
P_ID N_ID Gene Result
0 1 A1 kras pos
1 1 A1 kras +
2 1 A1 kras pos
3 1 A1 egfr positive
4 2 A2 brca neg
5 2 A2 brca neg
6 3 A3 [] []
7 4 A4 cd133 pos
Question
How do I get the desired output?