
I have a question about preparing a pandas DataFrame for further analysis. One column is of type str (object dtype), and I want to split each string in it on the separator ',' into substrings.

Whenever a string splits into several substrings, I want to copy the whole row once per substring, so that each row pairs a substance with exactly one indication.

import numpy as np
import pandas as pd

data = {
    'Substance': ['Substance1', 'Substance2', 'Substance1', 'Substance3'],
    'Name': ['Bayer', 'Sanofi', 'Pfizer', 'AstraZeneca'],
    'Indication': ['bradycardia', 'cardiac arrhythmia, cardio-pulmonary reanimation, tachycardia', 'Blood Thinning', 'Something else'],
}
    
df = pd.DataFrame(data)

print(df)

I was able to split the string into separate columns. (Source and credits @jezrael: Split Column into Unknown Number of Columns by Delimiter in Pandas Dataframe)

df1 = df['Indication'].str.split(',', expand=True).add_prefix('Indication_').fillna(np.nan)
df = df.join(df1)
    
print(df)

And here is a solution that comes very close to my preferred result (Source and credits @jezrael: Pandas Split Column). However, it keeps only the split values and drops the other columns, which I still need.

df = (pd.DataFrame(df['Indication'].str.split(',', expand=True).values.tolist())
           .stack().reset_index(level=0, drop=True)
           .reset_index())
df.columns = ['keys','values']
   
print(df)

However, I would prefer to copy the entire row when substrings are created, so that an indication is assigned for each substance. My expected result would be:

expected_res = {
    'Substance': ['Substance1', 'Substance2', 'Substance2', 'Substance2', 'Substance1', 'Substance3'],
    'Name': ['Bayer', 'Sanofi', 'Sanofi', 'Sanofi', 'Pfizer', 'AstraZeneca'],
    'Indication': ['bradycardia', 'cardiac arrhythmia', 'cardio-pulmonary reanimation', 'tachycardia', 'Blood Thinning', 'Something else'],
}
    
expected_df = pd.DataFrame(expected_res)
    
print(expected_df)
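
For reference, one possible way to reach that shape is sketched below. It assumes pandas 0.25 or newer, where DataFrame.explode is available, and it is not taken from the linked answers. The name `result` is only illustrative. The split uses a bare ',' as in the attempts above, so the leftover leading spaces are stripped afterwards.

```python
import pandas as pd

data = {
    'Substance': ['Substance1', 'Substance2', 'Substance1', 'Substance3'],
    'Name': ['Bayer', 'Sanofi', 'Pfizer', 'AstraZeneca'],
    'Indication': ['bradycardia',
                   'cardiac arrhythmia, cardio-pulmonary reanimation, tachycardia',
                   'Blood Thinning',
                   'Something else'],
}
df = pd.DataFrame(data)

# Turn each string into a list of substrings, then emit one row per list element.
# explode() repeats the values of all other columns for every element.
result = (
    df.assign(Indication=df['Indication'].str.split(','))
      .explode('Indication')
      .reset_index(drop=True)  # replace the repeated original index with 0..n-1
)

# Splitting on ',' leaves a leading space on the later substrings; remove it.
result['Indication'] = result['Indication'].str.strip()

print(result)
```

Because explode() carries the Substance and Name values along with each substring, no separate join back onto the original frame should be needed.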

I am grateful for any advice!

Paul G.