I am preparing a pandas DataFrame for further analysis. One of its columns has the type str. I want to split each string in that column on the separator ',' into substrings.
Whenever a string yields several substrings, I want to copy the whole row, so that every row assigns exactly one indication to its substance.
import numpy as np
import pandas as pd

data = {
'Substance' : ['Substance1', 'Substance2', 'Substance1', 'Substance3'],
'Name' : ['Bayer', 'Sanofi', 'Pfizer', 'AstraZeneca'],
'Indication' : ['bradycardia', 'cardiac arrhythmia, cardio-pulmonary reanimation, tachycardia', 'Blood Thinning', 'Something else'],
}
df = pd.DataFrame(data)
print(df)
I managed to split the string into separate columns. (Source and credits @jezrael: Split Column into Unknown Number of Columns by Delimiter in Pandas Dataframe)
df1 = df['Indication'].str.split(',', expand=True).add_prefix('Indication_').fillna(np.nan)
df = df.join(df1)
print(df)
The following solution comes very close to my preferred result (Source and credits @jezrael: Pandas Split Column), but it drops the other columns, which I need to keep.
df = (pd.DataFrame(df['Indication'].str.split(',', expand=True).values.tolist())
.stack().reset_index(level=0, drop=True)
.reset_index())
df.columns = ['keys','values']
print(df)
However, I would prefer to copy the entire row when substrings are created, so that an indication is assigned for each substance. My expected result would be:
expected_res = {
'Substance' : ['Substance1', 'Substance2', 'Substance2', 'Substance2', 'Substance1', 'Substance3'],
'Name' : ['Bayer', 'Sanofi', 'Sanofi','Sanofi','Pfizer', 'AstraZeneca'],
'Indication' : ['bradycardia', 'cardiac arrhythmia', 'cardio-pulmonary reanimation', 'tachycardia', 'Blood Thinning', 'Something else'],
}
expected_df = pd.DataFrame(expected_res)
print(expected_df)
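For reference, a minimal sketch of one approach that appears to give this shape. It assumes pandas 0.25 or newer, where DataFrame.explode is available. The name result is only for illustration, and stripping the whitespace left after each comma is my assumption about the intended output.

```python
import pandas as pd

data = {
    'Substance': ['Substance1', 'Substance2', 'Substance1', 'Substance3'],
    'Name': ['Bayer', 'Sanofi', 'Pfizer', 'AstraZeneca'],
    'Indication': ['bradycardia',
                   'cardiac arrhythmia, cardio-pulmonary reanimation, tachycardia',
                   'Blood Thinning',
                   'Something else'],
}
df = pd.DataFrame(data)

# Turn each 'Indication' string into a list of substrings,
# then give every list element its own row; the other columns are repeated.
result = (
    df.assign(Indication=df['Indication'].str.split(','))
      .explode('Indication')
      .reset_index(drop=True)
)

# Splitting on ',' leaves a leading space on the later substrings, so strip it.
result['Indication'] = result['Indication'].str.strip()

print(result)
```

Rows whose 'Indication' contains no comma become one-element lists and therefore pass through as a single row.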
I am grateful for any advice!