I have an excel sheet
Col1 Col2 Col3 Col4
John English\nMaths 34\n33 Pass
Sam Science 40 Pass
Jack English\nHistory\nGeography 89\n07\n98 Pass
Need to convert it to
Col1 Col2 Col3 Col4
John English 34 Pass
John Maths 33 Pass
Sam Science 40 Pass
Jack English 89 Pass
Jack History 07 Pass
Jack Geography 98 Pass
The excel sheet has \n
as separator for corresponding Col2 and col3 column. Just need to pull each subject in a new row with its corresponding marks and copy all the other column contents as it is.
Tried
split_cols = ['Col2', 'Col3']
# loop over the columns and split them
separator = '\n'
for col in split_cols:
df[[f'{col}_Split1', f'{col}_Split2']] = df[col].str.split(separator, n=1, expand=True).fillna('')
# create two new dataframes with the desired columns
df1 = df[['Col1', 'Col2_Split1', 'Col3_Split1', 'Col4']].rename(columns={'Col2_Split1': 'D', 'Col3_Split1': 'C'})
df2 = df[['Col1', 'Col2_Split2', 'Col3_Split2', 'Col4']].rename(columns={'Col2_Split2': 'D', 'Col3_Split2': 'C'})
# concatenate the two dataframes
final_df = pd.concat([df1, df2], ignore_index=True)
# print the final dataframe
print(final_df)