I am trying to split a dataframe column into multiple columns as under:
- There are three columns overall. Two should rename in the new dataframe, while the third one to be split into new columns.
- Split is to be done using a specific character (say ":")
- The column that requires split can have varied number of ":" split. So new columns can be different for different rows, leaving some column values as NULL for some rows. That is okay.
- Each subsequently formed column has a specific name. Max number of columns that can be formed is known.
- There are four dataframes. Each one has this same formatted column that has to be split.
I came across following solutions but they don't work for the reasons mentioned:
-
pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)
- This creates columns with names as 0,1,2... I need the new columns to have specific names.
-
df = df.apply(lambda x:pd.Series(x))
- This does no change to the dataframe. Couldn't understand why.
-
df['command'], df['value'] = df[0].str.split().str
- Here the column names are renamed properly, but this requires knowing beforehand how many columns will be formed. In my case, it is dynamic for each dataframe. For rows, the split successfully puts NULL value in extra columns. But using the same code for another dataframe generates an error saying number of keys should be same.
I couldn't post comments on these answers as I am new here on this community. I would appreciate if someone can help me understand how I can achieve my objective - which is: Dynamically use same code to split one column into many for different dataframes on multiple occasions while renaming the newly generated columns to predefined name.
For example:
Dataframe 1:
Col1 Col2 Col3
0 A A:B:C A
1 A A:B:C:D:E A
2 A A:B A
Dataframe 2:
Col1 Col2 Col3
0 A A:B:C A
1 A A:B:C:D A
2 A A:B A
Output should be: New dataframe 1:
Col1 ColA ColB ColC ColD ColE Col3
0 A A B C NaN NaN A
1 A A B C D E A
2 A A B NaN NaN NaN A
New dataframe 2:
Col1 ColA ColB ColC ColD ColE Col3
0 A A B C NaN NaN A
1 A A B C D NaN A
2 A A B NaN NaN NaN A
(If ColE is not there, then also it is fine.)
After this, I will be concatenating these dataframes into one, where I will need counts of all ColA to ColE for individual dataframes against Col1 and Col3 combinations. So, we need to keep this in mind.