I have a column of data, it mostly has only one value, but some are multi-valued data connected with commas, and some missing values. I want to split multivalued data connected with commas into multiple lines.
I found a good solution in this (Split cell into multiple rows in pandas dataframe).
But this can only extract a few lines split from multi-valued data, I will splicing it with the original data, but my data is a large file, I can't clearly know where each multi-valued data is and delete them.
eg:
In [1]:data = {'id': [106452, 233649, 547531, 707841, 457009],
'size': (np.nan, 1, 40, 40, '12,13')}
df = pd.DataFrame(data)
then:
In [2]:df_new = (df.set_index(['id'])
.stack()
.str.split(',', expand=True)
.stack()
.unstack(-2)
.reset_index(-1, drop=True)
.reset_index()
)
df_new
Out[1]: id size
0 457009 12
1 457009 13
if:
In [3]:df_new = (df.set_index(['id'])
.stack()
.str.split(',', expand=True)
.stack()
.unstack(-2)
.reset_index(-1, drop=True)
.reset_index()
)
df = pd.concat([df,df_new]) # I know it's a bit stupid, but I just want to express the idea of merging.
df
Out[2]:
id size
0 106452 NaN
1 233649 1
2 547531 40
3 707841 40
4 457009 12,13
0 457009 12
1 457009 13
I want this:
Out[2]:
id size
0 106452 NaN
1 233649 1
2 547531 40
3 707841 40
4 457009 12
5 457009 13
I should How to do it?