0

I have a column of data, it mostly has only one value, but some are multi-valued data connected with commas, and some missing values. I want to split multivalued data connected with commas into multiple lines.

I found a good solution in this (Split cell into multiple rows in pandas dataframe).

But this can only extract a few lines split from multi-valued data, I will splicing it with the original data, but my data is a large file, I can't clearly know where each multi-valued data is and delete them.

eg:

In [1]:data = {'id': [106452, 233649, 547531, 707841, 457009],
                'size': (np.nan, 1, 40, 40, '12,13')}
df = pd.DataFrame(data)

then:

In [2]:df_new = (df.set_index(['id'])
   .stack()
   .str.split(',', expand=True)
   .stack()
   .unstack(-2)
   .reset_index(-1, drop=True)
   .reset_index()
)
df_new

Out[1]: id  size
0    457009  12
1    457009  13

if:

In [3]:df_new = (df.set_index(['id'])
   .stack()
   .str.split(',', expand=True)
   .stack()
   .unstack(-2)
   .reset_index(-1, drop=True)
   .reset_index()
)
df = pd.concat([df,df_new]) # I know it's a bit stupid, but I just want to express the idea of merging.
df

Out[2]:
      id    size
0   106452  NaN
1   233649  1
2   547531  40
3   707841  40
4   457009  12,13
0   457009  12
1   457009  13

I want this:

Out[2]:
      id    size
0   106452  NaN
1   233649  1
2   547531  40
3   707841  40
4   457009  12
5   457009  13

I should How to do it?

罗文浩
  • 79
  • 1
  • 1
  • 8
  • Look at [this](https://stackoverflow.com/a/51752361/9081267) answer and use the function `explode_str` like following: `explode_str(df, 'size', ',')` will get your the correct output. First convert your column to string: `df['size'] = df['size'].astype(str)` – Erfan May 08 '19 at 13:58

1 Answers1

0

Try adding astype(str):

df_new = (df.set_index(['id']).astype(str)
   .stack()
   .str.split(',', expand=True)
   .stack()
   .unstack(-2)
   .reset_index(-1, drop=True)
   .reset_index()
)
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Oh, great, is there some explanation for the principle? It succeeded, just a simple type conversion – 罗文浩 May 08 '19 at 14:09
  • Yes, it's just a type conversion. The original code only looks at the rows with type `str`, while this code converts everything to `str` and do splitting. Note that the final dataframe's `size` column is of type `str`, so you may need to convert it back to `int` again. – Quang Hoang May 08 '19 at 14:19