I have a dataframe like this:
text text2 category
sfsd sgvv sfsdfdf abc,xyz
zydf sefs sdfsd drdg yyy
dfsd dsrgd dggr dgd xyz
eter vxg wfe fs abc
dfvf ertet dggdss abc,xyz,bbb
I want an output like this:
text text2 category
sfsd sgvv sfsdfdf abc
sfsd sgvv sfsdfdf xyz
zydf sefs sdfsd drdg yyy
dfsd dsrgd dggr dgd xyz
eter vxg wfe fs abc
dfvf ertet dggdss abc
dfvf ertet dggdss xyz
dfvf ertet dggdss bbb
Basically, I want to create a new row for each category whenever the category column holds two or more comma-separated values.
I tried this:
df1 = (df.assign(category=df['category'].str.split(','))
         .explode('category')
         .reset_index(drop=True))
But it seems to create far more rows than expected. My original df has many columns, not just text, text2, and category.
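For reference, here is a minimal, self-contained sketch of the same attempt on the sample data. How the sample rows divide between text and text2 is an assumption (the flattened table does not make it clear), and `expected_rows` is just a hypothetical helper name for a sanity check: the row count after exploding should equal the total number of comma-separated values in category.

```python
import pandas as pd

# Sample data; the split between text and text2 is assumed for illustration.
df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs", "dfsd dsrgd", "eter vxg", "dfvf ertet"],
    "text2": ["sfsdfdf", "sdfsd drdg", "dggr dgd", "wfe fs", "dggdss"],
    "category": ["abc,xyz", "yyy", "xyz", "abc", "abc,xyz,bbb"],
})

# Split each category string into a list, then emit one row per list element.
df1 = (df.assign(category=df["category"].str.split(","))
         .explode("category")
         .reset_index(drop=True))

# Sanity check: each row contributes (number of commas + 1) output rows.
expected_rows = int(df["category"].str.count(",").add(1).sum())

print(len(df1), expected_rows)
```

On this sample both numbers come out the same, so running the same check against the real dataframe (with NER_Category in place of category) might help show whether the extra rows come from the category values themselves, for example more commas than expected.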
Screenshot of my original dataframe. Here category = NER_Category.
Here is the output of the code: