Python Dataframe Explode Rows with multiple values

Question

I apologize for repeating a question that has been answered before, but the existing answers don't give me the desired outcome; maybe I missed something.

I have a subset of the Stack Overflow dataset like the following:

**tags                          time**
c#,winforms                     35
html,css,internet-explorer-7    855
c#,conversion,j#                472
c#,datetime                     556
c#,.net,datetime,timespan       1
php,security                    3
mysql                           5
codeigniter,routes              4
c#,progressbar                  4
.net,ide,linux,mono             2

And I want output like the following:

**tags                  time**
c#                      35
winforms                35
html                    855
css                     855
internet-explorer-7     855
c#                      472
conversion              472
j#                      472
c#                      556
datetime                556
c#                      1
.net                    1
datetime                1
timespan                1
php                     3
security                3
mysql                   5
codeigniter             4
routes                  4
c#                      4
progressbar             4
.net                    2
ide                     2
linux                   2
mono                    2

I have tried the following methods:

Option-1:

df.explode('tags')

Option-2:

df.set_index(['time']).tags.apply(pd.Series).stack().reset_index(name = 'tags').drop('level_1', axis = 1)

In both cases, I get the output the same as my dataframe without exploding. What am I doing wrong here?

I kindly suggest you avoid editing your question and let some experts do the rest. — TheFaultInOurStars, Feb 14 '22 at 21:48
Explode only works for list-like columns. You'll need to split the comma-separated strings into lists first, then explode. See [this answer](https://stackoverflow.com/a/57122617/15497888) for a full breakdown. – Henry Ecker, Feb 14 '22 at 21:52

score 0 · Accepted Answer · answered Feb 14 '22 at 21:43


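From the pandas docs for pandas.DataFrame.explode, on the column parameter:

specify a non-empty list with each element be str or tuple

That note is about how to pass the column names. For explode to do anything, the values in your 'tags' column need to be list-like; a comma-separated string is a single scalar, so the row comes back unchanged. Apply a function that converts each comma-separated string to a list (for example df['tags'].str.split(',')), then go with Option 1, df.explode('tags').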
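As a minimal sketch of the fix described above, assuming the tags are plain comma-separated strings with no surrounding spaces, the snippet below rebuilds a few rows of the question's data, splits each string into a list, and then explodes. The variable names `unchanged` and `exploded` are only illustrative.

```python
import pandas as pd

# A few rows from the question; "tags" holds comma-separated strings.
df = pd.DataFrame({
    "tags": ["c#,winforms", "html,css,internet-explorer-7", "mysql"],
    "time": [35, 855, 5],
})

# Exploding the raw strings changes nothing: each cell is a scalar, not a list.
unchanged = df.explode("tags")

# Split each string into a list of tags, then explode one tag per row.
exploded = (
    df.assign(tags=df["tags"].str.split(","))
      .explode("tags")
      .reset_index(drop=True)  # explode repeats the original index labels
)
print(exploded)
```

The `reset_index(drop=True)` call is optional; without it, every tag from the same original row keeps that row's index label.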

answered by Stefan

Thanks a lot, @Stefan. This was the problem. I have converted the strings into list and explode method worked!! – Ariful Islam Feb 14 '22 at 22:22

score 0 · Answer 2 · answered Feb 14 '22 at 21:44

Judging from the first (unedited) version of your question, I think what you need is a loop over the rows using iterrows. Here is what I came up with (to keep this answer short, I only reproduced part of your dataframe):

import pandas as pd

dataframe = pd.DataFrame({"tags": ["#c,windoforms,css", "#c,datetime"], "time": [35, 40]})

# Collect one (tag, time) pair per individual tag.
newTags = []
newTime = []
for index, row in dataframe.iterrows():
    # Split the comma-separated string and repeat the row's time for each tag.
    for name in row["tags"].split(","):
        newTags.append(name)
        newTime.append(row["time"])

resultDataframe = pd.DataFrame({"tags": newTags, "time": newTime})
resultDataframe

Output

|    | tags       |   time |
|---:|:-----------|-------:|
|  0 | #c         |     35 |
|  1 | windoforms |     35 |
|  2 | css        |     35 |
|  3 | #c         |     40 |
|  4 | datetime   |     40 |

Hi Amirhossein Kiani, thank you very much, it worked well. However, I found DataFrame.explode() to be faster once I fixed the string-versus-list format issue, as mentioned in @Stefan's answer. – Ariful Islam, Feb 14 '22 at 22:25
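For anyone comparing the two approaches in this thread, here is a rough sketch that runs both on the sample data from this answer and checks that they produce the same rows. The helper names `explode_with_loop` and `explode_with_split` are hypothetical, and no timing is measured here; on a large dataset you would need to benchmark on your own data.

```python
import pandas as pd

dataframe = pd.DataFrame({"tags": ["#c,windoforms,css", "#c,datetime"], "time": [35, 40]})


def explode_with_loop(frame):
    """Row-by-row approach from this answer, using iterrows."""
    tags, times = [], []
    for _, row in frame.iterrows():
        for name in row["tags"].split(","):
            tags.append(name)
            times.append(row["time"])
    return pd.DataFrame({"tags": tags, "time": times})


def explode_with_split(frame):
    """Split each string into a list, then let explode expand the rows."""
    return (
        frame.assign(tags=frame["tags"].str.split(","))
             .explode("tags")
             .reset_index(drop=True)
    )


loop_result = explode_with_loop(dataframe)
split_result = explode_with_split(dataframe)

# Compare the row contents as plain Python lists.
same_rows = loop_result.values.tolist() == split_result.values.tolist()
print(same_rows)
```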
