
I have a data set (a CSV file) of names. Each row lists a "rank", the number of people with that name, and the name itself.

I am looking for a way to put each name on its own line, ideally in Excel, but pandas is also an option.

The problem is that many of the lines contain multiple comma-separated names.

The data looks like this:

rank   | number of occurrences  | name
1      | 10000                  | marie
2      |  9999                  | sophie
3      |  9998                  | ellen
...
...
50     |   122                  | jude, allan, jaspar

I would like to have each name on an individual line alongside its corresponding number of occurrences. It's fine that the rank is duplicated.

Something like this:

rank   | number of occurrences  | name
1      | 10000                  | marie
2      |  9999                  | sophie
3      |  9998                  | ellen
...
...
50     |   122                  | jude
50     |   122                  | allan
50     |   122                  | jaspar
StephRas

2 Answers


Use df.explode() (available since pandas 0.25) after splitting each name string into a list:

df.assign(name=(df.name.str.split(','))).explode('name')

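How it works:

df['name'] = df['name'].str.split(',')  # same effect as df.assign(name=...): each cell becomes a list of names
df = df.explode('name')                 # expands each list into one row per name
df['name'] = df['name'].str.strip()     # extra step: removes the space left after each comma

Result: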
   rank  number of occurrences    name
0     1                  10000   marie
1     2                   9999  sophie
2     3                   9998   ellen
3    50                    122    jude
3    50                    122   allan
3    50                    122  jaspar
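
For reference, the split-and-explode steps can be put together into a self-contained sketch. The sample rows are rebuilt from the question, and the column name `occurrences` is an assumed, shortened stand-in for "number of occurrences". The `reset_index(drop=True)` call is optional and only renumbers the rows; the `str.strip()` call removes the space that follows each comma in the original strings.

```python
import pandas as pd

# Sample rows rebuilt from the question; "occurrences" is an assumed,
# shortened column name.
df = pd.DataFrame(
    {
        "rank": [1, 2, 3, 50],
        "occurrences": [10000, 9999, 9998, 122],
        "name": ["marie", "sophie", "ellen", "jude, allan, jaspar"],
    }
)

result = (
    df.assign(name=df["name"].str.split(","))  # each cell becomes a list of names
    .explode("name")                           # one row per list element
    .reset_index(drop=True)                    # optional: renumber rows 0..n-1
    .assign(name=lambda d: d["name"].str.strip())  # drop the space after each comma
)

print(result)
```

Rows that hold a single name pass through unchanged, because splitting them yields a one-element list.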
wwnde
In [60]: df
Out[60]:
   rank   no                 name
0    50  122  jude, allan, jaspar

In [61]: df.assign(name=df['name'].str.split(',')).explode('name')
Out[61]:
   rank   no     name
0    50  122     jude
0    50  122    allan
0    50  122   jaspar
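
Since the question starts from a CSV file and ideally ends in Excel, the same idea can be sketched as a round trip. Everything about the file layout here is an assumption: the question does not show the real delimiter or quoting, so the sample text uses a quoted name field and a shortened `occurrences` column. `split_names` is a hypothetical helper, and `explode(..., ignore_index=True)` needs pandas 1.1 or later.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real file's delimiter and quoting are not
# shown in the question, so a quoted name field is assumed.
csv_text = '''rank,occurrences,name
1,10000,marie
50,122,"jude, allan, jaspar"
'''


def split_names(frame: pd.DataFrame, column: str = "name") -> pd.DataFrame:
    """Return a new frame with one row per comma-separated name (hypothetical helper)."""
    parts = frame[column].str.split(",")
    # ignore_index=True renumbers the rows instead of repeating the old index.
    out = frame.assign(**{column: parts}).explode(column, ignore_index=True)
    out[column] = out[column].str.strip()  # drop the space after each comma
    return out


df = pd.read_csv(io.StringIO(csv_text))
tidy = split_names(df)

# Write the result back as CSV text, which Excel can open; pass a file path
# instead of the buffer to save to disk.
buffer = io.StringIO()
tidy.to_csv(buffer, index=False)
csv_out = buffer.getvalue()
```

Keeping the logic in a small function makes it easy to reuse on other columns that hold comma-separated values.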
bigbounty
  • The provided answer was flagged for review as a Low Quality Post. Here are some guidelines for [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). This provided answer could benefit from an explanation. Code only answers are not considered "good" answers. From Review. – Trenton McKinney Aug 12 '20 at 04:34