Iterate over list elements in pandas dataframe - each entry has different size and a new column needs to be generated w.r.t each entry in list

Question

I have a dataframe

Here I have a column called "name split", which contains lists. I want to split the contents of each list and create a separate column for each element.

This is what I have tried so far:

df = pd.read_csv("C:/Users/Transorg-PC/Desktop/Training/py/datase/football.csv")

temp = df.copy()

temp['name'] = temp['name'].apply(lambda x: ' '.join(x.split()))

temp['name split'] = temp['name'].apply(lambda x: x.split())

temp['length'] = temp['name split'].str.len()

for i in range(temp['length'].max()-1):
    temp[i] = temp['name split'].apply(lambda x:x[i])

But I cannot iterate like this, because in some cases the index goes out of bounds. How can I split the contents of the lists into separate columns?

Well first of all i am sorry for the wrong format used. This is my first post. The copy of dataframe is given in the provided link. — nOObda, Feb 22 '18 at 18:09
Possible duplicate of [Pandas split column of lists into multiple columns](https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns) — Vikash Singh, Feb 22 '18 at 18:13

BENY · Accepted Answer · 2018-02-22T19:57:32.157

6

Something like this (data from jpp's answer):

pd.concat([df, pd.DataFrame(df.name.tolist())], axis=1)
Out[1596]: 
   A    name  0  1
0  1  [1, 2]  1  2
1  1  [3, 4]  3  4
2  2  [5, 6]  5  6

Update

df = pd.DataFrame([[1, [1, 2]],
                   [1, [3, 4]],
                   [2, [5, 6, 1, 1]]],
                  columns=['A', 'name'])
pd.concat([df, pd.DataFrame(df.name.tolist())], axis=1)
Out[1602]: 
   A          name  0  1    2    3
0  1        [1, 2]  1  2  NaN  NaN
1  1        [3, 4]  3  4  NaN  NaN
2  2  [5, 6, 1, 1]  5  6  1.0  1.0

edited Feb 22 '18 at 19:57

answered Feb 22 '18 at 18:07


The list size is different for each entry, and with concat it just creates another column of strings (changing the list to a string). How can one make a column for each list element and fill the empty positions with an empty string or NaN, whichever is suitable? – nOObda Feb 22 '18 at 19:54
This version works. Arigato Wen Sama. I am just curious how it works: does the code first find the maximum list size and create that many columns, or does it create a column for the 1st element of each list, then for the 2nd, and so on? – nOObda Feb 22 '18 at 20:15
@nOObda Yep, the DataFrame is built from Series: basically, each inner list is converted to a Series and these are then appended together. Positions that do not exist are filled with NaN. – BENY Feb 22 '18 at 20:17
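
To make the behaviour asked about in the comments above concrete, here is a small sketch. The sample lists are made up, and it only shows the observable result, not how pandas does it internally: the number of new columns matches the longest list, and shorter rows are padded with missing values.

```python
import pandas as pd

# Hypothetical ragged data, not taken from the asker's file.
lists = [[1, 2], [3, 4], [5, 6, 1, 1]]

expanded = pd.DataFrame(lists)

# One column is created per position in the longest list.
longest = max(len(item) for item in lists)
print(expanded.shape[1] == longest)

# Count the padded (missing) cells in each row.
missing_per_row = expanded.isna().sum(axis=1).tolist()
print(missing_per_row)
```

Rows that already have the maximum length contain no missing cells; every shorter row is padded on the right.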

jpp · Answer 2 · 2018-02-22T20:03:29.940

2

This is one way:

df = pd.DataFrame([[1,[1, 2, 3]],
                   [1,[3, 4]],
                   [2,[5, 6, 7, 8]]],
                  columns=['A','name'])

df = df.join(pd.DataFrame(df['name'].tolist()))

#    A          name    0    1    2    3
# 0  1     [1, 2, 3]  1.0  2.0  3.0  NaN
# 1  1        [3, 4]  3.0  4.0  NaN  NaN
# 2  2  [5, 6, 7, 8]  5.0  6.0  7.0  8.0

edited Feb 22 '18 at 20:03

answered Feb 22 '18 at 18:18


Sir, you have used lists of the same size. What if the list size varies with each entry, and we want to generate columns where the empty entries are replaced by an empty string? This was noticed by Mr Raw Dawg in his suggested solution. – nOObda Feb 22 '18 at 19:49
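
For the empty-string padding requested in the comment above, one possible sketch is to expand the lists and then call `fillna('')`. The sample names and the `name_` column prefix are assumptions for illustration, since the asker's CSV is not available here.

```python
import pandas as pd

# Hypothetical names; the real data lives in the asker's CSV file.
df = pd.DataFrame({'name split': [['Ann', 'Lee'],
                                  ['Bob'],
                                  ['Mary', 'Jane', 'Doe']]})

# Expand the lists into columns, keeping the original index for the join.
parts = pd.DataFrame(df['name split'].tolist(), index=df.index)

# Replace the padded (missing) positions with '' and label the new columns.
parts = parts.fillna('').add_prefix('name_')

result = df.join(parts)
print(result)
```

Passing `index=df.index` keeps the join aligned even when the original frame does not use a default 0..n-1 index.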

Sevy · Answer 3 · 2018-02-22T20:04:28.643

0

List comprehensions are useful in a case like this:

temp['name'] = temp['name'].apply(lambda x: ' '.join(x.split()))
temp['Name1'] = [item.split()[0] for item in temp['name']]
temp['Name2'] = [item.split()[1] for item in temp['name']]

Edit: I just noticed that you have a different number of items for each entry after you do the splitting. You need to decide how to handle this: how do you want to fill the empty rows in the new columns, with an empty string or with NaN? I assume this is why you get an IndexError.
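
The IndexError mentioned above can be reproduced with a tiny example. The two names are invented; the second has only one token, so asking for index 1 fails.

```python
import pandas as pd

# Hypothetical data: 'Bob' splits into a single token.
temp = pd.DataFrame({'name': ['Ann Lee', 'Bob']})

error = None
try:
    second = [item.split()[1] for item in temp['name']]
except IndexError as exc:
    # Raised for 'Bob', whose split list has no element at position 1.
    error = exc
    print('IndexError:', exc)
```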

If you want to do this with a different number of items in each row, the following code will handle it. That said, I would think about why you want these non-uniform columns and whether there is a cleaner way to accomplish your goal.

temp['name_split'] = temp['name'].apply(lambda a: a.split())
max_len = max(temp['name_split'].apply(len))

for ii in range(max_len):
    temp['Name%s'%ii] = [item[ii] if ii < len(item) else 'blank' for item in temp['name_split']]
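
As an alternative to the explicit loop above (a different technique, not the one used in this answer), pandas' `Series.str.split` accepts `expand=True` and returns one column per token. The sketch below uses invented names and keeps the 'blank' filler and `Name%s` column naming from the loop version.

```python
import pandas as pd

# Hypothetical names with irregular spacing.
temp = pd.DataFrame({'name': ['Ann  Lee', 'Bob', 'Mary Jane   Doe']})

# With no separator given, str.split splits on runs of whitespace;
# expand=True spreads the tokens over columns, padding short rows.
parts = temp['name'].str.split(expand=True)

# Fill the padded cells and name the columns Name0, Name1, ...
parts = parts.fillna('blank')
parts.columns = ['Name%s' % i for i in parts.columns]

temp = temp.join(parts)
print(temp)
```

Because the split handles repeated whitespace, the separate `' '.join(x.split())` normalisation step is not needed for this variant.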

edited Feb 22 '18 at 20:04

answered Feb 22 '18 at 18:10


Reason for the downvote? This is your problem in a nutshell – Sevy Feb 22 '18 at 19:26
Sir, as you noticed, the list size is different for each entry, so it would be great if I could have multiple columns with an empty string for the empty rows. Instead of creating each column by hand, I would like to iterate over the columns based on the size of each entry, with a for loop or some simpler Pythonic mechanism. – nOObda Feb 22 '18 at 19:38
See edit, this addresses the problem but like I say, it is not a sustainable solution to build your dataframes this way – Sevy Feb 22 '18 at 20:05
