5

I have a dataframe (the original post showed it as a screenshot).

It has a column called "name split" whose values are lists. I want to split the contents of each list and create a separate column for each element.

This is what I have tried so far:

df = pd.read_csv("C:/Users/Transorg-PC/Desktop/Training/py/datase/football.csv")

temp = df.copy()

temp['name'] = temp['name'].apply(lambda x: ' '.join(x.split()))

temp['name split'] = temp['name'].apply(lambda x: x.split())

temp['length'] = temp['name split'].str.len()

for i in range(temp['length'].max()-1):
    temp[i] = temp['name split'].apply(lambda x:x[i])

But I cannot iterate like this, because for some rows the index goes out of bounds. How can I split the contents of the lists into separate columns?

nOObda
  • Please add a copy of your dataframe rows. – Espoir Murhabazi Feb 22 '18 at 18:06
  • First of all, I am sorry for the wrong format; this is my first post. A copy of the dataframe is given in the provided link. – nOObda Feb 22 '18 at 18:09
  • Possible duplicate of [Pandas split column of lists into multiple columns](https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns) – Vikash Singh Feb 22 '18 at 18:13

3 Answers

6

Something like this, using the sample data from jpp's answer:

pd.concat([df, pd.DataFrame(df['name'].tolist())], axis=1)
Out[1596]: 
   A    name  0  1
0  1  [1, 2]  1  2
1  1  [3, 4]  3  4
2  2  [5, 6]  5  6

Update

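import pandas as pd

df = pd.DataFrame([[1, [1, 2]],
                   [1, [3, 4]],
                   [2, [5, 6, 1, 1]]],
                  columns=['A', 'name'])
# Expand the list column into one column per position and append it to df
pd.concat([df, pd.DataFrame(df['name'].tolist())], axis=1)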
Out[1602]: 
   A          name  0  1    2    3
0  1        [1, 2]  1  2  NaN  NaN
1  1        [3, 4]  3  4  NaN  NaN
2  2  [5, 6, 1, 1]  5  6  1.0  1.0
BENY
  • The list size is different for each entry, and with concat it just makes another column of strings (changing it from list to string). How can one make a column for each list element and fill the empty cells with an empty string or NaN, whichever is easier to implement? – nOObda Feb 22 '18 at 19:54
  • This version works. Arigato Wen Sama. I am just curious how it works. Does this code first find the max list size and create that many columns, or does it create a column for the 1st element in the list, then for the 2nd element, and so on? – nOObda Feb 22 '18 at 20:15
  • @nOObda Yes. The DataFrame is assembled from Series: basically, each inner list is converted to a Series and then they are combined. Where an element does not exist, it is filled with NaN. – BENY Feb 22 '18 at 20:17
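The padding behaviour asked about in these comments can be checked with a small sketch. The variable names `rows`, `expanded` and `longest` are introduced here for illustration only. It shows what the constructor returns for ragged input (one column per position up to the longest list), not how pandas builds it internally.

```python
import pandas as pd

# Ragged input: the rows hold 2, 2 and 4 elements.
rows = [[1, 2], [3, 4], [5, 6, 1, 1]]

# The constructor pads the shorter rows with missing values.
expanded = pd.DataFrame(rows)

longest = max(len(r) for r in rows)
print(expanded.shape)                   # number of columns equals `longest`
print(expanded.isna().sum().tolist())   # missing values per column
```

Only the columns beyond the shortest list contain missing values, which is why those columns are shown as floats in the output above.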
2

This is one way:

df = pd.DataFrame([[1,[1, 2, 3]],
                   [1,[3, 4]],
                   [2,[5, 6, 7, 8]]],
                  columns=['A','name'])

df = df.join(pd.DataFrame(df['name'].tolist()))

#    A          name  0  1    2    3
# 0  1     [1, 2, 3]  1  2  3.0  NaN
# 1  1        [3, 4]  3  4  NaN  NaN
# 2  2  [5, 6, 7, 8]  5  6  7.0  8.0
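If empty strings are preferred over NaN for the missing positions, the same idea can be extended as in the minimal sketch below. The sample names and the `name_` column prefix are assumptions made for illustration and do not come from the original data.

```python
import pandas as pd

# Hypothetical sample data with lists of different lengths.
df = pd.DataFrame({
    "A": [1, 1, 2],
    "name split": [["John", "Smith"], ["Pele"], ["Ana", "Maria", "Silva"]],
})

# Expand the lists, keeping df's index so the join aligns row by row.
parts = pd.DataFrame(df["name split"].tolist(), index=df.index)

# Replace the padding with empty strings and give the columns readable names.
parts = parts.fillna("").add_prefix("name_")

result = df.join(parts)
print(result)
```

Passing `index=df.index` matters when the dataframe does not have the default 0..n-1 index, because `join` aligns on index labels.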
jpp
  • Sir, you have used lists of the same size. What if the list size varies with each entry and we want to generate columns where the empty entries are replaced by an empty string? This was noticed by Mr Raw Dawg in his suggested solution. – nOObda Feb 22 '18 at 19:49
0

List comprehensions are useful in a case like this:

temp['name'] = temp['name'].apply(lambda x: ' '.join(x.split()))
temp['Name1'] = [item.split()[0] for item in temp['name']]
temp['Name2'] = [item.split()[1] for item in temp['name']]

Edit: I just noticed that you have a different number of items in each entry after splitting. You need to decide how to handle this: how do you want to fill the empty cells in the new columns, with an empty string or with NaN? I assume this is why you get an IndexError.

If you want to do this with a different number of items in each row, the code below handles it by filling the missing positions with the string 'blank'. That said, I would think about why you want these non-uniform columns and whether there is a cleaner way to accomplish your goal.

temp['name_split'] = temp['name'].apply(lambda a: a.split())
max_len = max(temp['name_split'].apply(len))

for ii in range(max_len):
    temp['Name%s'%ii] = [item[ii] if ii < len(item) else 'blank' for item in temp['name_split']]
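As a possible alternative to the explicit loop, pandas can split a string column straight into columns with `Series.str.split(expand=True)`. This is a sketch under assumptions: the sample names are hypothetical, and the `Name0`, `Name1`, ... labels simply mirror the loop above.

```python
import pandas as pd

# Hypothetical sample data; note the repeated spaces in the first name.
temp = pd.DataFrame({"name": ["John  Smith", "Pele", "Ana Maria Silva"]})

# With no separator given, str.split splits on runs of whitespace;
# expand=True returns one column per token, padded for shorter rows.
parts = temp["name"].str.split(expand=True)
parts.columns = [f"Name{i}" for i in range(parts.shape[1])]

# Fill the padding with the same 'blank' marker used in the loop version.
temp = temp.join(parts.fillna("blank"))
print(temp)
```

Because the split already collapses repeated whitespace, the separate normalisation step with `' '.join(x.split())` is not needed for this variant.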
Sevy
  • Reason for the downvote? This is your problem in a nutshell – Sevy Feb 22 '18 at 19:26
  • Sir, as you noticed, the list size is different for each entry, so it would be great to have multiple columns with an empty string in the empty cells. Instead of creating each column by hand, I would like to create the columns based on the size of each entry, with a for loop or some simpler Pythonic mechanism. – nOObda Feb 22 '18 at 19:38
  • See the edit. It addresses the problem, but as I say, building your dataframes this way is not a sustainable solution. – Sevy Feb 22 '18 at 20:05