
I am new to Python and learning to use DataFrames and list comprehensions. I have the following DataFrame:

import pandas as pd
df1 = pd.DataFrame({'names': [[['Hans Peter'], ['Harry Potter']], [['bla bla'], ['some string']]]})

Now I want to split the string in each sublist into words. For a single list of lists I could use

x = [['Hans Peter'], ['Harry Potter'], ['bla bla'], ['some string here']]
res = []
for sublist in x:  # avoid naming the loop variable "list", which shadows the built-in
    res.append(str(sublist[0]).split())

But how can I apply this to every row of a DataFrame? I think I have to build a list comprehension and then use the apply() method to avoid the .append, but I don't know how to do this. I would build the list comprehension for a single list like this:

res = [str(sublist[0]).split for sublist in x]

But I get a list containing function objects instead of the split strings:

[<function str.split(sep=None, maxsplit=-1)>,...]

The expected output for the DataFrame would be

 0 [['Hans','Peter'],['Harry','Potter']]
 1 [['bla','bla'],['some','string']]
user11638654

1 Answer


First, you need to call the split method. Without parentheses, str.split only refers to the method object and never runs it:

''.split
<built-in method split of str object at 0x1005a3ab0>

''.split() # call with parentheses
[]
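Applied to the list from the question, the difference looks roughly like this. This is only a small sketch; the names `uncalled` and `item` are mine, not from the question:

```python
x = [['Hans Peter'], ['Harry Potter'], ['bla bla'], ['some string here']]

# Without parentheses: each element is the bound method itself, nothing is split
uncalled = [str(item[0]).split for item in x]

# With parentheses: the method is called and returns a list of words
res = [str(item[0]).split() for item in x]

print(callable(uncalled[0]))
print(res)
```

The first list holds callables, which is why the question's output showed function objects; the second holds the split words.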

Second, you need to get down to the sub-lists within names. You can simulate this with a for loop first:

for x in df1.names:
    for a in x:
        print(a)

['Hans Peter']
['Harry Potter']
['bla bla']
['some string']

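Each a is still a one-element list, so you can use a.pop() to get the string out, then call split() on the result of pop(). Note that pop() removes the string from each inner list in place; indexing with a[0] would return the same string without modifying the list.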

df1.names = [[a.pop().split() for a in x] for x in df1.names]

df1
                              names
0  [[Hans, Peter], [Harry, Potter]]
1      [[bla, bla], [some, string]]
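Since the question mentions apply(), here is one possible way to write the same thing with Series.apply and a[0] instead of pop(), so the inner lists are not emptied along the way. Treat it as a sketch: the helper name `split_names` is made up for illustration, and it assumes every inner list holds exactly one string.

```python
import pandas as pd

df1 = pd.DataFrame({'names': [[['Hans Peter'], ['Harry Potter']],
                              [['bla bla'], ['some string']]]})

def split_names(row):
    # row is one cell of the column, e.g. [['Hans Peter'], ['Harry Potter']];
    # a[0] reads the single string without mutating the inner list
    return [a[0].split() for a in row]

# apply() calls split_names once per cell and collects the results in a new Series
df1['names'] = df1['names'].apply(split_names)

print(df1['names'].tolist())
```

The result should match the list comprehension above; apply() mostly saves you from writing the outer loop yourself.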
C.Nivs