
I have a dataframe with multiple columns and the content of one of the columns looks like a list:

df = pd.DataFrame({'Emojis':['[1 2 3 4]', '[4 5 6]']})

What I want to do is split the contents of these "lists" into separate columns. Since the lists are not all the same size, the number of columns should equal the maximum number of items (5 items is the max), and whenever a row has fewer items than that, the remaining columns should be null.

So the output will be something like this:

      Emojis  it1  it2  it3   it4   it5
0  [1 2 3 4]    1    2    3     4  null
1    [4 5 6]    4    5    6  null  null

I was doing like this:

splitlist = df['Emojis'].apply(pd.Series)
df2 = pd.concat([df, splitlist], axis=1)

but it's not close to what I want, since the "list" is not really a list: it is stored in the DataFrame as an object (a string) with no comma separators.
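To make the problem concrete, here is a minimal sketch of why the attempt above has no effect. It reuses the sample frame from the question; the names `first` and `is_string` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Emojis': ['[1 2 3 4]', '[4 5 6]']})

# Each cell is a plain Python string, not a list.
first = df['Emojis'].iloc[0]
is_string = isinstance(first, str)

# apply(pd.Series) therefore wraps each string in a one-element Series,
# so the result is a single column that just repeats the original text.
splitlist = df['Emojis'].apply(pd.Series)
```

So `splitlist` has one column rather than one column per item.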

jglad
sariii
  • Does this answer your question? [Pandas: split column of lists of unequal length into multiple columns](https://stackoverflow.com/questions/44663903/pandas-split-column-of-lists-of-unequal-length-into-multiple-columns) – jglad Feb 10 '23 at 20:55
  • @jglad Not really, since they have a comma-separated list in their dataframe. Mine is technically not a list; that's why I said it looks like a list. – sariii Feb 10 '23 at 20:57
  • Did you try using code to make the thing that looks like a list into an actual list, and then applying the solution for actual lists? Is the question actually "how do I make the list-looking thing into an actual list", perhaps? – Karl Knechtel Feb 10 '23 at 21:50
  • I tried; I just did not include all my efforts. Regardless, I think this answer is more of a pandas way than converting the string into a real list and then applying that solution. – sariii Feb 10 '23 at 22:07

2 Answers


You can use:

out = df.join(pd.DataFrame(df['Emojis'].str.findall(r'\d+').to_list(),
                           index=df.index)
              .reindex(columns=range(5))
              .rename(columns=lambda x: f'it{x+1}')
              )

Output:

      Emojis it1 it2 it3   it4  it5
0  [1 2 3 4]   1   2   3     4  NaN
1    [4 5 6]   4   5   6  None  NaN
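For readers who find the chained expression dense, the same steps can be sketched one at a time. The intermediate names `tokens` and `wide` are illustrative, and the final numeric conversion is an optional addition that is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Emojis': ['[1 2 3 4]', '[4 5 6]']})

# Step 1: pull every run of digits out of each string (one list of strings per row).
tokens = df['Emojis'].str.findall(r'\d+')

# Step 2: build a frame from the ragged lists; shorter rows get missing values.
wide = pd.DataFrame(tokens.to_list(), index=df.index)

# Step 3: force exactly 5 columns (absent ones are all missing), then name them it1..it5.
wide = wide.reindex(columns=range(5)).rename(columns=lambda x: f'it{x + 1}')

# Optional, an assumption about what is wanted: turn the digit strings into
# nullable integers so missing items show up uniformly as <NA>.
wide = wide.apply(pd.to_numeric).astype('Int64')

out = df.join(wide)
```

Without the optional step the values stay as strings, and the missing entries appear as a mix of `None` and `NaN`, as in the output above.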
mozway

You can also use:

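import pandas as pd

df = pd.DataFrame({'Emojis':['[1 2 3 4]', '[4 5 6]']})
for i in range(5):
    # Items sit at string positions 1, 3, 5, ...; this only works for single-character items.
    column_name = 'it' + str(i + 1)  # it1..it5, matching the desired output
    df[column_name] = df['Emojis'].astype(str).str[1 + 2 * i]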
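Fixed-position indexing like this relies on every item being a single character. If items can be longer, one possible variant is to strip the brackets and split on whitespace. This is a sketch under that assumption; the value `10` in the sample data is hypothetical and was added only to show a multi-digit item.

```python
import pandas as pd

# Hypothetical sample: same shape as the question's data, with one multi-digit item.
df = pd.DataFrame({'Emojis': ['[1 2 3 4]', '[10 5 6]']})

# Drop the surrounding brackets, then split on whitespace into separate columns.
parts = df['Emojis'].str.strip('[]').str.split(expand=True)

# Pad to 5 columns and name them it1..it5.
parts = parts.reindex(columns=range(5)).rename(columns=lambda x: f'it{x + 1}')

out = df.join(parts)
```

The resulting columns hold strings; positions with no item are left as missing values.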