removing lists from dataframes while adding data

Question

Starting with:

import pandas as pd

lis1= [['apples'],['bananas','oranges','cinnamon'],['pears','juice']]
lis2= [['john'],['stacy'],['ron']]

df = pd.DataFrame({'fruits': lis1, 'users': lis2})

                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

I'd like to end with:

lis3= ['apples','bananas','oranges','cinnamon','pears','juice']
lis4= ['john','stacy','stacy','stacy','ron','ron']

pd.DataFrame({'fruits': lis3, 'users':lis4})

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

First, I need to create a new dataframe with each item sitting in its own row. Second, the user name needs to repeat once per fruit. In the example, John has one fruit while Stacy has three, so under users Stacy has to be repeated three times.
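For comparison, pandas 0.25 and later ship `DataFrame.explode`, which performs exactly this "one item per row, repeat the other columns" step. This is a different technique from the answers below; a minimal sketch, assuming pandas >= 0.25 and the `df` built above:

```python
import pandas as pd

lis1 = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
lis2 = [['john'], ['stacy'], ['ron']]
df = pd.DataFrame({'fruits': lis1, 'users': lis2})

out = (df.explode('users')      # one user per row (each list holds one name here)
         .explode('fruits')     # one fruit per row; the user is repeated for each fruit
         .reset_index(drop=True))
print(out)
```

Exploding `users` first turns each single-name list into a plain string, so the second `explode` repeats that string rather than a list.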

Possible duplicate of ["unstack" a pandas column containing lists into multiple rows](https://stackoverflow.com/questions/42012152/unstack-a-pandas-column-containing-lists-into-multiple-rows) — abcdaire, Sep 19 '18 at 22:05

piRSquared · Answer 1 · 2018-09-19T22:23:04.130

`itertools`

from itertools import chain, product, starmap

pd.DataFrame(
    [*chain(*starmap(product, zip(df.fruits, df.users)))],
    columns=df.columns
)

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
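To see why this works, here is the same pipeline unrolled on plain lists. This is a sketch for illustration; the names `fruits`, `users`, `rows`, `per_row` and `flat` are not part of the answer:

```python
from itertools import chain, product, starmap

fruits = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
users = [['john'], ['stacy'], ['ron']]

# zip lines up each row's fruit list with that row's user list
rows = list(zip(fruits, users))

# product(f, u) pairs every fruit in a row with every user in the same row
per_row = [list(product(f, u)) for f, u in rows]
print(per_row)
# [[('apples', 'john')], [('bananas', 'stacy'), ('oranges', 'stacy'), ('cinnamon', 'stacy')], [('pears', 'ron'), ('juice', 'ron')]]

# starmap(product, rows) computes the same per-row products lazily,
# and chain(*...) flattens them into one sequence of (fruit, user) tuples
flat = list(chain(*starmap(product, rows)))
print(flat)
```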

A more general version, which does not hard-code the column names and so also works with more than two columns:

pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df.get, df))))],
    columns=df.columns
)
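A quick check of the general version with a third column. The `stores` column and its values are hypothetical, added only to show that every column is handled:

```python
from itertools import chain, product, starmap
import pandas as pd

# Hypothetical third column 'stores', added only to exercise the generic version
df3 = pd.DataFrame({
    'fruits': [['apples'], ['bananas', 'oranges']],
    'users': [['john'], ['stacy']],
    'stores': [['north', 'south'], ['east']],
})

# Iterating a DataFrame yields column names, so map(df3.get, df3) yields each column;
# zip(*...) regroups the columns row by row, and product expands each row.
out3 = pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df3.get, df3))))],
    columns=df3.columns,
)
print(out3)
```

Because `product` takes the cartesian product of all lists in a row, the first row (one fruit, two stores) becomes two rows. That is the desired behaviour when the extra columns hold single values like `users`, but rows with several multi-item columns expand to every combination rather than being paired element-wise.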

`generator`

def f(z):
    # z yields one (fruit_list, user_list) pair per row
    for A, B in z:
        for a in A:
            for b in B:
                yield (a, b)

pd.DataFrame([*f(zip(df.fruits, df.users))], columns=df.columns)

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

score 2 · Accepted Answer · answered Sep 19 '18 at 22:08


Assuming that lis1 and lis2 have the same number of elements, you can do this with a list comprehension after zipping the lists.

pd.DataFrame(
    [{'fruits': F, 'users': U} for (f, u) in zip(lis1, lis2) for F in f for U in u]
)

This produces the following output:

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
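Why the equal-length assumption matters: `zip` stops at the shorter input, so a missing entry silently drops rows instead of raising an error. A small sketch with hypothetical data where one users entry is missing:

```python
lis1 = [['apples'], ['bananas', 'oranges']]
lis2 = [['john']]  # hypothetical data: the second row's users entry is missing

# zip pairs items only up to the shorter input, so the bananas/oranges row vanishes
pairs = [(F, U) for f, u in zip(lis1, lis2) for F in f for U in u]
print(pairs)  # [('apples', 'john')]

# A cheap guard before building the frame
if len(lis1) != len(lis2):
    print('length mismatch:', len(lis1), 'vs', len(lis2))
```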

cpander

This works only because I have access to lis1/lis2 in the example. For my dataset, I'm given a dataframe with the columns "fruit" and "user", whose rows are populated with lists like the above example. Would lis1 essentially be df['fruit']? That makes it a Series; does a Series work like a list? – D500 Sep 19 '18 at 22:18
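Iterating over a pandas Series yields its values, so `df['fruit']` and `df['user']` can be passed to `zip` in place of `lis1` and `lis2`. A minimal sketch, assuming a dataframe whose `fruit` and `user` columns (names taken from the comment above) hold lists:

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']],
    'user': [['john'], ['stacy'], ['ron']],
})

# Each Series iterates over its cell values (here: lists), just like lis1/lis2
result = pd.DataFrame(
    [{'fruit': F, 'user': U}
     for f, u in zip(df['fruit'], df['user'])
     for F in f
     for U in u]
)
print(result)
```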

score 1 · Answer 3 · answered Sep 19 '18 at 22:03

Here is a solution with lots of stacking and unstacking:

Starting with:

>>> df
                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

Use:

final = (df.stack().apply(pd.Series)
         .stack(0).unstack(1)
         .ffill()
         .reset_index(drop=True))

>>> final
     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
