1

Starting with:

import pandas as pd

lis1= [['apples'],['bananas','oranges','cinnamon'],['pears','juice']]
lis2= [['john'],['stacy'],['ron']]

pd.DataFrame({'fruits':lis1,'users':lis2})

                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

I'd like to end with:

lis3= ['apples','bananas','oranges','cinnamon','pears','juice']
lis4= ['john','stacy','stacy','stacy','ron','ron']

pd.DataFrame({'fruits': lis3, 'users':lis4})

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

First, I need to create a new dataframe with each item sitting in its own row. Second, the user name needs to repeat once for each fruit in its row. Looking at the example, John has one fruit while Stacy has 3 fruits, so under users Stacy has to be repeated 3 times.

sacuL
D500
  • Do both lists have the same number of elements (lists)? – Dani Mesejo Sep 19 '18 at 21:59
  • Possible duplicate of ["unstack" a pandas column containing lists into multiple rows](https://stackoverflow.com/questions/42012152/unstack-a-pandas-column-containing-lists-into-multiple-rows) – abcdaire Sep 19 '18 at 22:05

3 Answers

3

itertools

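from itertools import chain, product, starmap

# df is the DataFrame from the question (columns 'fruits' and 'users').
# product pairs every fruit in a row with every user in that row;
# chain flattens the per-row pairs into one list of (fruit, user) tuples.
pd.DataFrame(
    [*chain(*starmap(product, zip(df.fruits, df.users)))],
    columns=df.columns
)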

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

If the DataFrame has just these 2 columns, this also works without naming them:

pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df.get, df))))],
    columns=df.columns
)
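To see why the `chain(*starmap(product, zip(...)))` expression above yields one row per fruit, it may help to unroll it on plain lists. The following is only a sketch: the names `fruits`, `users`, `rows`, `per_row`, and `flat` are illustrative and not part of the answer, and `chain.from_iterable` is used in place of `chain(*...)` for readability.

```python
from itertools import chain, product, starmap

fruits = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
users = [['john'], ['stacy'], ['ron']]

# Step 1: zip pairs the two lists row by row.
rows = list(zip(fruits, users))

# Step 2: starmap(product, rows) calls product(fruit_list, user_list) for
# each row, giving every (fruit, user) combination within that row.
per_row = [list(pairs) for pairs in starmap(product, rows)]

# Step 3: chain flattens the per-row lists into one list of tuples.
flat = list(chain.from_iterable(per_row))

for pair in flat:
    print(pair)
```

Because each `users` entry holds a single name, the product simply repeats that name once per fruit in the same row.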

generator

def f(z):
  for A, B in z:
    for a in A:
      for b in B:
        yield (a, b)

pd.DataFrame([*f(zip(df.fruits, df.users))], columns=df.columns)

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
piRSquared
2

Assuming that lis1 and lis2 have the same number of elements, you can do this with a list comprehension after zipping the lists.

pd.DataFrame(
  [{'fruit':F, 'users':U} for (f, u) in zip(lis1, lis2) for F in f for U in u]
)

The code above produces the following output:

      fruit    users
0    apples     john
1   bananas    stacy
2   oranges    stacy
3  cinnamon    stacy
4     pears      ron
5     juice      ron
cpander
  • This works only because I have access to lis1/lis2 in the example. For my dataset, I'm given a dataframe with the columns "fruit" and "user", whose rows are populated with lists as in the example above. Would lis1 essentially be df['fruit']? That makes it a Series; do those work like a list? – D500 Sep 19 '18 at 22:18
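On the question in the comment above: iterating over a pandas Series yields its values one at a time, so the columns can be passed to `zip` in place of the original lists. A minimal sketch, assuming the column names `fruits` and `users` from the question's example (the commenter's real dataset may use different names); `records` and `flat` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']],
    'users': [['john'], ['stacy'], ['ron']],
})

# Iterating a Series yields its values in order, so zip can pair the
# two columns row by row, just as it did with lis1 and lis2.
records = [
    {'fruits': fruit, 'users': user}
    for fruit_list, user_list in zip(df['fruits'], df['users'])
    for fruit in fruit_list
    for user in user_list
]

flat = pd.DataFrame(records)
print(flat)
```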
1

Here is a solution with lots of stacking and unstacking:

Starting with:

>>> df
                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

Use:

final = (df.stack().apply(pd.Series)
         .stack(0).unstack(1)
         .ffill()
         .reset_index(drop=True))

>>> final
     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
sacuL
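A further option, not taken from any of the answers above: on pandas 0.25 or later, `DataFrame.explode` can unpack one list column at a time. The sketch below assumes that version is available and that every cell holds a list; `result` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']],
    'users': [['john'], ['stacy'], ['ron']],
})

# Explode 'fruits' first: each fruit gets its own row and the matching
# 'users' list is repeated alongside it. Exploding 'users' then unwraps
# the single-element lists into plain strings.
result = (
    df.explode('fruits')
      .explode('users')
      .reset_index(drop=True)
)
print(result)
```

Exploding one column at a time yields every fruit and user combination within a row, which here matches the desired output because each `users` list holds exactly one name.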