Cartesian product of a DataFrame and list

Question

I have a list of items and a dataframe. For every row in the dataframe, I want one copy of that row per list item, with the item added in a new column. For example, if the list has 3 items and the dataframe has 4 rows, the result should be a dataframe with 12 rows (4 rows times 3 items). I tried converting the dataframe to a list and iterating with append and extend, but that wasn't what I wanted: it kept appending values to the same list instead of copying the row and appending only the current item.

  group     start       stop
0   abc  1/1/2016   8/1/2016
1   xyz  5/1/2016  12/1/2016
2   jkl  3/7/2017  1/31/2018

b = ['a','b','c','d']

The expected result is a dataframe like this:

group  start     stop       new col
abc    1/1/2016  8/1/2016   a
abc    1/1/2016  8/1/2016   b
abc    1/1/2016  8/1/2016   c
abc    1/1/2016  8/1/2016   d
xyz    5/1/2016  12/1/2016  a
xyz    5/1/2016  12/1/2016  b
xyz    5/1/2016  12/1/2016  c
xyz    5/1/2016  12/1/2016  d
jkl    3/7/2017  1/31/2018  a
jkl    3/7/2017  1/31/2018  b
jkl    3/7/2017  1/31/2018  c
jkl    3/7/2017  1/31/2018  d

score 3 · Accepted Answer · answered Jan 23 '19 at 20:23

See "Performant cartesian product (CROSS JOIN) with pandas". One way is to merge on a temporary constant key and then drop it:

import pandas as pd
newdf = df.assign(key=1).merge(pd.DataFrame({'key': [1]*len(b), 'v': b})).drop(columns='key')
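
For readers on newer pandas, the same cross join can be sketched without the temporary key. This is a minimal example, assuming pandas 1.2 or later (where `merge` accepts `how="cross"`); the sample frame is rebuilt from the question, and the column name `new_col` is an illustrative choice rather than part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
b = ["a", "b", "c", "d"]

# how="cross" pairs every row of df with every row of the right-hand frame
# (assumption: pandas >= 1.2; older versions need the constant-key merge).
newdf = df.merge(pd.DataFrame({"new_col": b}), how="cross")

print(newdf.shape)  # (12, 4): 3 rows times 4 items, 3 original columns plus new_col
```

Wrapping the list in a one-column DataFrame is what lets `merge` treat it as the right-hand side of the join.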

BENY

WOW, that works perfectly! Thanks for such a quick response. – Chris Jan 23 '19 at 20:26
Thanks for the advertising ;-) – cs95 Jan 23 '19 at 20:29
@coldspeed Glad to help more and more people find the canonical answer. – BENY Jan 23 '19 at 20:38
Oh, yes. I have been answering many older questions providing canonical answers so I can use them for better closing. In the last two days I have answered over 20 questions... I am quite satisfied with the progress so far. – cs95 Jan 23 '19 at 20:40
Especially you might be interested to see [this answer](https://stackoverflow.com/a/54324513/4909087). – cs95 Jan 23 '19 at 20:41

score 1 · Answer 2 · answered Jan 23 '19 at 20:28

You can do this efficiently using np.repeat and np.tile:

import numpy as np
import pandas as pd

groups = ['a','b','c','d']

arr = np.column_stack([
    df.values.repeat(len(groups), axis=0),  # each row repeated once per item
    np.tile(groups, len(df))                # the whole list cycled once per row
])
pd.DataFrame(arr, columns=[*df, 'new_col'])

   group     start       stop new_col
0    abc  1/1/2016   8/1/2016       a
1    abc  1/1/2016   8/1/2016       b
2    abc  1/1/2016   8/1/2016       c
3    abc  1/1/2016   8/1/2016       d
4    xyz  5/1/2016  12/1/2016       a
5    xyz  5/1/2016  12/1/2016       b
6    xyz  5/1/2016  12/1/2016       c
7    xyz  5/1/2016  12/1/2016       d
8    jkl  3/7/2017  1/31/2018       a
9    jkl  3/7/2017  1/31/2018       b
10   jkl  3/7/2017  1/31/2018       c
11   jkl  3/7/2017  1/31/2018       d
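
One caveat with stacking raw arrays is that everything passes through a single NumPy array, so a frame with numeric or datetime columns would likely come back as object dtype. A possible variant that stays inside pandas is sketched below; it repeats the index labels rather than the values, and assumes the frame has a unique index. The names `out` and `new_col` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
b = ["a", "b", "c", "d"]

out = (
    df.loc[df.index.repeat(len(b))]           # each row repeated once per list item
      .assign(new_col=np.tile(b, len(df)))    # the list cycled once per original row
      .reset_index(drop=True)                 # replace the repeated labels with 0..11
)

print(out.head(4))
```

Because the rows are selected with `df.loc`, each column keeps whatever dtype it had in the original frame.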
