
I have a list of items and a dataframe. For every row in the dataframe I want one copy per list item, with that item added in a new column. So if the dataframe has 3 rows and the list has 4 items, the end result is a dataframe with 12 rows (3 rows times 4 items). I tried converting the dataframe to a list and iterating with append and extend, but that wasn't what I wanted: it kept appending values to the same list rather than copying the row and appending only the current item.

  group     start       stop
0   abc  1/1/2016   8/1/2016
1   xyz  5/1/2016  12/1/2016
2   jkl  3/7/2017  1/31/2018

b = ['a','b','c','d']

The expected result is a dataframe like this:

group  start     stop       new col
abc    1/1/2016  8/1/2016   a
abc    1/1/2016  8/1/2016   b
abc    1/1/2016  8/1/2016   c
abc    1/1/2016  8/1/2016   d
xyz    5/1/2016  12/1/2016  a
xyz    5/1/2016  12/1/2016  b
xyz    5/1/2016  12/1/2016  c
xyz    5/1/2016  12/1/2016  d
jkl    3/7/2017  1/31/2018  a
jkl    3/7/2017  1/31/2018  b
jkl    3/7/2017  1/31/2018  c
jkl    3/7/2017  1/31/2018  d
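
For anyone who wants to reproduce this, the inputs can be rebuilt roughly as follows. This is only a sketch: it assumes the dates are stored as plain strings (the question does not say whether they are parsed datetimes) and that the index is the default 0..2.

```python
import pandas as pd

# Sample inputs from the question; dates kept as strings (an assumption)
df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
b = ["a", "b", "c", "d"]

# The desired result has one row per (row, item) pair
expected_rows = len(df) * len(b)
print(expected_rows)
```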
cs95
Chris

2 Answers


See the canonical question "Performant cartesian product (CROSS JOIN) with pandas". Give both frames a constant key, merge on it, then drop the key:

newdf = df.assign(key=1).merge(pd.DataFrame({'key': [1] * len(b), 'v': b})).drop(columns='key')
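
On pandas 1.2 or newer, the helper key is not needed because `merge` accepts `how="cross"`. A minimal sketch, assuming the sample data from the question with dates as plain strings; the column name `new_col` is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
b = ["a", "b", "c", "d"]

# how="cross" pairs every row of df with every row of the right-hand frame
newdf = df.merge(pd.DataFrame({"new_col": b}), how="cross")
print(newdf)
```

The rows come out grouped by the original row, with the list items cycling inside each group, which is the order asked for in the question.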
BENY
  • WOW, that works perfectly! Thanks for such a quick response. – Chris Jan 23 '19 at 20:26
  • Thanks for the advertising ;-) – cs95 Jan 23 '19 at 20:29
  • @coldspeed glad to make more and more people know the canonical answer. – BENY Jan 23 '19 at 20:38
  • Oh, yes. I have been answering many older questions providing canonical answers so I can use them for better closing. In the last two days I have answered over 20 questions... I am quite satisfied with the progress so far. – cs95 Jan 23 '19 at 20:40
  • Especially you might be interested to see [this answer](https://stackoverflow.com/a/54324513/4909087). – cs95 Jan 23 '19 at 20:41

You can do this efficiently with NumPy: repeat each row with np.repeat, and cycle the list with np.tile so that every item appears once per row:

groups = ['a','b','c','d']

arr = np.column_stack([
    df.values.repeat(len(groups), axis=0),  # each row, len(groups) times in a row
    np.tile(groups, len(df))                # a, b, c, d, a, b, c, d, ...
])
pd.DataFrame(arr, columns=[*df, 'new_col'])

   group     start       stop new_col
0    abc  1/1/2016   8/1/2016       a
1    abc  1/1/2016   8/1/2016       b
2    abc  1/1/2016   8/1/2016       c
3    abc  1/1/2016   8/1/2016       d
4    xyz  5/1/2016  12/1/2016       a
5    xyz  5/1/2016  12/1/2016       b
6    xyz  5/1/2016  12/1/2016       c
7    xyz  5/1/2016  12/1/2016       d
8    jkl  3/7/2017  1/31/2018       a
9    jkl  3/7/2017  1/31/2018       b
10   jkl  3/7/2017  1/31/2018       c
11   jkl  3/7/2017  1/31/2018       d
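
One detail worth spelling out: np.repeat repeats each element in place, while np.tile repeats the whole sequence, so the rows need repeating and the list needs tiling to get a, b, c, d inside every group. The sketch below shows the difference and a variant that repeats the index instead of the raw values, which should keep the original column dtypes rather than going through an object array. It assumes the sample data from the question (dates as strings) and a unique index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
groups = ["a", "b", "c", "d"]

# repeat: a a b b c c d d    tile: a b c d a b c d
repeated = np.repeat(groups, 2)
tiled = np.tile(groups, 2)

# Repeat each row len(groups) times by repeating its index label
out = df.loc[df.index.repeat(len(groups))].reset_index(drop=True)
# Cycle the list once per original row
out["new_col"] = np.tile(groups, len(df))
print(out)
```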
cs95