Repeat entire df, for each item in a list

Question

I want to repeat a df for each year in a list.

Every time the df repeats, it should also add the year corresponding to the iteration, in a new column called 'year'.

I have:

>>> ls = ['2019','2018','2017','2016']
>>> df = pd.DataFrame(['a','b'])
>>> df
   0
0  a
1  b

I want:

>>> df
   0    year
0  a  '2019'
1  b  '2019'
2  a  '2018'
3  b  '2018'
4  a  '2017'
5  b  '2017'
6  a  '2016'
7  b  '2016'

score 2 · Answer 1 · answered Oct 18 '19 at 02:33

This is really the Cartesian product of two lists in disguise, which pandas can build directly with MultiIndex.from_product:

import pandas as pd
pd.DataFrame(index=pd.MultiIndex.from_product([[2019,2018,2017,2016], ['a','b']],
    names=['Year','Value'])).reset_index()

   Year Value
0  2019     a
1  2019     b
2  2018     a
3  2018     b
4  2017     a
5  2017     b
6  2016     a
7  2016     b
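The snippet above hardcodes both lists. As a rough sketch, the same product can be built from the question's existing `df` and `ls`. It assumes `df` has a single column labelled `0`, and the level names `'year'` and `'value'` are illustrative choices, not anything from the original answer:

```python
import pandas as pd

ls = ['2019', '2018', '2017', '2016']
df = pd.DataFrame(['a', 'b'])

# Cartesian product of the years and the values already held in df's column 0.
# The first iterable varies slowest, so all rows for a year stay together.
index = pd.MultiIndex.from_product([ls, df[0]], names=['year', 'value'])

# An empty frame that carries only the index; reset_index turns both levels into columns.
result = pd.DataFrame(index=index).reset_index()

# Put the original values first, as in the question's desired layout.
result = result[['value', 'year']]
print(result)
```

This only covers a one-column frame; with several columns, one of the approaches in the other answers is probably easier.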

score 1 · Answer 2 · answered Oct 18 '19 at 01:39

You can make use of np.repeat and np.tile:

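import numpy as np
import pandas as pd

# df and ls are the frame and the list of years from the question
new_df = (df.loc[np.tile(df.index, len(ls))]      # repeat the whole frame once per year
            .reset_index(drop=True)
            .assign(year=np.repeat(ls, len(df)))  # repeat each year once per row of df
         )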
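The two functions repeat in different ways, which is why they line up. A small illustration on toy inputs (the names `years` and `rows` are made up for this sketch):

```python
import numpy as np

years = ['2019', '2018']
rows = np.array([0, 1])  # stands in for df.index of a two-row frame

# np.tile repeats the whole sequence end to end.
tiled = np.tile(rows, len(years))

# np.repeat repeats each element in place before moving to the next.
repeated = np.repeat(years, len(rows))

print(tiled.tolist())     # [0, 1, 0, 1]
print(repeated.tolist())  # ['2019', '2019', '2018', '2018']
```

Pairing the tiled row labels with the repeated years gives one full copy of the frame per year.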

score 1 · Accepted Answer · answered Oct 18 '19 at 02:32

You can do a cartesian join.

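import pandas as pd

df = pd.DataFrame(['a','b'])
dates = pd.DataFrame(['2019','2018','2017','2016'])

# A constant key on both sides makes the merge pair every row with every row
df = df.assign(key=1).merge(dates.assign(key=1), on='key').drop('key', axis=1)
df.columns = [0, 'year']
# A stable sort keeps a before b within each year
df = df.sort_values('year', ascending=False, kind='mergesort').reset_index(drop=True)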

print(df)
   0  year
0  a  2019
1  b  2019
2  a  2018
3  b  2018
4  a  2017
5  b  2017
6  a  2016
7  b  2016
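On pandas 1.2 or later, `merge` accepts `how='cross'`, which should make the helper key unnecessary. A minimal sketch under that version assumption, using the question's `df` and `ls` (the `years` frame is a name introduced here):

```python
import pandas as pd

ls = ['2019', '2018', '2017', '2016']
df = pd.DataFrame(['a', 'b'])

# One-column frame of years; the column name 'year' matches the question.
years = pd.DataFrame({'year': ls})

# how='cross' pairs every row of the left frame with every row of the right.
# With years on the left, the result follows the order of ls, with df repeated under each year.
result = years.merge(df, how='cross')[[0, 'year']]
print(result)
```

Because the years sit on the left side of the merge, no sort is needed afterwards and the order of `ls` is kept even if it is not descending.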

score 0 · Answer 4 · answered Oct 18 '19 at 01:33

import pandas as pd

ls = ['2019','2018','2017','2016','2015']

col = []
for i, year in enumerate(ls):
  col.append('b' if i%2 else 'a')

df = pd.DataFrame.from_dict({
  '0': col,
  'year': ls,
})

print(df)
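Note that the snippet above produces one row per year with alternating letters, not a full copy of df for each year. A loop-style sketch that does repeat the whole frame, assuming the `df` and `ls` from the question (the names `copies` and `result` are introduced here), might look like this:

```python
import pandas as pd

ls = ['2019', '2018', '2017', '2016']
df = pd.DataFrame(['a', 'b'])

# One copy of df per year, each tagged with that year in a new 'year' column.
copies = [df.assign(year=year) for year in ls]

# Stack the copies and renumber the rows from 0.
result = pd.concat(copies, ignore_index=True)
print(result)
```

This works unchanged for a df with any number of columns, since each copy keeps all of the original columns.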
