1

I want to repeat a df for each year in a list.

Every time the df repeats, it should also add the year corresponding to the iteration, in a new column called 'year'.

I have:

>>> ls = ['2019','2018','2017','2016']
>>> df = pd.DataFrame(['a','b'])
>>> df
   0
0  a
1  b

I want:

>>> df
   0    year
0  a  '2019'
1  b  '2019'
2  a  '2018'
3  b  '2018'
4  a  '2017'
5  b  '2017'
6  a  '2016'
7  b  '2016'
Dan

4 Answers

2

This is really the Cartesian product of two lists in disguise, which has this neat solution:

import pandas as pd
pd.DataFrame(index=pd.MultiIndex.from_product([[2019,2018,2017,2016], ['a','b']],
    names=['Year','Value'])).reset_index()

   Year Value
0  2019     a
1  2019     b
2  2018     a
3  2018     b
4  2017     a
5  2017     b
6  2016     a
7  2016     b
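
The same idea can be applied to the `df` and `ls` from the question instead of hard-coded lists. This is only a sketch: the column label `value` is a name chosen here for illustration, and it assumes the frame has a single column labelled `0`, as in the question.

```python
import pandas as pd

ls = ['2019', '2018', '2017', '2016']
df = pd.DataFrame(['a', 'b'])

# from_product keeps the order of its inputs: every value in df[0] is paired with each year in turn
idx = pd.MultiIndex.from_product([ls, df[0].tolist()], names=['year', 'value'])

# Turn the index levels into ordinary columns and put the original column first
out = idx.to_frame(index=False)[['value', 'year']]
print(out)
```

Because the years stay strings here, the `year` column keeps the dtype of the original list.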
smci
1

You can make use of np.repeat and np.tile:

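import numpy as np

# Tile the row labels len(ls) times to stack copies of df, then tag each copy with its year
new_df = (df.loc[np.tile(df.index, len(ls))]
            .reset_index(drop=True)
            .assign(Year=np.repeat(ls, len(df)))
         )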

Output:

   0  Year
0  a  2019
1  b  2019
2  a  2018
3  b  2018
4  a  2017
5  b  2017
6  a  2016
7  b  2016
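
To see why the two calls line up, it may help to look at them in isolation. A small sketch using the sizes from the question (2 rows, 4 years); the variable names are illustrative only:

```python
import numpy as np

ls = ['2019', '2018', '2017', '2016']
row_labels = np.array([0, 1])  # stands in for df.index of the two-row frame

# tile repeats the whole sequence end to end: 0, 1, 0, 1, ...
tiled = np.tile(row_labels, len(ls))

# repeat repeats each element in place: '2019', '2019', '2018', '2018', ...
repeated = np.repeat(ls, len(row_labels))

print(tiled)
print(repeated)
```

Position by position, `tiled` walks through the rows of `df` while `repeated` holds the year that copy belongs to.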
Quang Hoang
1

You can do a Cartesian join (a cross join).

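import pandas as pd

df = pd.DataFrame(['a','b'])
dates = pd.DataFrame(['2019','2018','2017','2016'])

# A shared constant key pairs every row of df with every row of dates
df = df.assign(key=1).merge(dates.assign(key=1), on='key').drop('key', axis=1)
df.columns = [0, 'year']
# A stable sort keeps the a, b order within each year
df = df.sort_values('year', ascending=False, kind='mergesort').reset_index(drop=True)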

print(df)
   0  year
0  a  2019
1  b  2019
2  a  2018
3  b  2018
4  a  2017
5  b  2017
6  a  2016
7  b  2016
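
On pandas 1.2 or later, `merge` also accepts `how='cross'`, which avoids the helper key. A minimal sketch, assuming that version is available; the column names `value` and `year` are chosen here so the two frames do not share a label:

```python
import pandas as pd

df = pd.DataFrame({'value': ['a', 'b']})
years = pd.DataFrame({'year': ['2019', '2018', '2017', '2016']})

# Years go on the left so the result is grouped by year in list order;
# each left row is paired with every right row
out = years.merge(df, how='cross')[['value', 'year']]
print(out)
```

With the years on the left, no sort is needed afterwards, so the order of the original list is kept even if it is not descending.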
Michael Gardner
0

import pandas as pd

ls = ['2019','2018','2017','2016','2015']

# Alternate 'a' and 'b' down the rows: one row per year, not a full copy of df per year
col = ['b' if i % 2 else 'a' for i in range(len(ls))]

df = pd.DataFrame.from_dict({
  '0': col,
  'year': ls,
})

print(df)

Output:

   0  year
0  a  2019
1  b  2018
2  a  2017
3  b  2016
4  a  2015
Nick Martin