1
df = pd.DataFrame([{'Instrument':'AAA', 'Date':'2012-04-18', 'Time_Bucket':180},
                    {'Instrument':'AAA', 'Date':'2012-04-18', 'Time_Bucket':100},
                    {'Instrument':'AAA', 'Date':'2012-04-18', 'Time_Bucket':67},
                    {'Instrument':'AAA', 'Date':'2012-04-18', 'Time_Bucket':33},
                    {'Instrument':'AAA', 'Date':'2012-04-18', 'Time_Bucket':1},
                    {'Instrument':'AAA', 'Date':'2012-04-19', 'Time_Bucket':175},
                    {'Instrument':'AAA', 'Date':'2012-04-19', 'Time_Bucket':110},
                    {'Instrument':'AAA', 'Date':'2012-04-19', 'Time_Bucket':30},
                    {'Instrument':'AAA', 'Date':'2012-04-19', 'Time_Bucket':1},
                    {'Instrument':'BBB', 'Date':'2012-04-18', 'Time_Bucket':180},
                    {'Instrument':'BBB', 'Date':'2012-04-18', 'Time_Bucket':150},
                    {'Instrument':'BBB', 'Date':'2012-04-18', 'Time_Bucket':10}])

I have the above DataFrame. My aim is to create a large dataframe that contains all the Instruments for different Dates with a Time_Bucket ranging from 180 to 1. Meaning, if I have 100 Instruments for Dates 2012-04-18, 2012-04-19, 2012-05-17, 2012-05-18, 2012-08-15, 2012-08-16, then I will need to create a total number of rows of 100*6*180, with each cell being labelled in an ascending order for Date, but descending order for Time_Bucket. Then I will merge my existing DataFrame with this newly created Dataframe and fill forward for some data analysis. I'm only able to code the following but it doesn't work:

df = pd.DataFrame({ 'Instrument' : 'AAA', 'BBB', 'CCC'}, 
                   {'Date': '2012-04-18', '2012-04-19', '2012-05-17', '2012-05-18', '2012-08-15', '2012-08-16'}, 
                   {'Time_Bucket': range(180, N-1 ,1))

Could you please help? Thank you.

Shaun Lim
  • 75
  • 1
  • 6

1 Answers1

0

Use itertools.product and pass to DataFrame constructor:

i = ['AAA', 'BBB', 'CCC']
d = ['2012-04-18', '2012-04-19', '2012-05-17', '2012-05-18', '2012-08-15', '2012-08-16']
r = range(180, 0 , -1)

from  itertools import product
df = pd.DataFrame(list(product(i, d, r)), columns=['Instrument','Date','Time_Bucket'])
print (df)
     Instrument        Date  Time_Bucket
0           AAA  2012-04-18          180
1           AAA  2012-04-18          179
2           AAA  2012-04-18          178
3           AAA  2012-04-18          177
4           AAA  2012-04-18          176
        ...         ...          ...
3235        CCC  2012-08-16            5
3236        CCC  2012-08-16            4
3237        CCC  2012-08-16            3
3238        CCC  2012-08-16            2
3239        CCC  2012-08-16            1

[3240 rows x 3 columns]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252