
I have intraday quote data in a dataframe (ds), spanning several years, that looks like this:

(screenshot of the quotes dataframe; a text version is reproduced in the edit below)

For each day, I need all of the quotes normalized to the quote at a specific time of that particular day. For example, on June 1st I need all the quotes normalized to the quote at 3pm on June 1st, but on June 2nd I need them all normalized to the quote at 3pm on June 2nd.

I have a dataframe of all the 3pm quotes daily, as shown here:

(screenshot of the 3pm dataframe; a text version is reproduced in the edit below)

So I thought it would work if I could simply divide my quotes dataframe (ds) by the 3pm dataframe:

dr = ds.groupby(ds.index.date).apply(lambda x: x / b)

Of note, the 3pm dataframe includes more days than the quotes dataframe (ds). As a result, each quote would have to be divided by the correct day's row in the 3pm dataframe, probably using something like b.index.date.
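
In case it helps frame it, here's a rough sketch of the alignment I have in mind (untested; it assumes bid/ask/mid pair up positionally with bid_close/ask_close/price in b):

import pandas as pd

# index the 3pm frame by plain calendar date
b_daily = b.set_axis(b.index.date)
# pull out each quote's own day's 3pm row, realigned to the quote timestamps
denom = b_daily.reindex(ds.index.date)
denom.index = ds.index
# divide positionally, since the column names differ between the two frames
dr = pd.DataFrame(ds.values / denom.values, index=ds.index, columns=ds.columns)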

I've also tried skipping the separate dataframe and instead pulling the last available quote out of the quotes dataframe (ds) itself; this did not work either:

(screenshot of that attempt; the commands are reproduced in the edit below)

EDIT: Thanks for the advice, DSM. Does this help at all? I'm not sure whether I should use to_string() on the dataframe, because that output doesn't seem easily clipboard-readable either.

Quotes ds dataframe:

                            bid     ask     mid
2000-01-04 14:45:12+00:00   281.0   281.5   281.25
2000-01-04 14:46:10+00:00   281.0   282.0   281.5
2000-01-04 14:47:14+00:00   281.2   282.2   281.7
2000-01-04 14:47:22+00:00   281.25  281.85  281.55
2000-01-04 14:47:47+00:00   281.25  281.75  281.5
2000-01-04 14:48:09+00:00   281.4   281.9   281.65
2000-01-04 14:48:40+00:00   281.3   282.3   281.8
2000-01-04 14:49:40+00:00   281.3   281.8   281.55
2000-01-04 14:49:45+00:00   281.2   282.2   281.7
2000-01-04 14:50:53+00:00   281.4   281.9   281.65

3pm dataframe:

                            bid_close  ask_close  price
2000-01-04 15:00:00+00:00   281.35  281.95  281.65
2000-01-05 15:00:00+00:00   280.73  281.48  281.105
2000-01-06 15:00:00+00:00   279.7   280.3   280.0
2000-01-07 15:00:00+00:00   282.3   282.9   282.6
2000-01-10 15:00:00+00:00   281.7   282.3   282.0
2000-01-11 15:00:00+00:00   282.1   282.7   282.4
2000-01-12 15:00:00+00:00   281.9   282.5   282.2
2000-01-13 15:00:00+00:00   281.9   282.7   282.3
2000-01-14 15:00:00+00:00   283.15  283.75  283.45
2000-01-17 15:00:00+00:00   285.5   286.0   285.75

And the command:

c = ds.groupby(ds.index.date).apply(lambda x: x / x.between_time('14:30:00', '14:59:59').resample('30Min', how='last').dropna(how='all'))

I've also tried (which seems close):

df = ds.groupby(ds.index.date).apply(lambda x: x / x.between_time('14:45:00', '14:59:59').tail(1))

Here's a related link, on which I based the command above: Grouping daily data by month in python/pandas and then normalizing

– Alex Petralia
  • Aside: not only is it easier to insert text (as opposed to an image) into a question, but if you copy and paste the text then other people can select those parts and use `pd.read_clipboard()` to reproduce your frames. – DSM Mar 31 '15 at 14:46
  • @DSM do you use pd.to_clipboard and then paste it into SO? – camdenl Mar 31 '15 at 15:50
  • @camdenl I've updated it. I think it's mostly readable now (except the timestamps get split up between date and time). – Alex Petralia Mar 31 '15 at 16:03

2 Answers


Here's a simplified example for the first day, assuming d is your quotes ds dataframe and n is your 3pm dataframe:

import pandas as pd

# get the first calendar day present in the 3pm dataframe
first_day = n.index[0].date()
# slice the quotes dataframe to that same day (partial string indexing)
d1 = d.loc[str(first_day)]
# get that day's single 3pm row as a Series
n1 = n.loc[str(first_day)].iloc[0]
# normalize column-by-column positionally (bid/bid_close, ask/ask_close,
# mid/price); adjust range(3) if you have a different number of columns
norm = pd.concat([d1.iloc[:, i] / n1.iloc[i] for i in range(3)], axis=1)

>>> norm
                                          bid                 ask                 mid
2000-01-04 14:45:12+00:00  0.9987559978674249  0.9984039723355205  0.9985797976211611
2000-01-04 14:46:10+00:00  0.9987559978674249  1.0001773364071644  0.9994674241079354
2000-01-04 14:47:14+00:00  0.9994668562288963  1.000886682035822   1.0001775252973548
2000-01-04 14:47:22+00:00  0.9996445708192642  0.9996453271856713  0.9996449494052904
2000-01-04 14:47:47+00:00  0.9996445708192642  0.9992906543713425  0.9994674241079354
2000-01-04 14:48:09+00:00  1.0001777145903676  0.9998226635928356  1.0
2000-01-04 14:48:40+00:00  0.9998222854096321  1.0012413548501509  1.0005325758920647
2000-01-04 14:49:40+00:00  0.9998222854096321  0.9994679907785069  0.9996449494052904
2000-01-04 14:49:45+00:00  0.9994668562288963  1.000886682035822   1.0001775252973548
2000-01-04 14:50:53+00:00  1.0001777145903676  0.9998226635928356  1.0
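
To extend this to every day at once, one possibility (just a sketch, assuming every day in d has a matching 3pm row in n) is to do the same division per date group:

def normalize_day(day_quotes):
    # look up this day's single 3pm row in n by calendar date
    ref = n.loc[str(day_quotes.index[0].date())].iloc[0]
    # positional division: bid/bid_close, ask/ask_close, mid/price
    return pd.DataFrame(day_quotes.values / ref.values,
                        index=day_quotes.index, columns=day_quotes.columns)

norm_all = d.groupby(d.index.date, group_keys=False).apply(normalize_day)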
– camdenl
  • I ended up doing it another way (filling down the 3pm quotes during an ordered merge on a matching 'date' column with the original quotes frame, then dividing the columns across) and got the same answer. So your answer works as well. Thanks! – Alex Petralia Apr 01 '15 at 19:43
  • Might be worth posting your solution as well! – camdenl Apr 01 '15 at 23:07

This is what I did:

I made a date column so the two dataframes would match:

ds['date'] = ds.index.date

I remade the 3pm dataframe so that there were no extra days:

b = ds.groupby(ds.index.date).apply(lambda x: x.between_time('14:45:00', '14:59:59').tail(1))
b = b.rename(columns={'bid': 'b_bid', 'ask': 'b_ask', 'mid': 'b_mid'})
b.index = b.index.droplevel(1)
b.index = pd.to_datetime(b.index)
b = b.drop(['source'], axis=1)

Then I merged the two into a new dataframe on the matching dates, filling the 3pm quote forward (b still carries the 'date' column created above, so the merge key exists on both sides):

combined = pd.merge_ordered(ds, b, on='date', fill_method='ffill')  # pd.ordered_merge in older pandas
combined.index = ds.index
combined = combined.drop(['date'], axis=1)

And finally I created the normalized columns which I later pulled into their own dataframe:

combined['norm_bid'] = combined['bid'] / combined['b_bid']
combined['norm_ask'] = combined['ask'] / combined['b_ask']
combined['norm_mid'] = combined['mid'] / combined['b_mid']
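
Pulling them into their own dataframe afterwards is then just a column selection:

norm = combined[['norm_bid', 'norm_ask', 'norm_mid']]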
– Alex Petralia