
Hello – I am trying to port a few Excel worksheets into a Python script I can run on the fly. I am querying Bloomberg data, and essentially what I am looking to do is grab historical bid and ask prices for specific ISINs on specific dates. I am using the xbbg library to query the API and working within a Jupyter Notebook. The data is stored in a Pandas dataframe.

This is the very simple Excel functionality I am trying to model:

[Screenshot: the Excel functionality being modelled]

I have no problem pulling this data for individual line items. However, I need to run this for 25,000 index constituents/dates (in Excel you can just drag the formula down). I have code that runs and returns the correct data; however, it takes hours to finish for this many lines. I believe this is because the code waits for each API call to resolve before moving on to the next identifier.

I have been looking for solutions to this problem, namely using the asyncio library. I have code snippets that work syntactically, but it still takes entirely too much time. Here is my asyncio version of the code that pulls this data:

import asyncio
import pandas as pd
from xbbg import blp

async def bbg_data(isin, start_date, end_date):
    # Build the BVAL ticker from the raw ISIN.
    isin = '/isin/' + isin + '@BVAL'
    previous_close_date = pd.to_datetime(end_date).strftime('%Y%m%d')
    # One bdh() call per field, falling back to "N/A" when nothing comes back.
    px_bid = blp.bdh(tickers=isin, flds=['PX_BID'], start_date=previous_close_date, end_date=previous_close_date)
    px_bid = px_bid.iloc[0, 0] if not px_bid.empty else "N/A"
    px_ask = blp.bdh(tickers=isin, flds=['PX_ASK'], start_date=previous_close_date, end_date=previous_close_date)
    px_ask = px_ask.iloc[0, 0] if not px_ask.empty else "N/A"
    return px_bid, px_ask

async def main():
    # One task per row; the previous close is used as both start and end date.
    tasks = [asyncio.create_task(bbg_data(isin, start_date, end_date))
             for isin, start_date, end_date in ttd_sample[['ISIN', 'PreviousClose', 'PreviousClose']].values]
    results = await asyncio.gather(*tasks)
    ttd_sample['previous_bid_price'] = [result[0] for result in results]
    ttd_sample['previous_ask_price'] = [result[1] for result in results]

# Jupyter already runs an event loop, so a bare top-level await works here.
await main()

A few questions on this issue, hopefully for someone with BLPAPI/asyncio experience:

  1. Is my reliance on the xbbg library causing me too many headaches? Could I solve this in one shot using blpapi directly? If so, are there any good tutorials for a beginner programmer?

  2. Is there an issue with how I wrote my asyncio code? I think I am close, but I am not sure it is truly asynchronous (see the sketch below).

Any advice on how to tackle this issue would be appreciated.
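
On question 2: my understanding is that blp.bdh is a plain synchronous call, so the coroutines above never actually overlap – each one blocks the event loop until its request resolves. A genuinely concurrent version would have to push each blocking call onto a worker thread, perhaps like the sketch below. This is only a sketch: asyncio.to_thread needs Python 3.9+, the semaphore limit of 8 is an arbitrary guess, and I have not verified that the Bloomberg session tolerates concurrent requests.

sem = asyncio.Semaphore(8)  # cap concurrent requests; 8 is a guess

def fetch_prices(isin, start_date, end_date):
    # Plain synchronous version of bbg_data above.
    ticker = '/isin/' + isin + '@BVAL'
    d = pd.to_datetime(end_date).strftime('%Y%m%d')
    bid = blp.bdh(tickers=ticker, flds=['PX_BID'], start_date=d, end_date=d)
    ask = blp.bdh(tickers=ticker, flds=['PX_ASK'], start_date=d, end_date=d)
    return (bid.iloc[0, 0] if not bid.empty else "N/A",
            ask.iloc[0, 0] if not ask.empty else "N/A")

async def bbg_data_threaded(isin, start_date, end_date):
    async with sem:
        # to_thread hands the blocking call to a worker thread and yields
        # control back to the event loop while the request is in flight.
        return await asyncio.to_thread(fetch_prices, isin, start_date, end_date)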

  • https://stackoverflow.com/q/75979818/15877695 – AKdemy Apr 25 '23 at 16:34
  • Thanks, the solution will work if I am only pulling in one data field for a list of identifiers. Here I need data for a list of identifiers that also has a specific date value that is in another column within the dataframe. –  Apr 25 '23 at 17:09
  • Are you familiar with the API at all? I recommend not just copy pasting existing code. – AKdemy Apr 25 '23 at 17:35
  • I wrote this code. –  Apr 25 '23 at 17:54
  • Answer to 1) No. xbbg is a relatively thin wrapper around the blpapi. You need to group the ISINs together and send as one query. The time is taken up with creating the bloomberg request and transmitting/receiving: it is not usually an issue with the size of the data. Perhaps group ISINs together in batches and call `bdh` with the min/max of the start/end dates in the batch. The response time of `bdh` is not noticeably different if you ask for 1 data point or 100: it's the setup and network delay that takes the time. Maybe experiment with the optimal batch size. – DS_London Apr 26 '23 at 07:45
  • @DS_London Thanks, appreciate that insight. That's unfortunate, since I need specific date/identifier combinations. A loop over the identifiers and dates seems to take forever. Maybe I can cut the dataframe up by day and send each day's identifiers in one call. –  Apr 26 '23 at 13:14

1 Answer


The setup costs of each bdh() request are high. It can be quicker to request a range of dates and then select the one you want. In the example below there is only one call to bdh() rather than the 10 the per-item approach would make (two fields × five pairs):

from xbbg import blp
from datetime import date
import pandas as pd

pairs = [['DE0001102325 Govt', date(2023, 4, 3)],
         ['DE0001134922 Govt', date(2023, 3, 15)],
         ['DE0001134922 Govt', date(2023, 4, 19)],
         ['DE0001102358 Govt', date(2023, 2, 14)],
         ['DE0001102366 Govt', date(2023, 4, 12)]]

tickers = [p[0] for p in pairs]
dates = [p[1] for p in pairs]

fields = ['PX_BID', 'PX_ASK']

min_date = min(dates)
max_date = max(dates)

# One request covering every distinct ticker over the whole date range.
df = blp.bdh(set(tickers), fields, min_date, max_date)

# Pick out the single date wanted for each (ticker, date) pair.
data = []
index = []
for [ticker, dt] in pairs:
    data.append([df[ticker].loc[dt][f] for f in fields])
    index.append((ticker, dt))

dfResult = pd.DataFrame(data, index=index, columns=fields)

print(dfResult)

With the output:

                                  PX_BID   PX_ASK
(DE0001102325 Govt, 2023-04-03)   99.xxx   99.xxx
(DE0001134922 Govt, 2023-03-15)  102.xxx  103.xxx
(DE0001134922 Govt, 2023-04-19)  102.xxx  102.xxx
(DE0001102358 Govt, 2023-02-14)   98.xxx   98.xxx
(DE0001102366 Govt, 2023-04-12)   97.xxx   97.xxx

Notes: tickers can be repeated with different dates; wrapping the list in set ensures only distinct tickers are passed to the bdh() call. Depending on the spread between min_date and max_date, it may be more efficient to divide a large number of tickers into smaller chunks with a tighter date range. 25,000 is probably too many to send in one call (you run the risk of the connection timing out), so perhaps start with blocks of 100 ISIN/date pairs and experiment with the batch size. One natural chunking is by date, sketched below.
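
As a rough sketch of that chunking idea (reusing the pairs list and imports from above; the fetch_by_date name and the "N/A" fallback are illustrative choices, not a tested recipe), group the ISIN/date pairs by date so each distinct date costs exactly one bdh() call over a one-day range:

from collections import defaultdict

def fetch_by_date(pairs, fields=('PX_BID', 'PX_ASK')):
    # Group (ticker, date) pairs so each distinct date needs one bdh() call.
    by_date = defaultdict(set)
    for ticker, dt in pairs:
        by_date[dt].add(ticker)

    data, index = [], []
    for dt, tickers in by_date.items():
        # Single request: every ticker quoted on this date, one-day range.
        day = blp.bdh(sorted(tickers), list(fields), dt, dt)
        for ticker in sorted(tickers):
            if not day.empty and ticker in day.columns.get_level_values(0):
                data.append([day[ticker].iloc[0][f] for f in fields])
            else:
                data.append(['N/A'] * len(fields))
            index.append((ticker, dt))
    return pd.DataFrame(data, index=index, columns=list(fields))

dfResult = fetch_by_date(pairs)

With 25,000 pairs spread over, say, 250 trading days, this replaces the question's roughly 50,000 per-field bdh() calls with about 250.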

DS_London
  • Thanks for the tip! I ended up batching the requests by date and now the data retrieval process runs much faster. –  Apr 28 '23 at 13:34