
I have a ~14,000 row dataframe and am attempting to fill a new column with data by calling an API. The code below retrieves the expected response; however, each iteration waits for a response before moving on to the next row.

Here is the function:

from xbbg import blp

def market_sector_des(isin):
    isin = '/isin/' + isin
    return blp.bdp(tickers = isin, flds = ['market_sector_des']).iloc[0]

I am using xbbg to call the Bloomberg API.

The .apply() function returns the expected response,

df['new_column'] = df['ISIN'].apply(market_sector_des)

but each response takes around 2 seconds, which at 14,000 rows is roughly 8 hours.

Is there any way to make this apply function asynchronous so that all requests are sent in parallel? I have seen Dask as an alternative, but I am running into issues using it as well.

2 Answers


If the above is exactly what you want to do, it can be achieved by creating a column which contains the ticker syntax to be sent, and then passing that column as a series through blp.bdp:

df['ISIN_NEW'] = '/isin/' + df['ISIN']
isin_new = pd.unique(df['ISIN_NEW'].dropna())
mktsec_df = blp.bdp(tickers = isin_new, flds = ['market_sector_des'])

You can then join the newly created dataframe to your existing df, so that the retrieved values line up with your existing rows:

newdf = pd.merge(df, mktsec_df, how='left', left_on = 'ISIN_NEW', right_index = True )

This should result in a single call, which should ideally drop the runtime to less than a minute. Do let me know if this works out.
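
If a single request for all ~14,000 tickers turns out to be too large, the same idea still works in batches rather than row by row. A minimal sketch, reusing the names above and assuming an arbitrary batch size of 1,000 (tune to what your connection accepts):

BATCH = 1_000  # assumed batch size

isin_new = pd.unique(df['ISIN_NEW'].dropna())

# One bdp call per batch of tickers, stacking the partial results
parts = [blp.bdp(tickers = list(isin_new[i:i + BATCH]), flds = ['market_sector_des'])
         for i in range(0, len(isin_new), BATCH)]
mktsec_df = pd.concat(parts)

Each element of parts is indexed by ticker, so the merge above works unchanged.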

x0nar
  • Thanks - it does in fact result in only one call, but the resulting dataframe only has one response from the API. I am working with asyncio right now and should have a solution soon. –  Apr 11 '23 at 15:02
  • 14,000 is quite a small number. I never used xbbg (just standard blpapi), but I also think it would be a lot faster to simply get an array of the ISINs, use it to fetch the field, and merge the entire set into your dataset in one go. In fact, instead of spending hours trying to do this you could even do it in standard Excel and simply import the sheet into a df (a sketch of this follows the comments). – AKdemy Apr 11 '23 at 22:34
  • @wesellboxes Just checking with you - did the above method work for you? From your reply I understand that the dataframe has only one output, is that right? I tested on a list of 10 ISINs, and it gave me the output for all 10. Let me know if anything comes up. – x0nar Apr 12 '23 at 07:44
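
A minimal sketch of the Excel route mentioned in the comment above, assuming a hypothetical sheet.xlsx whose ISIN and MARKET_SECTOR_DES columns were filled in Excel with the Bloomberg add-in's =BDP() formula:

import pandas as pd

# sheet.xlsx is hypothetical: an ISIN column plus MARKET_SECTOR_DES populated via =BDP() in Excel
sectors = pd.read_excel('sheet.xlsx')
df = pd.merge(df, sectors, how='left', on='ISIN')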

You can use multiprocessing to parallelize the API calls. Divide your Series into THREADS chunks, then run one process per chunk:

main.py

import multiprocessing as mp
import pandas as pd
import numpy as np
from parallel_tickers import proxy_func

THREADS = mp.cpu_count() - 1

# The __main__ guard is required: worker processes re-import this module,
# and without the guard they would re-execute the Pool setup.
if __name__ == '__main__':
    # df = your_dataframe_here
    split = np.array_split(df['ISIN'], THREADS)
    with mp.Pool(THREADS) as pool:
        data = pool.map(proxy_func, split)

    df['new_column'] = pd.concat(data)

parallel_tickers.py

import pandas as pd
from xbbg import blp

def market_sector_des(isin):
    isin = '/isin/' + isin
    return blp.bdp(tickers = isin, flds = ['market_sector_des']).iloc[0]

def proxy_func(sr):
    # Look up each ISIN in the chunk, keeping the original index so the
    # per-chunk results can be concatenated back in order
    return pd.Series([market_sector_des(isin) for isin in sr], index=sr.index)

EDIT: moved the mp functions into a separate module so they can be pickled by the worker processes.
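
Since the calls are I/O-bound, a thread pool is a lighter-weight alternative sketch: threads avoid the pickling/import issues above because nothing has to live in a separate module. Whether blp.bdp tolerates concurrent calls from several threads is an assumption worth verifying on a small sample first, and the worker count below is a guess:

from concurrent.futures import ThreadPoolExecutor

import pandas as pd
from xbbg import blp

def market_sector_des(isin):
    isin = '/isin/' + isin
    return blp.bdp(tickers = isin, flds = ['market_sector_des']).iloc[0]

# df = your_dataframe_here
with ThreadPoolExecutor(max_workers=8) as executor:  # worker count is assumed
    data = list(executor.map(market_sector_des, df['ISIN']))

df['new_column'] = pd.Series(data, index=df.index)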

Corralien
  • Thank you - this hangs on execution, and the logs point me to this error: AttributeError: Can't get attribute 'proxy_func' on. From the docs, it looks like the main function should be set up in a specific way? –  Apr 10 '23 at 20:19
  • Thanks, but still getting the same error with the updated code block - Can't get attribute 'proxy_func' on –  Apr 10 '23 at 20:50
  • Try moving the mp functions into a separate module (https://stackoverflow.com/a/42383397/15239951). I think this is the same problem with Dask. – Corralien Apr 10 '23 at 20:57
  • I updated my answer, can you check it please? – Corralien Apr 10 '23 at 21:05