Improve Performance - Append a list instead of a dataframe

Question

I've read here Creating an empty Pandas DataFrame, then filling it? that it is much better to append a list than appending a dataframe.

I have code that opens a csv as a dataframe, enters a loop of 10k+ contracts and wait for data from an API , append this data to a temporary dataframe, and save the end result to a csv. Below is my code. What would be the most efficient way to perform this in python?

Any other improvements not related to the use of dataframe vs list are also welcome.

df = pd.DataFrame()
try:
watchlistdf = pd.read_csv(
    mypath + "//Extracts//Options//Watchlist//volatility-watchlist-summary.csv"
)
except:
watchlistdf = pd.DataFrame()

for index, x in enumerate(tickers):

    print(
        "processing contract:",
        index,
        "of",
        len(tickers),
        "-",
        x.contract.localSymbol,
    )

    impliedVol = None
    delta = None
    undPrice = None

    while x.modelGreeks is None and z < 1000:
        z = z + 1
        ib.sleep()
    if x.modelGreeks is not None:
        impliedVol = x.modelGreeks.impliedVol
        delta = x.modelGreeks.delta
        undPrice = x.modelGreeks.undPrice
    symbol = x.contract.symbol
    localSymbol = x.contract.localSymbol
    strike = x.contract.strike
    right = x.contract.right
    expiration = (x.contract.lastTradeDateOrContractMonth,)
    last = float(x.last)
    close = float(x.close)
    bid = float(x.bid)
    ask = float(x.ask)
    if util.isNan(bid) or util.isNan(ask):
        mid = float("Nan")
    else:
        mid = (bid + ask) / 2
    bidaskspread = ask - bid
    if util.isNan(mid) or util.isNan(last):
        marketmakeredge = float("Nan")
    else:
        marketmakeredge = round(abs(mid - last))
    if util.isNan(last) or util.isNan(mid) or util.isNan(bidaskspread):
        side = ""
    else:
        side = np.where(
            last <= (mid - (0.1 * bidaskspread)),
            "Bid",
            np.where(last >= (mid + (0.1 * bidaskspread)), "Ask", "Mid"),
        )
    df = df.append(
        {
            "rundate": program_start,
            "symbol": symbol,
            "localSymbol": localSymbol,
            "strike": strike,
            "right": right,
            "expiration": expiration,
            "close": close,
            "bid": bid,
            "ask": ask,
            "mid": mid,
            "impliedVol": impliedVol,
            "delta": delta,
            "undPrice": undPrice,
            "bidaskspread": bidaskspread,
            "marketmakeredger": marketmakeredge,
            "side": side,
        },
        ignore_index=True,
    )

watchlistdf = pd.concat([watchlistdf, df])
watchlistdf.to_csv(
mypath + "//Extracts//Options//Watchlist//volatility-watchlist-options-details.csv")

As mentioned in the second answer in https://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-then-filling-it, you can store them to a temporary list and then finally create the dataframe at the end — ThePyGuy, Mar 23 '21 at 02:51
Thank you for the quick reply. I'm new to python. I'm not sure how to read a "column/row" csv in a list. I'm also not sure how to create a list where each new variable loop is added to a new "row" within a defined "column". Can you please guide me on that? — nicnad888, Mar 23 '21 at 02:58

score 0 · Answer 1 · answered Mar 24 '21 at 00:55

I ended up using csv reader and dictionaries. I'm running everything in 1 second instead of several minutes.

import csv
with open(mypath + "//Extracts//Options//Watchlist//volatility-watchlist-options-details.csv", 'r') as f:
l = list(csv.reader(f))
mydictdetails = {header: values for header, *values in zip(*l)}

with open(mypath + "//Extracts//Options//Watchlist//volatility-watchlist-summary.csv", 'r') as f:
l = list(csv.reader(f))
mydictsummary = {header: values for header, *values in zip(*l)}


mydicttemp = {}

    start_time = time.time()

for index, x in enumerate(tickers):
    print(
        "processing contract:",
        index,
        "of",
        len(tickers),
        "-",
        x.contract.localSymbol,
    )

    impliedVol = None
    delta = None
    undPrice = None

    while x.modelGreeks is None and z < 10:
        z = z + 1
        ib.sleep()
    if x.modelGreeks is not None:
        impliedVol = x.modelGreeks.impliedVol
        delta = x.modelGreeks.delta
        undPrice = x.modelGreeks.undPrice
        symbol = x.contract.symbol
        localSymbol = x.contract.localSymbol
        strike = x.contract.strike
        right = x.contract.right
        expiration = (x.contract.lastTradeDateOrContractMonth,)
        last = float(x.last)
        close = float(x.close)
        bid = float(x.bid)
        ask = float(x.ask)
        if util.isNan(bid) or util.isNan(ask):
            mid = None
        else:
            mid = (bid + ask) / 2
        bidaskspread = ask - bid
        if util.isNan(mid) or util.isNan(last):
            marketmakeredge = None
        else:
            marketmakeredge = round(abs(mid - last))
        if util.isNan(last) or util.isNan(mid) or util.isNan(bidaskspread):
            side = None
        else:
            side = np.where(
        last <= (mid - (0.1 * bidaskspread)),
        "Bid",
        np.where(last >= (mid + (0.1 * bidaskspread)), "Ask", "Mid"),
        )
        
        mydictdetails["rundate"].append(program_start)
        mydictdetails["symbol"].append(symbol)
        mydictdetails["localSymbol"].append(localSymbol)
        mydictdetails["strike"].append(strike)
        mydictdetails["right"].append(right)
        mydictdetails["expiration"].append(expiration)
        mydictdetails["close"].append(close)
        mydictdetails["bid"].append(bid)
        mydictdetails["ask"].append(ask)
        mydictdetails["mid"].append(mid)
        mydictdetails["impliedVol"].append(impliedVol)
        mydictdetails["delta"].append(delta)
        mydictdetails["undPrice"].append(undPrice)
        mydictdetails["bidaskspread"].append(bidaskspread)
        mydictdetails["marketmakeredger"].append(marketmakeredge)
        mydictdetails["side"].append(side)

        mydictdetails = {**mydictdetails, **mydicttemp}

print(
"--- Processing", len(tickers), "ticker information took %s seconds ---"
% (time.time() - start_time)
)
watchlistdf = pd.DataFrame.from_dict(mydictdetails)
watchlistdf.to_csv(
            mypath + "//Extracts//Options//Watchlist//volatility-watchlist-options-details.csv")

Improve Performance - Append a list instead of a dataframe

1 Answers1