
I recently decided to dip my toes into data science and to experiment with pandas and SciPy, so I am trying to run SciPy's Lomb-Scargle periodogram on some data. I have around 7000 data files and want to find frequencies common to them, so I compute one periodogram per file using a concurrent.futures ThreadPoolExecutor and sum the results. The function loads the file into a DataFrame, does some preprocessing, and then computes the Lomb-Scargle results. The script runs fine for a few thousand files, but then I get a really weird error at the point of calling scipy.signal.lombscargle (I located the failing call using print() statements). The error is as follows:

Assertion failed: b != 0 && "divide by zero", file C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-v_zt9njv\overlay\Lib\site-packages\pythran/pythonic/operator_/div.hpp, line 26

Apparently there was a "divide by zero" error somewhere, but otherwise the message isn't much help. Interestingly, my PC doesn't have a user called runneradmin, so this file path doesn't exist on my machine.

The worst part is that I can't catch this error. I have tried everything from bare except clauses to np.seterr(all='raise'), and nothing works.

Code

from matplotlib import pyplot as plt
import numpy as np
import scipy.signal as spsig
import pandas as pd
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor


freqs = np.linspace(0.1, 157_800_000, 10000)

def lombsc(s: str) -> np.ndarray:
    np.seterr(all='raise')
    try:
        df = pd.read_csv(f'./ohlc/{s}.csv', index_col='date')

              ⋮
        
        return (spsig.lombscargle(df.index, df['fopen_pct_day'], freqs),
            spsig.lombscargle(df.index, df['fvol_pct_day'], freqs))
    except Exception:
        return None, None


if __name__ == "__main__":
    with open("symbols.txt", "r") as f:
        symbols = f.read().split("\n")


    f_op = np.zeros(len(freqs))
    f_vol = np.zeros(len(freqs))

    with ThreadPoolExecutor(max_workers=16) as executor:
        for op, vol in tqdm(executor.map(lombsc, symbols), total=len(symbols)):
            if op is None or vol is None:
                continue
            f_op += op
            f_vol += vol

    fig, axs = plt.subplots(2, sharex=True)

    ind_op = np.argpartition(f_op, -8)[-8:]

               ⋮

My guess as to what happened is this:

Since the error message mentions pythran, which appears to be some kind of ahead-of-time compiler for scientific Python, I would guess that it is used to build SciPy, which is also where this error message seems to originate. The user runneradmin smells a lot like the Windows environment on GitHub Actions, and SciPy uses a GitHub workflow to build and publish its wheels for multiple environments (Windows, notably), so it seems to me that an error message appeared during the package build and has been hardcoded into the hpp file ever since.

Does anyone know whether I am on the right track with this and should file an issue over at their repository, or is there some fix I am just not aware of?

leonhma
  • You can't catch a C/C++ `assert`. Python `except` handles Python exceptions, and `np.seterr` sets NumPy's handling for floating-point exceptions (a different thing from Python exceptions). The `assert` macro in C and C++ isn't either of those things. – user2357112 Jun 07 '23 at 17:50
  • Is there any way then to contain this failure to a single execution of this function so that the other (concurrently running) results are not affected? – leonhma Jun 07 '23 at 19:18
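
One possible way to contain such a failure, sketched here under assumptions rather than taken from the thread: run each file's work in its own interpreter process, so that a native abort ends only that child and the parent sees a non-zero exit code. In the sketch below, `CHILD_CODE`, `run_isolated` and `run_all` are hypothetical names, and `os.abort()` stands in for the failed C++ `assert`; a real child would load one CSV, call `scipy.signal.lombscargle`, and hand the result back through stdout or a file.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-file work, run in a separate interpreter.
# os.abort() simulates a failed native assert: it kills the child process
# without raising any Python exception.
CHILD_CODE = """
import os, sys
symbol = sys.argv[1]
if symbol == "BAD":
    os.abort()
print(len(symbol))  # placeholder "result" written to stdout
"""


def run_isolated(symbol):
    """Run the work for one symbol in its own process; return None on failure."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, symbol],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # The child crashed or exited with an error; only this item is lost.
        return None
    return int(proc.stdout.strip())


def run_all(symbols, max_workers=4):
    """Threads only wait on child processes, so a crash cannot take them down."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_isolated, symbols))


if __name__ == "__main__":
    print(run_all(["AAA", "BAD", "CCCC"]))
```

Starting a fresh interpreter per file adds noticeable overhead for thousands of files, so handing each child a batch of files and retrying a failed batch one file at a time may be a reasonable compromise.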
