I recently decided to dip my toes into data science and experiment with pandas and scipy, so I am trying to run scipy's Lomb-Scargle periodogram on some data. I have around 7000 data files and want to find frequencies common to them, so I run a function that computes a single periodogram per file in a concurrent.futures ThreadPoolExecutor and sum the results. The function loads the file into a dataframe, does some preprocessing and then computes the lombscargle results. The script runs fine for a few thousand files, but then I get a really weird error at the point of calling scipy.signal.lombscargle (I located the failing call using print() statements). The error is as follows:
Assertion failed: b != 0 && "divide by zero", file C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-v_zt9njv\overlay\Lib\site-packages\pythran/pythonic/operator_/div.hpp, line 26
Apparently there was a "divide by zero" error somewhere, but otherwise this message doesn't tell me much. Interestingly, my PC doesn't have a user called runneradmin, so this file path doesn't exist on my machine.
The worst part is that I can't catch this error. I have tried everything from bare except clauses to np.seterr(all='raise'), but nothing works.
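To illustrate what I mean by "can't catch", here is a minimal sketch that uses only the standard library. It assumes (my assumption, not confirmed from the scipy source) that the failed assertion ends in a native abort, which terminates the whole process instead of raising a Python exception. os.abort() stands in for the failed assertion, and the names CHILD_SOURCE and run_child are made up for this example.

```python
import subprocess
import sys

# Child program: tries to "catch" a native abort with a very broad except.
# os.abort() is only a stand-in for a failed C-level assert.
CHILD_SOURCE = """
import os
try:
    os.abort()  # kills the process; no Python exception is raised
except BaseException:
    print("caught")
"""


def run_child() -> subprocess.CompletedProcess:
    """Run the child in a separate interpreter and capture its outcome."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )


result = run_child()
exited_cleanly = result.returncode == 0
handler_ran = "caught" in result.stdout
print(f"child exited cleanly: {exited_cleanly}, except clause ran: {handler_ran}")
```

If that assumption holds, the except clause in the child never runs, and the only place the failure can be observed is the parent process, through the child's exit code.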
Code
from matplotlib import pyplot as plt
import numpy as np
import scipy.signal as spsig
import pandas as pd
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

# Frequencies at which every periodogram is evaluated
freqs = np.linspace(0.1, 157_800_000, 10000)


def lombsc(s: str) -> np.ndarray:
    # Attempt to turn floating-point warnings into exceptions
    np.seterr(all='raise')
    try:
        df = pd.read_csv(f'./ohlc/{s}.csv', index_col='date')
        ⋮
        return (spsig.lombscargle(df.index, df['fopen_pct_day'], freqs),
                spsig.lombscargle(df.index, df['fvol_pct_day'], freqs))
    except Exception:
        return None, None


if __name__ == "__main__":
    with open("symbols.txt", "r") as f:
        symbols = f.read().split("\n")

    # Running sums of the periodograms over all symbols
    f_op = np.zeros(len(freqs))
    f_vol = np.zeros(len(freqs))
    with ThreadPoolExecutor(max_workers=16) as executor:
        for op, vol in tqdm(executor.map(lombsc, symbols), total=len(symbols)):
            # Skip symbols whose periodogram could not be computed
            if op is None or vol is None:
                continue
            f_op += op
            f_vol += vol

    fig, axs = plt.subplots(2, sharex=True)
    # Indices of the 8 largest summed values
    ind_op = np.argpartition(f_op, -8)[-8:]
    ⋮
My guess as to what happened is this:
The error message mentions pythran, which appears to be some kind of ahead-of-time compiler for scientific Python, so I would guess that scipy uses it and that this is where the error message originates. The user runneradmin smells a lot like the Windows environment on GitHub Actions, and scipy uses a GitHub workflow to build and publish its wheels for multiple environments (notably Windows), so it seems to me that an error message appeared during the package build and has been hardcoded into the hpp file ever since.
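As a rough analogy for how a path from a build machine can end up in a runtime message: the C assert macro embeds the source file name at compile time, and Python does something similar with the file name passed to compile(). The sketch below is only that analogy, not what scipy or pythran actually do. BUILD_PATH is a made-up path and failing_filename is a hypothetical helper.

```python
import traceback

# Made-up path standing in for a location on a build machine.
BUILD_PATH = r"C:\Users\runneradmin\build\div.py"


def failing_filename() -> str:
    """Compile a failing snippet under BUILD_PATH and return the file name
    that the resulting traceback reports for it."""
    code = compile("1 / 0", BUILD_PATH, "exec")
    try:
        exec(code, {})
    except ZeroDivisionError as exc:
        # The innermost traceback entry belongs to the compiled snippet.
        return traceback.extract_tb(exc.__traceback__)[-1].filename
    return ""


reported = failing_filename()
print(reported)
```

The reported file name is whatever was recorded when the code was compiled, whether or not that path exists on the machine that runs it. If the same holds for the compiled scipy extension, the runneradmin path would just be where the header lived at build time.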
Does anyone know whether I am on the right track with this and should file an issue over at their repository, or is there some fix I am just not aware of?