
The Python interpreter segfaults when running in a Miniconda environment on a fresh install of Ubuntu 20.04.2. This happens intermittently, both while running `pip` during the conda setup of an environment and during the execution of code like the script below.

The segfault reliably occurs when running the following code, which reads text from files and tokenizes the result, but the segfault location changes from run to run. The exact same code runs fine on another computer with the same conda environment on Ubuntu 18.04.

The core dumps always point to some function in Python's unicodeobject.c, but the exact function changes from crash to crash. At least one crash clearly dereferences a 0x0 pointer where the unicode object should be.

My guess is that something causes the interpreter to free the unicode object while it is still being worked on, causing the segfault. But any bug in the interpreter or NLTK should have been noticed by more users, and I cannot find anyone with similar issues.
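For context (this snippet is mine, not part of the failing program): CPython frees an object as soon as its reference count reaches zero, so a reference count corrupted by faulty hardware or a buggy extension can release a string that is still in use, leaving exactly the kind of dangling 0x0 pointer seen in the dump. The mechanism itself can be observed with `sys.getrefcount`:

```python
import sys

# Build the string at runtime so it is an ordinary, non-interned str object.
s = "".join(["some ", "unicode ", "text \u00e9\u00e5"])
base = sys.getrefcount(s)  # includes the temporary reference made by the call itself

holders = [s, s, s]        # three additional references keep the object alive
assert sys.getrefcount(s) == base + 3

holders.clear()            # references dropped; the count returns to the baseline
assert sys.getrefcount(s) == base
```

A premature drop to zero would free the object's buffer while code like `PyUnicode_RichCompare` in the trace above is still reading it.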

Things tried that didn't fix the issue:

  1. Reformatting and reinstalling Ubuntu
  2. Switching to Ubuntu 18.04 (on this computer; another computer with 18.04 runs the code just fine)
  3. Replacing hardware, to rule out broken RAM or a broken SSD
  4. Changing to Python versions 3.8.6, 3.8.8 and 3.9.2
  5. Cloning the conda environment from a working computer to the broken one

Attached is one stack trace from the fault handler along with its corresponding core-dump stack trace from gdb.

(eo) axel@minimind:~/test$ python tokenizer_mini.py 
2021-03-30 11:10:15.588399: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-30 11:10:15.588426: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Fatal Python error: Segmentation fault

Current thread 0x00007faa73bbe740 (most recent call first):
  File "tokenizer_mini.py", line 36 in preprocess_string
  File "tokenizer_mini.py", line 51 in <module>
Segmentation fault (core dumped)
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  find_maxchar_surrogates (num_surrogates=<synthetic pointer>, maxchar=<synthetic pointer>, 
    end=0x4 <error: Cannot access memory at address 0x4>, begin=0x0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/unicodeobject.c:1703
#3  _PyUnicode_Ready (unicode=0x7f7e4e04d7f0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/unicodeobject.c:1742
#4  0x000055cd65f6df6a in PyUnicode_RichCompare (left=0x7f7e4cf43fb0, right=<optimized out>, op=2)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/unicodeobject.c:11205
#5  0x000055cd6601712a in do_richcompare (op=2, w=0x7f7e4e04d7f0, v=0x7f7e4cf43fb0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/object.c:726
#6  PyObject_RichCompare (op=2, w=0x7f7e4e04d7f0, v=0x7f7e4cf43fb0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/object.c:774
#7  PyObject_RichCompareBool (op=2, w=0x7f7e4e04d7f0, v=0x7f7e4cf43fb0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/object.c:796
#8  list_contains (a=0x7f7e4e04b4c0, el=0x7f7e4cf43fb0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/listobject.c:455
#9  0x000055cd660be41b in PySequence_Contains (ob=0x7f7e4cf43fb0, seq=0x7f7e4e04b4c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/abstract.c:2083
#10 cmp_outcome (w=0x7f7e4e04b4c0, v=0x7f7e4cf43fb0, op=<optimized out>, tstate=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:5082
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:2977
#12 0x000055cd6609f706 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7e4f4d3c40)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:738
#13 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/call.c:284
#14 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Objects/call.c:411
#15 0x000055cd660be54f in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f7f391985b8, callable=0x7f7f39084160)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Include/cpython/abstract.h:115
#16 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55cd66c2e880)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4963
#17 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:3500
#18 0x000055cd6609e503 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7f39198440)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4298
#19 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, 
    argcount=<optimized out>, kwnames=<optimized out>, kwargs=<optimized out>, kwcount=<optimized out>, kwstep=<optimized out>, 
    defs=<optimized out>, defcount=<optimized out>, kwdefs=<optimized out>, closure=<optimized out>, name=<optimized out>, 
    qualname=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4298
#20 0x000055cd6609f559 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, 
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:4327
#21 0x000055cd661429ab in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ceval.c:718
#22 0x000055cd66142a43 in run_eval_code_obj (co=0x7f7f3910f240, globals=0x7f7f391fad80, locals=0x7f7f391fad80)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:1165
#23 0x000055cd6615c6b3 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7f7f391fad80, locals=0x7f7f391fad80, 
    flags=<optimized out>, arena=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:1187
--Type <RET> for more, q to quit, c to continue without paging--
#24 0x000055cd661615b2 in pyrun_file (fp=0x55cd66c2cdf0, filename=0x7f7f391bbee0, start=<optimized out>, globals=0x7f7f391fad80, 
    locals=0x7f7f391fad80, closeit=1, flags=0x7ffe3ee6f8e8)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:1084
#25 0x000055cd66161792 in pyrun_simple_file (flags=0x7ffe3ee6f8e8, closeit=1, filename=0x7f7f391bbee0, fp=0x55cd66c2cdf0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:439
#26 PyRun_SimpleFileExFlags (fp=0x55cd66c2cdf0, filename=<optimized out>, closeit=1, flags=0x7ffe3ee6f8e8)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/pythonrun.c:472
#27 0x000055cd66161d0d in pymain_run_file (cf=0x7ffe3ee6f8e8, config=0x55cd66c2da70)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:391
#28 pymain_run_python (exitcode=0x7ffe3ee6f8e0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:616
#29 Py_RunMain () at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:695
#30 0x000055cd66161ec9 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Modules/main.c:1127
#31 0x00007f7f3a3620b3 in __libc_start_main (main=0x55cd65fe3490 <main>, argc=2, argv=0x7ffe3ee6fae8, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe3ee6fad8) at ../csu/libc-start.c:308
#32 0x000055cd660d7369 in _start () at /home/conda/feedstock_root/build_artifacts/python-split_1613835706476/work/Python/ast.c:937

The conda environment used is below, created with Miniconda3-py38_4.9.2-Linux-x86_64.sh. (Note that the segfault sometimes occurs during the setup of the conda environment itself, so it is probably not related to the environment.)

name: eo
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8.8
  - pip=20.3.1
  - pip:
    - transformers==4.3.2
    - tensorflow_gpu==2.4.0
    - scikit-learn==0.23.2
    - nltk==3.5
    - matplotlib==3.2.1
    - seaborn==0.11.0
    - tensorflow-addons==0.11.2
    - tf-models-official==2.4.0
    - gspread==3.6.0
    - oauth2client==4.1.3
    - ipykernel==5.4.2
    - autopep8==1.5.4
    - torch==1.7.1

The code below consistently reproduces the problem; the files read are plain text files containing Unicode text:

from nltk.tokenize import wordpunct_tokenize
from tensorflow.keras.preprocessing.text import Tokenizer
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords
import pickle
from pathlib import Path
import faulthandler
faulthandler.enable()


def load_data(root_path, feature, index):
    feature_root = root_path / feature
    dir1 = str(index // 10_000)
    base_path = feature_root / dir1 / str(index)
    full_path = base_path.with_suffix('.txt')
    data = None
    with open(full_path, 'r', encoding='utf-8') as f:
        data = f.read()
    return data


def preprocess_string(text, stemmer, stop_words):
    # Split on punctuation/whitespace boundaries
    word_tokens = wordpunct_tokenize(text.lower())
    alpha_tokens = []
    for w in word_tokens:
        try:
            if w.isalpha() and w not in stop_words:
                alpha_tokens.append(w)
        except Exception:
            print("Something went wrong when handling the word: ", w)

    clean_tokens = []
    for w in alpha_tokens:
        try:
            clean_tokens.append(stemmer.stem(w))
        except Exception:
            print("Something went wrong when stemming the word: ", w)
            clean_tokens.append(w)
    return clean_tokens


stop_words = stopwords.words('english')
stemmer = SnowballStemmer(language='english')
tokenizer = Tokenizer()

root_path = '/srv/patent/EbbaOtto/E'
for idx in range(0, 57454):
    print(f'Processed {idx}/57454', end='\r')
    desc = str(load_data(Path(root_path), 'clean_description', idx))
    desc = preprocess_string(desc, stemmer, stop_words)
    tokenizer.fit_on_texts([desc])

  • Apologies for posting in the wrong section; I'm not sure whether this is a bug, since it seems incredibly unlikely that nobody else would have hit the same issue, given that I don't use anything unusual. Hence the general ask for input or help. Any opinion on where to post this for bug fixing? – Axel Mar 30 '21 at 10:48
  • Crash shows missing CUDA libs. Is CUDA toolkit installed? – merv Mar 30 '21 at 22:55
  • CUDA is deliberately not installed, since I didn't want to involve the GPU until the crashing had been sorted out. Right now TensorFlow runs on the CPU only, not the GPU. The warning explicitly says to ignore that line if you are not interested in GPU support. – Axel Mar 31 '21 at 06:57
  • But you are installing `tensorflow_gpu` – merv Mar 31 '21 at 07:57
  • I'm aware of that, but TensorFlow will still run on the CPU; see this question for details: https://stackoverflow.com/questions/52624703/difference-between-installation-libraries-of-tensorflow-gpu-vs-cpu I have tried tensorflow==2.4.0 without the GPU, with the same result, so this is not the issue here. I have reinstalled conda and tried most TF versions between 2.3 and nightly, with and without the GPU parts. – Axel Apr 07 '21 at 12:23
  • Also note that the error also occurs during `pip install`, not just when running this code, which shows that the error is not in TensorFlow, or in how TensorFlow is installed. – Axel Apr 07 '21 at 12:24

1 Answer


For the sake of anyone searching for similar issues: this was eventually traced to a hardware fault in the CPU. Replacing the CPU with another one of the same model removed the issue. Interestingly, the issue was not present under Windows.
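For anyone who suspects the same kind of fault: a quick sanity check (my own sketch, not from the original post) is to repeat a deterministic computation many times and compare the results. On healthy hardware every round must produce an identical digest, so any mismatch points at silent corruption in the CPU or RAM:

```python
import hashlib
import zlib

def stress_check(rounds: int = 50, size: int = 1 << 20) -> bool:
    """Repeat a deterministic compress-and-hash pipeline and compare results.

    Returns True if every round produced an identical digest (expected on
    healthy hardware); False indicates silent data corruption.
    """
    payload = bytes(range(256)) * (size // 256)
    reference = None
    for _ in range(rounds):
        digest = hashlib.sha256(zlib.compress(payload, level=6)).hexdigest()
        if reference is None:
            reference = digest
        elif digest != reference:
            return False
    return True

print(stress_check())
```

This is far less thorough than a dedicated tool such as stress-ng or memtest86+, but it runs anywhere Python does and catches the grossest cases of a flaky core.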
