The solution offered by anubhava was very useful; in fact, it was the only one among the guides I found that worked, meaning it reliably removed the semicolons inside quoted text. However, running it on a 640 kB text file (yes, 640 kB) took about 3 minutes, which was not acceptable even on an oldish i5.
The solution for me was to implement a C++ function:
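#include <cstdlib>   // malloc
#include <cstring>   // strlen

// extern "C" disables C++ name mangling so that ctypes can find the symbol
extern "C" const char *erasesemi(const char *s)
{
    bool WeAreIn = false;  // true while we are between two double quotes
    size_t sl = strlen(s);
    // Note: when called through ctypes with restype c_char_p, the result is
    // copied into a Python bytes object and this buffer is never freed.
    char *r = (char *) malloc(sl + 1);
    if (r == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i < sl; i++)
    {
        if (s[i] == '"')
        {
            WeAreIn = !WeAreIn;
        }
        // replace semicolons only inside quoted text, copy everything else
        r[i] = ((s[i] == ';') && WeAreIn) ? ',' : s[i];
    }
    r[sl] = '\0';
    return r;
}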
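For reference, the same quote-toggling scan can be sketched in pure Python. This is only an illustration of the logic, handy for checking the output of the compiled function on small inputs; the name erase_semi is my own and is not part of the setup described here.

```python
def erase_semi(text):
    """Replace ';' with ',' inside double-quoted parts of text."""
    in_quotes = False
    out = []
    for ch in text:
        if ch == '"':
            # every double quote flips the "inside quotes" state
            in_quotes = not in_quotes
        # semicolons are replaced only while inside quotes
        out.append(',' if ch == ';' and in_quotes else ch)
    return ''.join(out)


print(erase_semi('a;"b;c";d'))  # prints a;"b,c";d
```

Like the C++ version, it does not treat escaped or doubled quotes specially; every double quote simply toggles the state.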
Based on what I found online, I used this setup.py:
from setuptools import setup, Extension
# Compile erasesemi.cpp into a shared library
setup(
    # ...
    ext_modules=[Extension('erasesemi', ['erasesemi.cpp'])],
)
After that, you have to run:
python3 setup.py build
The relevant lines in the main code were:
import ctypes
import glob
libfile = glob.glob(
    'build/lib.linux-x86_64-3.8/erasesemi.cpython-38-x86_64-linux-gnu.so')[0]
mylib = ctypes.CDLL(libfile)
mylib.erasesemi.restype = ctypes.c_char_p
mylib.erasesemi.argtypes = [ctypes.c_char_p]
..
data3 = mylib.erasesemi(str(data2).encode('latin-1'))
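The path passed to glob.glob above is tied to one platform and one Python version. A minimal sketch of a more tolerant lookup, assuming the default build/lib.* layout that setup.py build produced here; the helper name find_extension and its parameters are hypothetical:

```python
import glob
import os


def find_extension(build_dir="build", name="erasesemi"):
    """Return the first built shared library matching build/lib.*/<name>*.so."""
    pattern = os.path.join(build_dir, "lib.*", name + "*.so")
    matches = sorted(glob.glob(pattern))
    if not matches:
        # fail with a readable message instead of an IndexError on [0]
        raise FileNotFoundError("no library matching " + pattern)
    return matches[0]


# Usage (assumes `python3 setup.py build` has already been run):
# libfile = find_extension()
```

The .so suffix is a Linux assumption; on other platforms the file extension differs.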
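With this, it produced the desired result in under 1 second. The trickiest part was finding out how to pass strings with German characters to the C++ function. Naturally, you can use another encoding, as long as ';' and '"' are still single bytes in it (true for latin-1 and UTF-8, for example); the function returns bytes, which you decode with the same encoding.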
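To illustrate the encoding round trip without needing the compiled library, here is a small sketch in which a pure-Python function stands in for the C++ one. The stand-in erase_semi_bytes and the sample text are my own assumptions; the point is only that a byte-wise scan leaves umlauts intact in both latin-1 and UTF-8, because ';' and '"' never occur inside a multi-byte character.

```python
def erase_semi_bytes(data):
    """Byte-wise stand-in for the C++ function: bytes in, bytes out."""
    out = bytearray(data)
    in_quotes = False
    for i, b in enumerate(data):
        if b == 0x22:                  # double quote toggles the state
            in_quotes = not in_quotes
        elif b == 0x3B and in_quotes:  # semicolon inside quotes
            out[i] = 0x2C              # becomes a comma
    return bytes(out)


text = 'Größe;"Müller; Söhne";Straße'
for enc in ("latin-1", "utf-8"):
    # encode before the call, decode the returned bytes with the same codec
    result = erase_semi_bytes(text.encode(enc)).decode(enc)
    print(enc, result)  # prints Größe;"Müller, Söhne";Straße for both
```

With the real library, the call in the middle would be mylib.erasesemi(...) and the returned bytes would be decoded the same way.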