Data which, for me, generates an exception instead of invoking the 'on_bad_lines' handler is at:
https://opencalaccess.org/misc/NAMES_CD.TSV
I have this:
bad_lines = list()
def bad_line_finder(x):
bad_lines.append(str(x))
return None
for file in os.listdir(dir):
bad_lines = list()
try:
for df in pd.read_csv(f"{dir}/{file}",
sep='\t',
on_bad_lines=bad_line_finder,
engine='python',
chunksize=1000):
print(f"\n{target}")
df.info()
print(f"Bad Lines: {bad_lines}")
bad_lines = list()
except:
print("EXCEPTION:")
traceback.print_exc()
and this works great. There are errors in the files and the method handles them so that I can keep track of them. Except, why do i still see this:
EXCEPTION:
Traceback (most recent call last):
File "/home/ray/Projects/opencalaccess-data/import.py", line 41, in <module>
for df in pd.read_csv(f"{dir}/{file}",
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
return self.get_chunk()
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
return self.read(nrows=size)
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 250, in read
content = self._get_lines(rows)
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 1114, in _get_lines
new_rows.append(next(self.data))
_csv.Error: ' ' expected after '"'
What is the "on_bad_lines" option doing if it does not handle all of the bad lines? Which of them will it handle and which will it not?
This is a government data source. There are format errors in the data that cannot be corrected by the agency, because they constitute the 0fficial record. So, I must fix them myself. But which of them throw exceptions and which do not?