I am reading in a bunch of files from a directory with pandas.read_csv.
Some files have flaws, such as the column separator changing partway through the file.
This makes it difficult to create a well-defined dataframe.
The affected data lines look like:
11:25;;;;;;;;;;;;;17.67;632.52;
11:30;;;;;;;;;;;;;;;
11:35,,,,,,,,,,,,,,,
11:40,,,,,,,,,,,,,18.18,633.53
I tried to skip these lines with error_bad_lines.
Here is how I read in the data:
df = pd.read_csv(
    file_path,
    sep=sep,
    skiprows=skiprows,
    usecols=usecols,
    parse_dates=parse_dates,
    error_bad_lines=True,
    warn_bad_lines=True,
)
I still get an error from my date functions, and the resulting dataframe looks like this:
65 11:10
66 11:15
67 11:20
68 11:25
69 11:30
70 11:35,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
71 11:40,,,,,,,,,,,,,18.18,633.53,11519,18.18,6.0...
72 11:45,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
73 11:50,,,,,,,,,,,,,18.55,626.05,11611,18.55,6.0...
74 11:55,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
I want every row to look like rows 65-69 above. Rows 70 and onward are wrong: the whole line ends up in the first column because it uses a comma instead of the expected separator.
How can I solve this issue?
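
For reference, error_bad_lines only flags lines with too many fields. A line that uses the wrong separator is parsed as a single field, so it is never treated as bad. One possible workaround is to normalize the separators before pandas sees the text. The following is a minimal sketch, not a definitive solution. The helper name read_csv_normalized and the alt_sep parameter are hypothetical, and it assumes the alternate separator never appears inside a field (no decimal commas, no quoted text).

```python
import io

import pandas as pd


def read_csv_normalized(file_path, sep=";", alt_sep=",", **kwargs):
    """Hypothetical helper: replace alt_sep with sep, then parse with pandas.

    Assumes alt_sep never occurs inside a field value.
    """
    with open(file_path, encoding="utf-8") as handle:
        # Rewrite every alternate separator so all lines share one delimiter.
        text = handle.read().replace(alt_sep, sep)
    # Parse the cleaned text from memory; extra options pass through unchanged.
    return pd.read_csv(io.StringIO(text), sep=sep, **kwargs)
```

The other arguments (skiprows, usecols, parse_dates) can be passed through as keyword arguments, for example read_csv_normalized(file_path, sep=sep, skiprows=skiprows, usecols=usecols, parse_dates=parse_dates).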