not sure if this is considered a double post, I apologize if it is.
I am having a bit of problem with using Pandas Error_Bad_Lines. I am currently importing around 500 CSV files into a Dataframe and using Pandas Error_Bad_Lines to remove lines with bad data, this works until it reads a CSV file which has a bad data on line 2 of the file, which causes the script to crash. The bad line can be anywhere else and the code works but for some reason being at Line 2 makes the program crash.
I got around the problem by help from a member, who provided an example of reading line by line. The issue with doing it that way is that it takes forever to process (about 45 minutes to a hour to read and output)
I am not sure if there is a fix for the error_bad_lines ignoring the bad data at Line 2, would appreciate it if someone knows how to fix that.
In the mean time what I am attempting to do is to read the First few lines from the CSV to check if any are bad lines, if bad ignore the line and continue reading the rest of the file with the concat function, how would I include such a statement in the pd.concat line ? or is that not possible.
label1="DATE"
label2="TIME"
label3="In1 "
label4="In2 "
header_label = [label1, label2, label3, label4]
df = pd.concat([pd.read_csv(f, sep=' | |,', names=header_label, engine='python', warn_bad_lines=True, error_bad_lines=False) for f in glob.glob("input/*.csv")],
ignore_index=False)
Input File:
2019/11/13 08:32 10.4,20.4
2019/11/13 13:58 752019/11/13 13:58 .4,123.9
2019/11/13 09:11 10.2,18.5
2019/11/13 09:22 9.5,14.8
2019/11/13 09:23 9.5,14.5
2019/11/13 09:24 9.3,14.6
2019/11/13 09:25 9.5,14.2
2019/11/13 09:26 9.0,13.4
2019/11/13 09:27 9.7,14.9
Current Output:
DATE ... In1
2019/11/13 08:32 10.4 ... NaN
13:58 752019/11/13 ... 123.9
09:09 752019/11/13 ... 123.9
09:11 10.2 ... NaN
09:22 9.5 ... NaN
... ... ... ...
Desired Output
DATE TIME In1 In2
0 2019/11/13 08:32 10.4 20.4
1 2019/11/13 09:11 10.2 18.5
2 2019/11/13 09:22 9.5 14.8
3 2019/11/13 09:23 9.5 14.5
4 2019/11/13 09:24 9.3 14.6
.. ... ... ... ...
I am new to programming and not looking for anyone to write code out for me, just to point me in the right direction. Thank you