I have a CSV file that has errors. The most common one is a premature line break.
But I don't know the best way to fix it. If I read the file line by line:
with open("test.csv", "r") as reader:
    test = reader.read().splitlines()
then the broken structure is already in my variable. Is this still the right approach? Should I loop over test and build a copy, or can I modify test directly while iterating over it?
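A small illustration of the difference, using made-up rows rather than the real file. Assigning to the loop variable only rebinds a local name. Assigning by index changes the list itself:

```python
rows = ["a;b\n", "c;d\n"]

# Rebinding the loop variable only changes the local name `row`, not the list.
for row in rows:
    row = row.rstrip("\n")
print(rows)  # ['a;b\n', 'c;d\n'] (unchanged)

# Assigning by index does change the list; safe here because the length stays the same.
for i, row in enumerate(rows):
    rows[i] = row.rstrip("\n")
print(rows)  # ['a;b', 'c;d']
```

Merging broken rows removes entries, which changes the list's length. Deleting or inserting items in a list while iterating over it can skip elements, so building a new list is usually the safer choice.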
I can identify the corrupt lines by the semicolon: some rows end with a semicolon, others start with one. So would counting the semicolons be an alternative way to solve it?
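A minimal sketch of the counting idea. It assumes that semicolons never appear inside field values (no quoted fields), that the header line has the correct number of separators, and that the input comes from read().splitlines() (no trailing newlines). The function name merge_by_field_count is hypothetical. The sketch joins consecutive physical lines until the buffer holds as many separators as the header. This covers rows that end with a semicolon and rows that start with one.

```python
def merge_by_field_count(lines: list[str], sep: str = ";") -> list[str]:
    """Join physical lines until each logical row has as many separators as the header.

    Assumption: `sep` never occurs inside a field value.
    """
    if not lines:
        return []
    expected = lines[0].count(sep)  # the header defines the column count
    rows: list[str] = []
    buffer = ""
    for line in lines:
        buffer += line
        if buffer.count(sep) >= expected:
            rows.append(buffer)
            buffer = ""
    if buffer:  # incomplete row left over at the end of the file
        rows.append(buffer)
    return rows


sample = ["Col_a;Col_b;Col_c;Col_d", "2021;Foobar;Bla", ";Blub"]
print(merge_by_field_count(sample))
# ['Col_a;Col_b;Col_c;Col_d', '2021;Foobar;Bla;Blub']
```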
EDIT: I replaced reader.read().splitlines() with reader.readlines(), which keeps the trailing newline on each line, so that I can handle the rows that end with a semicolon:
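cleaned = []
for line in lines:
    if "Foobar" in line:
        line = line.replace("Foobar", "")
    if ";\n" in line:
        # drop the newline so this row continues on the next line once joined
        line = line.replace(";\n", ";")
    cleaned.append(line)  # rebinding `line` alone would not change `lines`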
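Removing the newline after a trailing semicolon only merges rows once the pieces are joined back into one string. A minimal sketch of that step follows. It assumes the input comes from readlines(), and the function name join_trailing_semicolon_rows is hypothetical. Note that this heuristic would also merge a row whose last column is legitimately empty.

```python
def join_trailing_semicolon_rows(lines: list[str]) -> list[str]:
    """Glue every line that ends in ';' onto the line that follows it.

    Assumes `lines` comes from readlines(), so each line still ends with a newline.
    """
    # Replacing ";\n" with ";" leaves those rows without a line break...
    text = "".join(line.replace(";\n", ";") for line in lines)
    # ...so splitting the joined text again yields the merged rows.
    return text.splitlines()
```

For example, ["Col_a;Col_b;Col_c;Col_d\n", "2021;Foobar;\n", "Bla;Blub\n"] becomes ["Col_a;Col_b;Col_c;Col_d", "2021;Foobar;Bla;Blub"].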
The only thing that remains is the rows that begin with a semicolon, since for those I need to go back one entry in the list.
Example:
Col_a;Col_b;Col_c;Col_d
2021;Foobar;Bla
;Blub
Blub belongs in the row above.
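"Going back one entry" maps naturally onto the last element of an output list that is built up while looping. A minimal sketch of this approach follows. It assumes lines without trailing newlines (as from read().splitlines()) and treats any line that starts with a semicolon as a continuation. The function name merge_leading_semicolon_rows is hypothetical.

```python
def merge_leading_semicolon_rows(lines: list[str]) -> list[str]:
    """Append every line that starts with ';' to the previous row."""
    merged: list[str] = []
    for line in lines:
        if line.startswith(";") and merged:
            merged[-1] += line  # continue the previous row instead of starting a new one
        else:
            merged.append(line)
    return merged


sample = ["Col_a;Col_b;Col_c;Col_d", "2021;Foobar;Bla", ";Blub"]
print(merge_leading_semicolon_rows(sample))
# ['Col_a;Col_b;Col_c;Col_d', '2021;Foobar;Bla;Blub']
```

Because the function writes into a separate list, it never modifies the input while iterating over it.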