As others have said, CSV files are rectangular: if one row has 1 column and another has 50, it's not a valid CSV. That said, if you have a folder of malformed CSV files you want to load, and we assume you must use polars, there are a few things you could do:
- edit the CSV files to make them valid. A reasonably fast option is to read all of the rows into a list, append as many commas as needed so that each row has n columns (where n is the maximum number of columns in the file), and save the CSV back to disk. Then read the CSV with polars.
- assuming that the format of the csv file is something like:
blah
blah
blah
Date,A
Time,B
you could loop through the lines, find the first one that contains a comma, and pass the number of lines before it to read_csv as skip_rows
(Credit: jqurious)
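import polars as pl

path = "example.csv"

# Count the lines that come before the first line containing a comma.
# Iterating over the file stops at EOF, so this cannot loop forever
# when no line contains a comma.
i = 0
with open(path, "r") as file:
    for line in file:
        if "," in line:
            break
        i += 1

# Skip the preamble and parse the rest as headerless CSV
df = pl.read_csv(path, skip_rows=i, has_header=False)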
- use a different library. I'm not aware of polars being able to read a CSV file line by line, which is essentially what you're asking for. You could try switching to another library that offers this as an option.
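
As a rough sketch of the first option (padding every row to the same width), something like the following could work. It reads the rows one at a time with Python's standard csv module, which is also one example of a library that does go row by row. The function name pad_csv_rows and the sample data are made up for illustration, and it assumes the file is small enough to hold in memory.

```python
import csv
import io


def pad_csv_rows(text: str) -> str:
    """Return CSV text in which every row has the same number of fields.

    Hypothetical helper: short rows are padded with empty fields on the right.
    """
    # Parse all rows first so the widest row is known
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Append empty strings until the row reaches the maximum width
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


# Made-up ragged input resembling the layout described above
ragged = "blah\nblah\nDate,A\nTime,B,extra\n"
padded = pad_csv_rows(ragged)
print(padded)
```

The padded text could then be written back to disk, or handed to polars directly, for example with `pl.read_csv(io.BytesIO(padded.encode()), has_header=False)`.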