While processing a large number of bank account statements in CSV format, I realized that some of the dates are invalid (the 30th of February, which does not exist).
As a result, this snippet fails [1], telling me that some dates are incorrect:
import pandas as pd

df_from_csv = pd.read_csv(
    csv_filename,
    encoding='cp1252',
    sep=';',
    thousands='.',
    decimal=',',
    dayfirst=True,
    parse_dates=['Buchungstag', 'Wertstellung'],  # fails on dates like 30.02.2018
)
I could of course pre-process those CSV files and replace the 30th of February with the 28th (or whichever day February ended on that year).
But is there a way to do this in pandas while importing? Something like "if parsing this column fails, set it to X"?
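To make the intended fallback concrete, here is a minimal sketch of it in plain Python. The helper name `clamp_german_date` is made up for illustration, and it assumes every date is written as DD.MM.YYYY:

```python
import calendar
from datetime import date


def clamp_german_date(text):
    """Parse 'DD.MM.YYYY' and clamp an out-of-range day to the month's last day.

    Hypothetical helper, not part of pandas.
    """
    day, month, year = (int(part) for part in text.strip().split("."))
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))


print(clamp_german_date("30.02.2018"))  # 2018-02-28
print(clamp_german_date("28.02.2018"))  # 2018-02-28
```

Doing this after the import would presumably mean reading both columns as plain strings (leaving them out of `parse_dates`) and mapping the helper over them, which is the extra step I would like to avoid.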
Sample row:
775945;28.02.2018;30.02.2018;;901;"Zinsen"
As you can see, the date 30.02.2018 is not valid, because February never has 30 days. But this seems to be a known problem in Germany. See [2].
[1] Here's the error message:
ValueError: day is out of range for month