I have a csv, and one column is date, format dd/mm/yyyy.

I read it using z=pd.read_csv('property_scrape.csv')

My raw data is:

(image of the raw data)

After I read it in, some of the values are kept in the format I downloaded (dd/mm/yyyy), while somewhere in the middle, the dates are converted to yyyy-mm-dd:

27       01/10/2019
28       01/10/2019
29       01/10/2019
            ...    
21092    2020-08-22
21093    2020-08-22
21094    2020-08-22

Name: Date, Length: 21122, dtype: object

Does anyone know why this happens?

Also, is there a way to ensure that this date column is always read the correct/constant way?

frank
  • is it possible that the date format in the csv is not consistent? maybe pandas checks and converts the chunk that seem to match some form, converts that and leaves the rest. you could probably write a function to fix that after reading in the data. – sammywemmy Aug 22 '20 at 22:13
  • that was my first thought, but going through the csv, all dates are in the style shown in the input, dd/mm/yyyy. Going further, I found the exact entry where the dates start to differ, and in the csv they are in the same format – frank Aug 22 '20 at 22:18
  • sounds like this could be relevant: https://stackoverflow.com/questions/55309199/pandas-read-csv-can-apply-different-date-formats-within-the-same-column-is-it – dm2 Aug 22 '20 at 22:48

1 Answer

The problem is that pandas samples the first row and assumes the format is MM/DD/YYYY instead of DD/MM/YYYY; there isn't a real way for it to know which one is meant. Later, when it finds 22, which is not a valid month, it defaults to object/string.

You can add the flag infer_datetime_format=False, and the column should then be read as strings, which you can parse yourself. You can also pass a lambda to read_csv; I'm not sure whether there is an easier way to just pass a format string. See the dsexchange article 34357.
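
A minimal sketch of the "read as strings, then parse" approach. The inline sample data and the Price column are made up for illustration and stand in for property_scrape.csv. It assumes the column holds only the two formats shown in the question (dd/mm/yyyy and yyyy-mm-dd); anything else ends up as NaT.

```python
import io
import pandas as pd

# Hypothetical sample standing in for property_scrape.csv; it mixes the
# two formats that appear in the question's output.
csv_text = """Date,Price
01/10/2019,100
13/10/2019,200
30/05/2020,300
2020-08-22,400
"""

# Read the column as plain strings so nothing is inferred on load.
z = pd.read_csv(io.StringIO(csv_text), dtype={"Date": str})

# First pass: explicit day-first format; non-matching values become NaT.
day_first = pd.to_datetime(z["Date"], format="%d/%m/%Y", errors="coerce")

# Second pass: ISO format, for the rows the first pass could not parse.
iso = pd.to_datetime(z["Date"], format="%Y-%m-%d", errors="coerce")

# Combine both passes into a single datetime64 column.
z["Date"] = day_first.fillna(iso)
```

Passing an explicit format avoids any guessing about day/month order, so 01/10/2019 is always read as 1 October. Checking z["Date"].isna().sum() afterwards shows how many values matched neither format.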

Doug F
  • I like the theory, but the value `30/05/2020` is read in and parsed correctly, as is `13/10/2019` – frank Aug 23 '20 at 20:44