Wrong Dates in Pandas

Question

I am trying to read a CSV file that has a date column. The dates are stored as 8/27/1962, 9/12/1959 and 7/15/1965. When I use the to_datetime function, the dates are converted to 8/27/2062, 9/12/2059 and 7/15/2065. I am not sure why this is happening. Is it because the year changed or something?

Example:

planets = pd.read_csv('Planets.csv',usecols = ['FirstVisited'])
0    3/29/74
1    8/27/62
2        NaN
3    9/12/59
4    7/15/65
5    12/4/73
6     9/1/79
Name: FirstVisited, dtype: object

pd.to_datetime(planets.FirstVisited)
0   1974-03-29
1   2062-08-27
2          NaT
3   2059-09-12
4   2065-07-15
5   1973-12-04
6   1979-09-01

Check the rows at indexes 1, 3 and 4.

Can you share a sample of the csv? The dates are parsing fine for me, in the format you desire. — amanb, Jan 02 '20 at 20:35
How do I share? I can put up a sample CSV file here but not sure how to.. — Prince Modi, Jan 02 '20 at 20:36
Does this answer your question? [Pandas to\_datetime changes year unexpectedly](https://stackoverflow.com/questions/55684075/pandas-to-datetime-changes-year-unexpectedly) — amanb, Jan 02 '20 at 21:02
Well we have the same problem and the solution there are specific and not general I think. What if my years are in 1700s or 1800s or something. The solutions might not work in that case. — Prince Modi, Jan 02 '20 at 21:07
@PrinceModi 1700s is not date because it doesn't have day and month. — Marios Nikolaou, Jan 02 '20 at 21:16
You should modify your question based on your requirements. Perhaps, even update the csv data for those sample years. With just two digits in the year, there is no way for the parser to understand which century it could be from. — amanb, Jan 02 '20 at 21:17

Mayowa Ayodele · Answer 1 · 2020-01-02T22:52:46.650

1

This happens because most implementations assume that two-digit years 00-68 belong to the 2000s and 69-99 belong to the 1900s. If all the dates are 19xx, you can insert the prefix '19' in front of the year part of the string before converting it to a date:


import pandas as pd

planets = {'FirstVisited':['8/2/62', '9/12/59', '9/12/88']}

planets = pd.DataFrame(planets)


planets['FirstVisited'] = planets['FirstVisited'].str[0:-2] + '19' + planets['FirstVisited'].str[-2:]


# The strings are month/day/year, so the format must be %m/%d/%Y
planets['FirstVisited'] = pd.to_datetime(planets['FirstVisited'], format = "%m/%d/%Y", errors = 'coerce')



print(planets)
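
If you cannot assume a fixed century but do know that no date lies in the future, another option is to parse with the two-digit-year rule and then move any future date back by 100 years. The following is only a sketch under that assumption: the cutoff date and the variable names are illustrative and not part of the original data.

```python
from datetime import datetime

import pandas as pd

# With %y, two-digit years 00-68 become 20xx and 69-99 become 19xx.
print(datetime.strptime('8/27/62', '%m/%d/%y').year)  # 2062
print(datetime.strptime('3/29/74', '%m/%d/%y').year)  # 1974

first_visited = pd.Series(['3/29/74', '8/27/62', None, '9/12/59', '7/15/65'])
parsed = pd.to_datetime(first_visited, format='%m/%d/%y')

# Assumption: every visit happened on or before this cutoff date.
cutoff = pd.Timestamp('2020-01-02')
in_future = parsed > cutoff

# Shift only the future dates back by one century; NaT stays NaT.
fixed = parsed.where(~in_future, parsed - pd.DateOffset(years=100))
print(fixed)
```

This only helps when every real date falls inside a single 100-year window ending at the cutoff; older dates would still need a four-digit year in the source data.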

edited Jan 02 '20 at 22:52

answered Jan 02 '20 at 21:00

Mayowa Ayodele


Read the question carefully. – Marios Nikolaou Jan 02 '20 at 21:10
I have now amended my solution. Let me know if it helps – Mayowa Ayodele Jan 02 '20 at 22:53

score 1 · Answer 2 · answered Jan 03 '20 at 01:37

Actually, it's not about your code! With only two digits, the parser has to guess the century. A common convention (used by POSIX strptime and by Python's %y) maps 69-99 to 1969-1999 and 00-68 to 2000-2068. That's why you get wrong results for the earlier dates. I recommend correcting these dates manually, with something like:

import pandas
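x, y = pandas.read_csv('Planets.csv'), []
# Skip missing values, which are floats (NaN) and cannot be split
for i in x.FirstVisited.dropna():
    i = i.split('/')
    # Reorder month/day/yy into 19yy-month-day
    i[0], i[1], i[2] = '19' + i[2], i[0], i[1]
    y.append('-'.join(i))
print(y)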

Marios Nikolaou · Answer 3 · 2020-01-02T20:57:34.193

0

You can use the pandas to_datetime function with the parameter errors='coerce', which converts non-dates into NaT (null) values. See the example below.

import pandas as pd

data = {'dates':["8/27/1962", "9/12/1959", "Nan"]}
df = pd.DataFrame(data)

df['dates'] = pd.to_datetime(df.dates,errors='coerce')
# drop rows where the date could not be parsed (NaT)
df = df.dropna(subset=['dates'])

lst = df['dates'].dt.strftime('%Y-%m-%d')

print(lst)

edited Jan 02 '20 at 20:57

answered Jan 02 '20 at 20:43

Marios Nikolaou


score 0 · Answer 4 · answered Jan 02 '20 at 23:00

A bit of a brute-force approach, but if you know all the dates are in the 1900s you can do:

import pandas as pd
import datetime

df=pd.DataFrame({"dt": ["8/27/62", "9/12/59", "7/15/65"], "x": list("abc")})

df["dt"]=df["dt"].str.split(r"/").apply(lambda x: datetime.datetime(int(x[2])+1900, int(x[0]), int(x[1])))

Output:

#before:
        dt  x
0  8/27/62  a
1  9/12/59  b
2  7/15/65  c

#after:
          dt  x
0 1962-08-27  a
1 1959-09-12  b
2 1965-07-15  c
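
For the case raised in the comments, where the years might belong to the 1700s or 1800s instead, the same idea can be generalised by passing the century in. This is a minimal sketch: parse_with_century is a hypothetical helper, not a pandas function, and it assumes every value in the column belongs to the same century.

```python
import pandas as pd


def parse_with_century(dates, century):
    """Parse m/d/yy strings, placing every year in the given century.

    Hypothetical helper for illustration; `century` is a two-digit
    string such as '18' or '19'.
    """
    # Rewrite the trailing two-digit year as four digits, e.g. '/62' -> '/1862'.
    expanded = dates.str.replace(r'/(\d{2})$', rf'/{century}\g<1>', regex=True)
    return pd.to_datetime(expanded, format='%m/%d/%Y', errors='coerce')


old_dates = pd.Series(['8/27/62', '9/12/59', None])
result = parse_with_century(old_dates, century='18')
print(result)
```

Note that nanosecond-resolution pandas timestamps cannot represent dates before 1677-09-21, so with errors='coerce' anything earlier would come back as NaT.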
