0

I am trying to read a csv file which has a Date column. The dates are stored as 8/27/1962, 9/12/1959 and 7/15/1965. When I am using the to_datetime function, the dates being converted to 8/27/2062, 9/12/2059 and 7/15/2065. I am not sure why this is happening. Is it because the year changed or something?

Example:

planets = pd.read_csv('Planets.csv',usecols = ['FirstVisited'])
0    3/29/74
1    8/27/62
2        NaN
3    9/12/59
4    7/15/65
5    12/4/73
6     9/1/79
Name: FirstVisited, dtype: object

pd.to_datetime(planets.FirstVisited)
0   1974-03-29
1   2062-08-27
2          NaT
3   2059-09-12
4   2065-07-15
5   1973-12-04
6   1979-09-01

Check for indexes 1,3 and 4

Prince Modi
  • 425
  • 1
  • 4
  • 16

4 Answers4

1

This is because most implementations assumes that 00-68 years belong to 2000 and 69-99 belong to 1900. If all the dates are 19xx, perhaps you can add a suffix of '19' to the year part of string before changing to a date

If all the dates are 19xx do


import pandas as pd

planets = {'FirstVisited':['8/2/62', '9/12/59', '9/12/88']}

planets = pd.DataFrame(planets)


planets['FirstVisited'] = planets['FirstVisited'].str[0:-2] + '19' + planets['FirstVisited'].str[-2:]


planets['FirstVisited'] = pd.to_datetime(planets['FirstVisited'], format = "%d/%m/%Y", errors = 'coerce')



print(planets)


Mayowa Ayodele
  • 549
  • 2
  • 11
1

Actually it's not about your code! It's the "origin of time" in programming languages (most basically C). The origin of time in C's time.h header is "1970 January 1". That's why you're taking wrong results for dates before then. I recommend you to correct these times manually ... something like:

import pandas
x, y= pandas.readcsv('Planets.csv'), []
for i in x.FirstVisited:
    i= i.split('/')
    i[0], i[1], i[2]= '19'+i[2], i[0], i[1]
    y.append('-'.join(i))
print(y)
Ebick
  • 11
  • 3
0

You can use pandas to_datetime function, with parameter errors='coerce' converts non-dates into NaT null values.Check my answer below.

import pandas as pd

data = {'dates':["8/27/1962", "9/12/1959", "Nan"]}
df = pd.DataFrame(data)

df['dates'] = pd.to_datetime(df.dates,errors='coerce')
#drop Nan from column
df = df.dropna(subset=['dates'])

lst = df['dates'].dt.strftime('%Y-%m-%d')

print(lst)
Marios Nikolaou
  • 1,326
  • 1
  • 13
  • 24
0

A bit of a brute-force approach, but if you know all dates are 19' hundreds you can do:

import pandas as pd
import datetime

df=pd.DataFrame({"dt": ["8/27/62", "9/12/59", "7/15/65"], "x": list("abc")})

df["dt"]=df["dt"].str.split(r"/").apply(lambda x: datetime.datetime(int(x[2])+1900, int(x[0]), int(x[1])))

Output:

#before:
        dt  x
0  8/27/62  a
1  9/12/59  b
2  7/15/65  c

#after:
          dt  x
0 1962-08-27  a
1 1959-09-12  b
2 1965-07-15  c
Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34