
How can I check date values in a pandas DataFrame with a regex and pick out the invalid ones?

Data: df

Date_col
22-01-2016
2016-01-22
2016/01/22
2018-12-25 09:27:53
22-Jan-2016
abcd
1203
2072006
20030201

Expected output:

Bad_Date
22-Jan-2016
abcd
1203
2072006
20030201

Maybe a regular expression that allows only digits, '-', '/' and ':' would solve this problem.
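A minimal sketch of that idea, assuming Date_col holds strings. str.fullmatch keeps values made up entirely of digits, '-', '/', ':' and spaces (the space is needed for the time part of 2018-12-25 09:27:53). Digits-only values such as 1203 also pass that test, so the sketch additionally requires at least one '-' or '/' to reproduce the expected output. The names allowed, has_sep and bad are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Date_col": [
    "22-01-2016", "2016-01-22", "2016/01/22", "2018-12-25 09:27:53",
    "22-Jan-2016", "abcd", "1203", "2072006", "20030201",
]})

# True when the whole string consists only of digits, '/', ':', ' ' and '-'
allowed = df["Date_col"].str.fullmatch(r"[\d/: -]+", na=False)
# Digits-only strings such as "1203" pass the check above, so also demand a date separator
has_sep = df["Date_col"].str.contains(r"[-/]", na=False)

bad = df.loc[~(allowed & has_sep), "Date_col"].rename("Bad_Date")
print(bad.to_frame())
```

This flags the same five values as the expected output. It only checks which characters appear, so a string like 99/99/9999 would still be accepted.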

MonkeyZeus
Mohekar

2 Answers


One idea is to collect the values that pandas.to_datetime rejects:

import pandas as pd

bad_times = []
for i in df.Date_col.values:
    try:
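        pd.to_datetime(i)      # raises ValueError if the string can't be parsed
    except ValueError:         # also catches OutOfBoundsDatetime, a ValueError subclass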
        bad_times.append(i)
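The same try/except idea can produce a boolean mask instead of a list, so the bad rows can be selected with df[...] and keep their original index. A minimal sketch; is_unparseable is a hypothetical helper name, and exactly which strings pandas accepts can vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"Date_col": [
    "22-01-2016", "2016-01-22", "2016/01/22", "2018-12-25 09:27:53",
    "22-Jan-2016", "abcd", "1203", "2072006", "20030201",
]})

def is_unparseable(value) -> bool:
    """Hypothetical helper: True if pandas.to_datetime cannot convert the value."""
    try:
        pd.to_datetime(value)
    except (ValueError, TypeError, OverflowError):  # OutOfBoundsDatetime subclasses ValueError
        return True
    return False

# One True/False per row, aligned with df's index
mask = df["Date_col"].map(is_unparseable)
print(df[mask])
```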

ignoring_gravity

Use str.contains with regular expressions:

wrong_dates1 = ~df['Date_col'].str.contains('[-/:]')   # rows without the characters -, /, :
wrong_dates2 = df['Date_col'].str.contains('[A-Za-z]') # rows with letters

df[wrong_dates1 | wrong_dates2]
      Date_col
4  22-Jan-2016
5         abcd
6         1203
7      2072006
8     20030201
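These character checks only look at which characters appear, so a value like 2016-1/2 would not be flagged. A stricter sketch, assuming the only valid layouts are the ones in the question (DD-MM-YYYY, YYYY-MM-DD, YYYY/MM/DD, optionally followed by HH:MM:SS), anchors one pattern to the whole string with str.fullmatch. Passing na=False makes missing values count as non-matching (i.e. bad) instead of putting NaN into the mask.

```python
import pandas as pd

df = pd.DataFrame({"Date_col": [
    "22-01-2016", "2016-01-22", "2016/01/22", "2018-12-25 09:27:53",
    "22-Jan-2016", "abcd", "1203", "2072006", "20030201", "2016-1/2", None,
]})

# Assumed valid layouts: DD-MM-YYYY, YYYY-MM-DD or YYYY/MM/DD,
# optionally followed by a space and HH:MM:SS
pattern = (
    r"(?:\d{2}-\d{2}-\d{4}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})"
    r"(?: \d{2}:\d{2}:\d{2})?"
)

# fullmatch anchors the pattern to the entire string; na=False treats missing values as non-matching
good = df["Date_col"].str.fullmatch(pattern, na=False)
print(df[~good])
```

This still checks only the shape of the string: 2016-13-45 has a valid layout and would pass.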

Side note

If you consider 22-Jan-2016 and 20030201 valid dates (which they are), you can simply use pd.to_datetime with errors='coerce', which converts unparseable values to NaT:

bad_dates = pd.to_datetime(df['Date_col'], errors='coerce').isna()
df[bad_dates]
  Date_col
5     abcd
6     1203
7  2072006
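In pandas 2.0 and later, to_datetime infers a single format from the first non-missing value and applies it to the whole column, so with errors='coerce' valid rows in a different layout (2016-01-22, 2016/01/22, ...) may also become NaT. format='mixed', added in pandas 2.0, parses each value separately. A version-guarded sketch, assuming pd.__version__ starts with the major version number; the names major and extra are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Date_col": [
    "22-01-2016", "2016-01-22", "2016/01/22", "2018-12-25 09:27:53",
    "22-Jan-2016", "abcd", "1203", "2072006", "20030201",
]})

# format="mixed" only exists in pandas >= 2.0; older versions already parse value by value
major = int(pd.__version__.split(".")[0])
extra = {"format": "mixed"} if major >= 2 else {}

# Unparseable values become NaT, which isna() reports as True
parsed = pd.to_datetime(df["Date_col"], errors="coerce", **extra)
bad_dates = parsed.isna()
print(df[bad_dates])
```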
Erfan