27

I've got a large file with login information for a list of users. The problem is that the file includes other information in the Date column. I would like to remove all rows that are not of type datetime in the Date column. My data resembles

df:

Name Date
name_1 2012-07-12 22:20:00
name_1 2012-07-16 22:19:00
name_1 2013-12-16 17:50:00
name_1 4345 # type = 'int'
... # type = 'float'
name_2 2010-01-11 19:54:00
name_2 2010-02-06 12:10:00
...
name_2 2012-07-18 22:12:00
name_2 4521
...
name_5423 2013-11-23 10:21:00
...
name_5423 7532

I've tried modifying the solution to

Finding non-numeric rows in dataframe in pandas?

Remove rows where column value type is string Pandas

and How-should-I-delete-rows-from-a-DataFrame-in-Python-Pandas

to fit my needs.

The problem is that whenever I attempt the change I either get an error or the entire dataframe gets deleted

Lukasz
  • 2,476
  • 10
  • 41
  • 51

2 Answers2

51

Use pd.to_datetime with parameter errors='coerce' to make non-dates into NaT null values. Then you can drop those rows

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df = df.dropna(subset=['Date'])

df

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • This is helpful. Incidentally, it's now "errors='ignore' on more modern pandas (I'm on 0.24). Otherwise, thumbs up! – pmueller Apr 23 '19 at 21:06
  • On pandas 1.1.1 it's still "coerce" to get NaN values (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) – Vega Sep 08 '20 at 08:49
1

We could use the same method as @piRSquared to make non-datetime values to NaT values and use notna (i.e. select non-NaN values) to create a boolean Series and filter df:

df = df[pd.to_datetime(df['Date'], errors='coerce').notna()]

Note that unlike @piRSquared's method, it doesn't modify the dtype of "Date" column. Whether it's a desirable behavior or not is up to you.

Output:

         Name                 Date
0      name_1  2012-07-12 22:20:00
1      name_1  2012-07-16 22:19:00
2      name_1  2013-12-16 17:50:00
5      name_2  2010-01-11 19:54:00
6      name_2  2010-02-06 12:10:00
8      name_2  2012-07-18 22:12:00
11  name_5423  2013-11-23 10:21:00