1

I have a question similar to this SO question except instead of only having numbers and text, my column has numbers, text and dates in them. How do I remove all rows that have values other than numbers and dates?

For example, I have a generic dataframe:

id Nums and Dates
1 40
2 1/1/2021
3 AABBC
4 20
5 1/2/2021

And after removal, it should look like this.

id Nums and Dates
1 40
2 1/1/2021
4 20
5 1/2/2021
jps
  • 20,041
  • 15
  • 75
  • 79
  • Welcome to StackOverflow! Please remember to accept the best solution as answer by clicking the checkmark next to the answer. Thank you! Please see the following for more info: https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – David Erickson Jan 04 '21 at 23:00

3 Answers3

4

You can create two series that checks for dates and numbers with to_datetime and to_numeric and only keep non-null rows with notnull(). Passing errors='coerce' returns null to non-dates / non-numbers:

dates = pd.to_datetime(df['Nums and Dates'], errors='coerce')
nums = pd.to_numeric(df['Nums and Dates'], errors='coerce')
df[(dates.notnull()) | (nums.notnull())]

Out[1]:
   id  Nums and Dates
0   1              40
1   2        1/1/2021
3   4              20
4   5        1/2/2021
David Erickson
  • 16,433
  • 2
  • 19
  • 35
1

Supposing the name of your dataframe as df and the name of your column as Nums and Dates, try this:

not_str_values = [value for value in df if type(value) is not str]

Then:

df = df.loc[df['Nums and Dates'].isin(not_str_values)]
ljuk
  • 701
  • 3
  • 12
1

With contains and regex, You can do in one pass through the column. You can remove the rows which has alphabets.

df[~df['Nums and Dates'].str.contains(r'[A-Za-z]', regex=True)]

id  Nums and Dates
0   1   40
1   2   1/1/2021
3   4   20
4   5   1/2/2021
ggaurav
  • 1,764
  • 1
  • 10
  • 10