I have a dataframe, and most values in the column 'arr' are dates correctly formatted as yyyy-mm-dd. A few bad records contain a / instead, such as 2019/02/10, and I want to drop those rows.

I tried this:

ttdf = ttdf[ttdf['arr'].map(lambda x: 0 if '/' in x else 1 ) ]

But I got an error message:

KeyError: '[1 1 1 ... 0 0 0] not in index'

Am I on the right track here?

Mark Ginsburg

1 Answer

IIUC, you can filter with a negated string match:

df[~df.dates.astype(str).str.contains('/')]

For example

import pandas as pd

df = pd.DataFrame()
df['dates'] = ['2011-01-20', '2011-01-20', '2011/01/20', '2011-01-20']

    dates
0   2011-01-20
1   2011-01-20
2   2011/01/20
3   2011-01-20

Then

df[~df.dates.str.contains('/')]

    dates
0   2011-01-20
1   2011-01-20
3   2011-01-20
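
The cast to str in the first one-liner matters when the column holds values that are not strings. A minimal sketch, assuming a hypothetical column with one stray integer in it (this data is made up for illustration): either cast first, or pass na=False so that entries str.contains cannot test count as "no slash".

```python
import pandas as pd

# Hypothetical data: one slash-formatted date and one non-string value.
df = pd.DataFrame({'dates': ['2011-01-20', 20110121, '2011/01/20', '2011-01-22']})

# Option 1: cast everything to str before testing for '/'.
kept_cast = df[~df['dates'].astype(str).str.contains('/')]

# Option 2: leave the values alone and treat untestable entries as False.
kept_na = df[~df['dates'].str.contains('/', na=False)]

print(kept_cast.index.tolist())
print(kept_na.index.tolist())
```

Both options drop only the row containing a slash; whether the stray integer should also count as a bad record is a separate decision.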

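You can also use map (as you tried), but return bool values rather than int. A boolean Series is applied as a row mask, whereas a Series of 0s and 1s is looked up as column labels, which is what raised the KeyError.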

df[df['dates'].map(lambda x: False if '/' in x else True )]

    dates
0   2011-01-20
1   2011-01-20
3   2011-01-20
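
To see the difference on the original attempt, here is a small sketch using the question's names (ttdf, 'arr') with made-up sample rows. It assumes nothing beyond pandas itself: the integer mask raises KeyError, and casting the same mask to bool makes it select rows.

```python
import pandas as pd

# Sample rows are invented for illustration; only the names come from the question.
ttdf = pd.DataFrame({'arr': ['2019-02-08', '2019-02-09', '2019/02/10', '2019-02-11']})

# The original attempt: a Series of 0/1 integers.
int_mask = ttdf['arr'].map(lambda x: 0 if '/' in x else 1)

try:
    ttdf[int_mask]            # integers are looked up as column labels
    int_mask_failed = False
except KeyError:
    int_mask_failed = True

# Casting the same mask to bool turns it into a row filter.
cleaned = ttdf[int_mask.astype(bool)]

print(int_mask_failed)
print(cleaned)
```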

However, notice that False if '/' in x else True is redundant. It is the same as just '/' not in x

df[df['dates'].map(lambda x: '/' not in x)]

    dates
0   2011-01-20
1   2011-01-20
3   2011-01-20
rafaelc