Python Pandas Dataframe dropping rows based on a column containing a character

Question

I have a dataframe, and most of the values in the column 'arr' are dates correctly formatted as

yyyy-mm-dd

A few bad records have a '/' in them, such as 2019/02/10, and I want to drop them.

I tried this:

ttdf = ttdf[ttdf['arr'].map(lambda x: 0 if '/' in x else 1 ) ]

But I got an error message:

KeyError: '[1 1 1 ... 0 0 0] not in index'

Am I on the right track here?

`ttdf.loc[ttdf['arr'].map(lambda x: False if '/' in x else True ) ]` — rafaelc, Aug 09 '18 at 23:59
Can you give a small example data frame and the dataframe you'd expect as a result? — Matt Messersmith, Aug 09 '18 at 23:59
Any row where the date does not contain a slash (example 2020-05-19) I want to keep. Any row where the date does contain a slash I want to drop. — Mark Ginsburg, Aug 10 '18 at 00:01
RafaelC you are right, if you "answer" I will mark it as completed — Mark Ginsburg, Aug 10 '18 at 00:07

score 2 · Accepted Answer · answered Aug 10 '18 at 00:02

IIUC

df[~df.dates.astype(str).str.contains('/')]

For example

import pandas as pd

df = pd.DataFrame()
df['dates'] = ['2011-01-20', '2011-01-20', '2011/01/20', '2011-01-20']

    dates
0   2011-01-20
1   2011-01-20
2   2011/01/20
3   2011-01-20

Then

df[~df.dates.str.contains('/')]

    dates
0   2011-01-20
1   2011-01-20
3   2011-01-20
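
If the column may contain missing values, the mask from str.contains can hold NaN instead of True/False, and negating it with ~ may then fail or behave differently depending on the pandas version. A minimal sketch, assuming you want rows with a missing date to be kept (the None entry below is made up for illustration): na=False treats missing values as "does not contain", and regex=False matches the slash as a literal string.

```python
import pandas as pd

# Illustrative data: one missing value and one slash-formatted date.
df = pd.DataFrame({"dates": ["2011-01-20", None, "2011/01/20", "2011-01-20"]})

# na=False: a missing value counts as "no slash", so the mask is purely boolean.
# regex=False: treat '/' as a plain substring rather than a regular expression.
has_slash = df["dates"].str.contains("/", regex=False, na=False)

# Keep only the rows whose date does not contain a slash.
clean = df[~has_slash]
print(clean)
```

Pass na=True instead if rows with a missing date should be dropped along with the slash-formatted ones.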

You can also use map (as you tried), but return bool values rather than int, so that the result is used as a boolean mask.

df[df['dates'].map(lambda x: False if '/' in x else True )]

    dates
0   2011-01-20
1   2011-01-20
3   2011-01-20

However, notice that False if '/' in x else True is redundant. It is the same as not '/' in x, which is more idiomatically written as '/' not in x

df[df['dates'].map(lambda x: '/' not in x)]

    dates
0   2011-01-20
1   2011-01-20
3   2011-01-20
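
As for the KeyError in the question: as far as I can tell, df[...] only filters rows when it is given a boolean Series. A Series of 0/1 integers is instead read as a list of column labels, and since there are no columns named 0 or 1, the lookup fails. A small sketch on the example frame above (the variable names are just for illustration), showing the failure and a fix that converts the integers to booleans:

```python
import pandas as pd

df = pd.DataFrame({"dates": ["2011-01-20", "2011-01-20", "2011/01/20", "2011-01-20"]})

# The mask from the question: integers 0/1 instead of False/True.
int_mask = df["dates"].map(lambda x: 0 if "/" in x else 1)

# An integer Series is interpreted as column labels, so this raises KeyError.
try:
    df[int_mask]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Converting to bool turns it into a row mask, and the filter works.
kept = df[int_mask.astype(bool)]
print(kept)
```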
