2

I'm trying to filter out all rows that hold wage data that were earned before the end date of a training. So basically I only want wages with a wage period greater than or equal to the end date. When I run the python similar to below it is removing too many fields. I'm confused why this isn't working

df = df[df['wage period']>= df['end date']]
participant end date wage period wage
Joe 1/1/2021 10/1/2020 $100
Bob 1/1/2020 7/1/2020 $150
Jess 10/1/2020 4/1/2020 $100
Bob 1/1/2020 1/1/2020 $200
Bob 1/1/2020 10/1/2019 $120
Ben
  • 153
  • 1
  • 2
  • 8
  • 4
    Convert your date columns to actual datetime type with `pd.to_datetime`, see [this](https://stackoverflow.com/questions/17134716/convert-dataframe-column-type-from-string-to-datetime-dd-mm-yyyy-format) question. – Erfan Jan 14 '21 at 13:05
  • I had it as a period type because the data was quarterly. I did convert it to datetime though. Thanks. – Ben Jan 14 '21 at 14:16

1 Answers1

1

May be you should convert the date (eg. '10/1/2020') to datetime, or change the format of the date if the type is String (eg. '2020/10/1').

Try this:

df['wage period'] = pd.to_datetime(df['wage period'], format='%m/%d/%Y')
df['end date'] = pd.to_datetime(df['end date'], format='%m/%d/%Y')
df = df.loc[df['wage period'] >= df['end date'], :]