20

I'm trying to delete rows of a dataframe based on one date column; [Delivery Date]

I need to delete rows which are older than 6 months old but not equal to the year '1970'.

I've created 2 variables:

from datetime import date, timedelta
sixmonthago = date.today() - timedelta(188)

import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')

but I don't know how to delete rows based on these two variables, using the [Delivery Date] column.

Could anyone provide the correct solution?

runDOSrun
  • 10,359
  • 7
  • 47
  • 57
Colin O'Brien
  • 2,175
  • 5
  • 20
  • 26

2 Answers2

26

You can just filter them out:

df[(df['Delivery Date'].dt.year == 1970) | (df['Delivery Date'] >= sixmonthago)]

This returns all rows where the year is 1970 or the date is less than 6 months.

You can use boolean indexing and pass multiple conditions to filter the df, for multiple conditions you need to use the array operators so | instead of or, and parentheses around the conditions due to operator precedence.

Check the docs for an explanation of boolean indexing

EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Thanks.. I'm getting a return error: TypeError: 'Can only use .dt accessor with datetimelike values' - probably needs its own seperate question but does this mean I have to change the [Delivery Date] to datetime? Sorry, I'm quite new to this – Colin O'Brien Feb 20 '15 at 14:35
  • 1
    You mave have strings instead of datetimes, you can convert doing `df['Delivery Date'] = pd.to_datetime(df['Delivery Date'])` – EdChum Feb 20 '15 at 14:37
4

Be sure the calculation itself is accurate for "6 months" prior. You may not want to be hardcoding in 188 days. Not all months are made equally.

from datetime import date
from dateutil.relativedelta import relativedelta

#http://stackoverflow.com/questions/546321/how-do-i-calculate-the-date-six-months-from-the-current-date-using-the-datetime
six_months = date.today() - relativedelta( months = +6 )

Then you can apply the following logic.

import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')

df = df[(df['Delivery Date'].dt.year == nineteen_seventy.tm_year) | (df['Delivery Date'] >= six_months)]

If you truly want to drop sections of the dataframe, you can do the following:

df = df[(df['Delivery Date'].dt.year != nineteen_seventy.tm_year) | (df['Delivery Date'] < six_months)].drop(df.columns)
unique_beast
  • 1,379
  • 2
  • 11
  • 23