I'm trying to remove every row from my DataFrame whose date is more than a year old. What am I doing wrong, and what is the fastest way to do this? Here is my code so far:

import pandas as pd
import datetime as dt

xls = pd.ExcelFile('/Users/thomasmurray/Downloads/20210622Tommy Data Dump 2.xlsx')

df2 = pd.read_excel(xls, 'Orders')

present = dt.datetime.now()
past = dt.timedelta(days=365)
year = present - past

for i in df2.DatePurchased:
    if i < year:
        df2.drop(i)
  • Welcome to SO. What's the problem with the code? Please show its output. Is the problem the whitespace in the file path? Try "https://stackoverflow.com/questions/14852140/whitespaces-in-the-path-of-windows-filepath" or "https://stackoverflow.com/questions/36555950/spaces-in-directory-path-python". – Victor Lee Jun 24 '21 at 01:41

1 Answer

Figured it out! Sorry if my question was incoherent; I am new to this site and to coding. This code keeps only the rows whose date in a particular column ("DatePurchased") falls within the last 365 days, which removes every row older than that:

import pandas as pd
import datetime as dt

xls = pd.ExcelFile('/Users/thomasmurray/Downloads/20210622Tommy Data Dump 2.xlsx')

df2 = pd.read_excel(xls, 'Orders')

present = dt.datetime.now()
past = dt.timedelta(days=365)
year = present - past

df2_cut = df2[df2.DatePurchased > year]
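For anyone wondering why the original loop did not work: `DataFrame.drop` treats its argument as an index label rather than a cell value, and it returns a new DataFrame instead of changing `df2` in place, so the loop had nothing to keep even when it found a matching label. The boolean mask above avoids both problems and runs as one vectorized comparison. Below is a minimal, self-contained sketch of the same idea that does not need the Excel file. The helper name `drop_older_than`, the `OrderID` column, and the sample dates are made up for illustration, and it assumes "DatePurchased" has already been parsed as datetimes.

```python
import pandas as pd


def drop_older_than(df, column, days=365, now=None):
    """Return only the rows whose `column` date is newer than `days` ago.

    `now` can be fixed (hypothetical convenience) so results are reproducible.
    """
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    cutoff = now - pd.Timedelta(days=days)
    # Boolean mask: True for rows to keep. Returns a new DataFrame;
    # the original is left untouched.
    return df[df[column] > cutoff]


# Made-up sample data standing in for the 'Orders' sheet.
orders = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "DatePurchased": pd.to_datetime(["2019-01-15", "2020-12-01", "2021-06-01"]),
})

recent = drop_older_than(orders, "DatePurchased", now="2021-06-24")
print(recent)
```

If the column comes in as text rather than dates, converting it first with `pd.to_datetime(df2["DatePurchased"])` should make the comparison work.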