1

I'm processing a file which has a date coulmn. I want to check that file contains records for the last month's all days.

That is, if I process the September file, I would like to know which contain rows for September 1st -September 30th. Check data of the previous month. Not for current month. How can I do this in Pandas?

    def is_delivery_ready_to_process():

        dateCols = ['Document Date']
        data = pd.read_excel(os.path.join(file_path, f), parse_dates=dateCols,
                                   dayfirst=True, sheet_name='Refined')
.....?
Ratha
  • 9,434
  • 17
  • 85
  • 163

2 Answers2

1

If I understand your question correctly, you want to filter rows that are within a specific month?

First convert your date string to a datetime object and then do the filtering:

import pandas as pd
from datetime import datetime


d = {'date': ["Sep 1 2005", "Sep 1 2005", "Nov 1 2005", "Dec 1 2005", "Apr 1 2005",  "Aug 1 2005", "Aug 1 2005"], 'group': ["A", "A", "B", "B", "C", "C", "C"], "value": [5, 6, 7, 8, 9, 10, 11]}

df = pd.DataFrame(data= d)
df["datetime"] = pd.to_datetime(df['date'], format='%b %d %Y')

startdate = datetime(2005, 9, 1)
enddate = datetime(2005, 10, 1)

df[(df.datetime >= startdate) & (df.datetime < enddate)]

Output:

   date       group value   datetime
0   Sep 1 2005  A   5   2005-09-01
1   Sep 1 2005  A   6   2005-09-01
dgg32
  • 1,409
  • 1
  • 13
  • 33
0

First you have to get the start date and end date of previous month

  1. find today.
  2. use that to find the first day of this month.
  3. use timedelta to get the last day of the previous month.
  4. use replace(day=1) to get first day of the previous month
import datetime

today = datetime.date.today()
previous_month_end = today.replace(day=1) - datetime.timedelta(days=1)
previous_month_start = previous_month_end.replace(day=1)
print(previous_month_start) # 2019-09-01
print(previous_month_end) # 2019-09-30

Next, select dataframe rows between 2 dates using a boolean mask, for example

import pandas as pd
df = pd.DataFrame(columns=['Document Date'],
                  data=[datetime.date(2019, 10, 1),
                        datetime.date(2019, 9, 1),
                        datetime.date(2019, 9, 3)])

df.loc[(df['Document Date'] >= previous_month_start) & (df['Document Date'] <= previous_month_end)]

gives you

  Document Date
1    2019-09-01
2    2019-09-03
henrywongkk
  • 1,840
  • 3
  • 17
  • 26