-2

how to get the last day of the month from a given date

Referring to the above repo, I have a similar question I have a data of number of rows 2107260 and number of columns 8, a snapshot of the data: enter image description here

Now the issue is that I am trying to have the Date column look like this:

enter image description here

Now what I have done is:

df2_1=pd.read_csv("C:\\Users\\Chirantan.Gupta\\Desktop\\Poland_Project\\V1_Data_SFS_2017_21.csv",low_memory=False)


import calendar

from datetime import timedelta

from datetime import date

df2_1['Date'] =  pd.to_datetime(df2_1['Date'], format='%Y-%d-%m')

df2_1['Date']= df2_1['Date'].dt.date

for i in range(df2_1.shape[0]):

    df2_1.iloc[i,6]=date(df2_1.iloc[i,6].year + (df2_1.iloc[i,6].month == 12), 
              (df2_1.iloc[i,6].month + 1 if df2_1.iloc[i,6].month < 12 else 1), 1) - timedelta(1)

The issue is this data being huge takes humongous time to run and as a consequence it is infeasible, in my opinion. My question is: Is there any way I can have this thing done without having the for loop as I have used, I am running out of ideas here, if anyone can help please.

Saradamani
  • 180
  • 1
  • 3
  • 18
  • 1
    How long does it take? Do you even need to change the date to the last day of the month? Couldn't the consumer of this CSV simply know what the date column represents (which is the first day of the month rather than the last day of the month). – jarmod Sep 09 '21 at 17:19
  • @jarmod A couple of hours, I started it at 8:40 PM, it is 10:50 still running.. – Saradamani Sep 09 '21 at 17:21
  • The standard approach to this problem is `df2_1['Date'] = df2_1['Date'] + pd.offsets.MonthEnd(0)` or `df2_1['Date'] = pd.to_datetime(df2_1['Date']) + pd.offsets.MonthEnd(0)` if the column is not already of type datetime. – Henry Ecker Sep 09 '21 at 18:04

1 Answers1

2

Repurposing this.

Use the given function:

import datetime

def last_day_of_month(any_day):
    # this will never fail
    # get close to the end of the month for any day, and add 4 days 'over'
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)
    # subtract the number of remaining 'overage' days to get last day of current month, or said programattically said, the previous day of the first of next month
    return next_month - datetime.timedelta(days=next_month.day)

Then after converting the 'Date' column to datetimes, apply last_day_of_month to all items in the Date column like so:

df2_1['Date']= df2_1['Date'].apply(last_day_of_month)
belfner
  • 232
  • 1
  • 8