I am trying to create a list of the last day of each month for the past n months from the current date, not including the current month.
I tried different approaches:
def last_n_month_end(n_months):
    """
    Returns a list of the last n month end dates
    """
    return [datetime.date.today().replace(day=1) - datetime.timedelta(days=1) - datetime.timedelta(days=30*i) for i in range(n_months)]
This is only correct if every month has exactly 30 days, and it does not run at all in Databricks PySpark, where it raises AttributeError: 'method_descriptor' object has no attribute 'today'.
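For reference, here is a minimal sketch of the same idea without the 30-day assumption. It steps back one month at a time: the day before the first of a month is always the last day of the previous month. The AttributeError may come from the notebook having `datetime` bound to the class (via `from datetime import datetime`), in which case `datetime.date` is a method rather than the `date` type; the sketch assumes the module itself is imported. The function name `last_n_month_ends` and the `today` parameter are hypothetical, added only to make the example reproducible.

```python
import datetime  # import the module itself, so datetime.date is the date class


def last_n_month_ends(n_months, today=None):
    """Return the last day of each of the n_months months before today's month, oldest first."""
    if today is None:
        today = datetime.date.today()
    ends = []
    # Start from the first day of the current month.
    first_of_month = today.replace(day=1)
    for _ in range(n_months):
        # The day before the first of a month is the last day of the previous month.
        month_end = first_of_month - datetime.timedelta(days=1)
        ends.append(month_end)
        # Move to the first day of that previous month and repeat.
        first_of_month = month_end.replace(day=1)
    return [d.isoformat() for d in reversed(ends)]


print(last_n_month_ends(4, today=datetime.date(2022, 12, 15)))
# ['2022-08-31', '2022-09-30', '2022-10-31', '2022-11-30']
```

Passing `today` explicitly keeps the output stable; leaving it out uses the current date.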
I also tried the approach from "Generate a sequence of the last days of all previous N months with a given month":
def previous_month_ends(date, months):
    year, month, day = [int(x) for x in date.split('-')]
    d = datetime.date(year, month, day)
    t = datetime.timedelta(1)
    s = datetime.date(year, month, 1)
    return [(x - t).strftime('%Y-%m-%d')
            for m in range(months - 1, -1, -1)
            for x in (datetime.date(s.year, s.month - m, s.day) if s.month > m else
                      datetime.date(s.year - 1, s.month - (m - 12), s.day),)]
but I am not getting the correct result.
I also tried:
from pyspark.sql.functions import explode, expr

df = spark.createDataFrame([(1,)], ['id'])
days = df.withColumn('last_dates', explode(expr('sequence(last_day(add_months(current_date(), -3)), last_day(add_months(current_date(), -1)), interval 1 month)')))
This returns the last three months (Sep, Oct, Nov), but every date is the 30th, even though October ends on the 31st. However, I do get the correct last days when I go back more than 3 months.
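One possible explanation for this, sketched below in plain Python rather than Spark: if stepping by a month keeps the start date's day of month and only clamps it to the length of the target month, then a sequence starting on Sep 30 stays on the 30th, while one starting on Aug 31 happens to land on every month end. This is a model of the observed output under that assumption, not Spark's actual implementation; `add_months_clamped` and `month_end` are hypothetical helper names.

```python
import calendar
import datetime


def add_months_clamped(d, months):
    """Shift d by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(d.day, last))


def month_end(d):
    """Return the last day of the month that d falls in."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


# Stepping from a month end that falls on the 30th stays on the 30th.
start = datetime.date(2022, 9, 30)
stepped = [add_months_clamped(start, i) for i in range(3)]
print([d.isoformat() for d in stepped])
# ['2022-09-30', '2022-10-30', '2022-11-30']

# Taking the month end of each stepped date gives the intended result.
fixed = [month_end(d) for d in stepped]
print([d.isoformat() for d in fixed])
# ['2022-09-30', '2022-10-31', '2022-11-30']
```

If this model is right, the analogous change on the Spark side would be to apply `last_day` to each generated date instead of only to the two endpoints of the sequence.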
What I am trying to get is this (the last days of the last 4 months, not including the last day of the current month):
daterange = ['2022-08-31','2022-09-30','2022-10-31','2022-11-30']