2

I'm looking for a function that takes two dates (admission and discharge) and a financial year, and returns the number of days in each month between those two dates.

Financial year goes 1-April -> 31-March

I currently have a solution (below) which is a mess of SPSS and Python. Ultimately it will need to be implemented back into SPSS, but as a much tidier Python function; unfortunately this means it can only use the standard library (not Pandas).

e.g.

+-----------------+-----------------+------+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|    Admission    |    Discharge    |  FY  |  | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar |
+-----------------+-----------------+------+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 01 January 2017 | 05 January 2017 | 1617 |  |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   4 |   0 |   0 |
| 01 January 2017 | 05 June 2017    | 1617 |  |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |  31 |  28 |  31 |
| 01 January 2017 | 05 June 2017    | 1718 |  |  30 |  31 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
| 01 January 2017 | 01 January 2019 | 1718 |  |  30 |  31 |  30 |  31 |  31 |  30 |  31 |  30 |  31 |  31 |  28 |  31 |
+-----------------+-----------------+------+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

Related - How to calculate number of days between two given dates?

Current solution (SPSS code)

 * Count the beddays.
 * Similar method to that used in Care homes.
 * 1) Declare an SPSS macro which will set the beddays for each month.
 * 2) Use python to run the macro with the correct parameters.
 * This means that different month lengths and leap years are handled correctly.
Define !BedDaysPerMonth (Month = !Tokens(1) 
   /MonthNum = !Tokens(1) 
   /DaysInMonth = !Tokens(1) 
   /Year = !Tokens(1))

 * Store the start and end date of the given month.
Compute #StartOfMonth = Date.DMY(1, !MonthNum, !Year).
Compute #EndOfMonth = Date.DMY(!DaysInMonth, !MonthNum, !Year).

 * Create the names of the variables e.g. April_beddays and April_cost.
!Let !BedDays = !Concat(!Month, "_beddays").

 * Create variables for the month.
Numeric !BedDays (F2.0).

 * Go through all possibilities to decide how many days to be allocated.
Do if keydate1_dateformat LE #StartOfMonth.
   Do if keydate2_dateformat GE #EndOfMonth.
      Compute !BedDays = !DaysInMonth.
   Else.
      Compute !BedDays = DateDiff(keydate2_dateformat, #StartOfMonth, "days").
   End If.
Else if keydate1_dateformat LE #EndOfMonth.
   Do if keydate2_dateformat GT #EndOfMonth.
      Compute !BedDays = DateDiff(#EndOfMonth, keydate1_dateformat, "days") + 1.
   Else.
      Compute !BedDays = DateDiff(keydate2_dateformat, keydate1_dateformat, "days").
   End If.
Else.
   Compute !BedDays = 0.
End If.

 * Months after the discharge date will end up with negatives.
If !BedDays < 0 !BedDays = 0.
!EndDefine.

 * This python program will call the macro for each month with the right variables.
 * They will also be in FY order.
Begin Program.
from calendar import month_name, monthrange
from datetime import date
import spss

#Set the financial year, this line reads the first variable ('year')
fin_year = int((int(spss.Cursor().fetchone()[0]) // 100) + 2000)

#This line generates a 'dictionary' which will hold all the info we need for each month
#month_name is a list of all the month names and just needs the number of the month
#(m < 4) + fin_year - This sets the year to fin_year for April onwards and fin_year + 1 otherwise
#monthrange takes a year and a month number and returns 2 numbers: the weekday of the first day and the number of days in the month; we only need the second.
months = {m: [month_name[m], (m < 4) + fin_year, monthrange((m < 4) + fin_year, m)[1]]  for m in range(1,13)}
print(months) #Print to the output window so you can see how it works

#This will make the output look a bit nicer
print("\n\n***This is the syntax that will be run:***")

#This loops over the months above but first sorts them by year, meaning they are in correct FY order
for month in sorted(months.items(), key=lambda x: x[1][1]):
   syntax = "!BedDaysPerMonth Month = " + month[1][0][:3]
   syntax += " MonthNum = " + str(month[0])
   syntax += " DaysInMonth = " + str(month[1][2])
   syntax += " Year = " + str(month[1][1]) + "."

   print(syntax)
   spss.Submit(syntax)
End Program.
Moohan
  • Possible duplicate of [How to calculate number of days between two given dates?](https://stackoverflow.com/questions/151199/how-to-calculate-number-of-days-between-two-given-dates) – Jonah Bishop Jan 11 '19 at 15:06
  • Look at the module `datetime` for proper representation of dates, and `dateutil` for robust parsing of strings into dates. – BoarGules Jan 11 '19 at 15:07
  • @Moohan what would be the purpose of the third parameter, the year? The start and end dates seem to include the year already. – Ralf Jan 11 '19 at 15:08
  • 1
    @JonahBishop not a duplicate. OP wants days *in each month*. – jez Jan 11 '19 at 15:09
  • What is the purpose of `fy` if there is already information on what year it is within the `admission` and `discharge` dates? – Endyd Jan 11 '19 at 15:10
  • 2
    What do you want to do if there are two Januarys between the two dates given? End up with Jan: 62 or something like that? Or separate them by year? – Endyd Jan 11 '19 at 15:18
  • @Endyd The purpose of _fy_ is to deal with dates which cross multiple financial years - consider 01-April-2015 -> 10-April-2017 - if fy was 1516 or 1617 all months would have max days, if fy was 1718 April would have 9 and all other months 0. – Moohan Jan 11 '19 at 15:25
  • @jonsharpe currently I have a working solution but it's a mess of SPSS and Python. I wanted to rewrite it into a neat Python function to make it more maintainable and easier to read - currently writing something up, will share as soon as it does something... if I get that far! – Moohan Jan 11 '19 at 15:27
  • Is April 1st the start of a new financial year? – Endyd Jan 11 '19 at 15:34
  • are you using dates with years in them mixed with the year you want to calculate? I'm confused. – Marcel Wilson Jan 11 '19 at 15:38

5 Answers

3

The only way I can think of to do this is by looping through each day and parsing the month it belongs to:

import time, collections
SECONDS_PER_DAY = 24 * 60 * 60
def monthlyBedDays(admission, discharge, fy=None):

    start = time.mktime(time.strptime(admission, '%d-%b-%Y'))
    end = time.mktime(time.strptime( discharge, '%d-%b-%Y'))
    if fy is not None:
        fy = str(fy)
        start = max(start, time.mktime(time.strptime('01-Apr-'+fy[:2], '%d-%b-%y')))
        end   = min(end,   time.mktime(time.strptime('31-Mar-'+fy[2:], '%d-%b-%y')))
    days = collections.defaultdict(int)
    for day in range(int(start), int(end) + SECONDS_PER_DAY, SECONDS_PER_DAY):
        day = time.localtime(day)
        key = time.strftime('%Y-%m', day)  # use '%b' to answer the question exactly, but that's not such a good idea
        days[ key ] += 1
    return days

output = monthlyBedDays(admission="01-Jan-2018", discharge="25-Apr-2018")
print(output)
# Prints:
# defaultdict(<class 'int'>, {'2018-01': 31, '2018-02': 28, '2018-03': 31, '2018-04': 25})

print(monthlyBedDays(admission="01-Jan-2018", discharge="25-Apr-2018", fy=1718))
# Prints:
# defaultdict(<class 'int'>, {'2018-01': 31, '2018-02': 28, '2018-03': 31})

print(monthlyBedDays(admission="01-Jan-2018", discharge="25-Apr-2018", fy=1819))
# Prints:
# defaultdict(<class 'int'>, {'2018-04': 25})

Note that the output is a defaultdict such that, if you ask it for the number of days in any month (or for any key at all) that wasn't recorded (for example output['1999-12']) it will return 0. Note also that I've used '%Y-%m' format for the output keys. This makes it much easier to sort the output, and to disambiguate between months that happen to occur in different years, than if you use the type of key you originally asked for ('%b' -> 'Jan').
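
As a rough illustration of why the '%Y-%m' keys are convenient, the sketch below turns such a mapping into the April-to-March row the question asks for. The names `fy_row` and `FY_MONTHS` are hypothetical, and the code assumes every year is 20xx, as the two-digit financial-year code implies.

```python
from collections import defaultdict

# Calendar month numbers in financial-year order: April through March.
FY_MONTHS = (4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)

def fy_row(days, fy):
    """Return 12 counts (Apr..Mar) for a financial year code such as 1718.

    `days` is a mapping keyed by 'YYYY-MM'; months with no entry count as 0.
    Assumption: all years are 20xx.
    """
    fy = str(fy)
    start_year = 2000 + int(fy[:2])
    row = []
    for month in FY_MONTHS:
        # Jan-Mar belong to the calendar year after the FY's starting year.
        year = start_year + (month < 4)
        key = '{}-{:02d}'.format(year, month)
        # .get() avoids adding new keys to a defaultdict as a side effect.
        row.append(days.get(key, 0))
    return row

counts = defaultdict(int, {'2018-01': 31, '2018-02': 28,
                           '2018-03': 31, '2018-04': 25})
print(fy_row(counts, 1718))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 31, 28, 31]
print(fy_row(counts, 1819))  # [25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because the keys carry the year, the same mapping can serve any financial year without recounting.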

jez
  • That's great, thanks. Yep, I think I can handle the _FY_ from that. One question: why do you use SECONDS_PER_DAY? Would datetime.timedelta(days=1) (from Ralf's answer) be equivalent? – Moohan Jan 11 '19 at 15:47
  • You could use `datetime` instead of `time`—that would be the more modern approach, but both work and I'm just more familiar with `time`. – jez Jan 11 '19 at 15:48
  • @Moohan now that I understand you're in the UK (and have just learned from wikipedia that the corporate year-end is a few days different from the personal one that I always remember...) I can see that this approach makes the `fy` thing quite easy (see edit). – jez Jan 11 '19 at 15:58
2

First of all, I suggest using datetime.date instances, so you can parse your dates beforehand with something like this:

import datetime
date = datetime.datetime.strptime('17-Jan-2018', '%d-%b-%Y').date()

Then you could use something like this to iterate over the range of dates:

import datetime
import collections

def f(start_date, end_date, fy_str):
    # if the date range falls outside the financial year, cut it off
    fy_start = datetime.date(2000 + int(fy_str[:2]), 4, 1)
    if start_date < fy_start:
        start_date = fy_start
    fy_end = datetime.date(2000 + int(fy_str[2:]), 3, 31)
    if end_date > fy_end:
        end_date = fy_end

    month_dict = collections.defaultdict(int)

    date = start_date
    while date <= end_date:
        # the key holds year and month to make sorting easier
        key = '{}-{:02d}'.format(date.year, date.month)

        month_dict[key] += 1
        date += datetime.timedelta(days=1)

    return month_dict

The usage would be like this:

>>> d1 = datetime.date(2018, 2, 5)
>>> d2 = datetime.date(2019, 1, 17)


>>> r = f(d1, d2, '1718')
>>> for k, v in sorted(r.items()):
...     print(k, v)
2018-02 24
2018-03 31

>>> r = f(d1, d2, '1819')
>>> for k, v in sorted(r.items()):
...     print(k, v)
2018-04 30
2018-05 31
2018-06 30
2018-07 31
2018-08 31
2018-09 30
2018-10 31
2018-11 30
2018-12 31
2019-01 17
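
The "cut it off" step at the top of the function can also be looked at on its own. The following is only a sketch under the same assumptions as the code above (a four-character financial-year string such as '1718', years in 20xx); the helper names `fy_bounds` and `clamp_to_fy` are made up for illustration.

```python
import datetime

def fy_bounds(fy_str):
    """Return (first_day, last_day) of a financial year code such as '1718'."""
    start = datetime.date(2000 + int(fy_str[:2]), 4, 1)
    end = datetime.date(2000 + int(fy_str[2:]), 3, 31)
    return start, end

def clamp_to_fy(start_date, end_date, fy_str):
    """Trim a date range to the financial year.

    Returns the trimmed (start, end) pair, or None when the range
    does not overlap the financial year at all.
    """
    fy_start, fy_end = fy_bounds(fy_str)
    start = max(start_date, fy_start)
    end = min(end_date, fy_end)
    return (start, end) if start <= end else None

d1 = datetime.date(2018, 2, 5)
d2 = datetime.date(2019, 1, 17)
print(clamp_to_fy(d1, d2, '1718'))
# (datetime.date(2018, 2, 5), datetime.date(2018, 3, 31))
print(clamp_to_fy(d1, d2, '1617'))
# None
```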
Ralf
1

I think many of the answers were written before the OP gave the crucial information about how fy plays a part in the function (edit: many people have since read that edit and updated their answers). The OP wants the number of days between admission and discharge that land within the financial year (1819 being 01-Apr-2018 to 31-Mar-2019), split by calendar month.

from datetime import datetime, timedelta

# Function taken from https://stackoverflow.com/a/13565185/9462009
def lastDateOfThisMonth(any_day):
    next_month = any_day.replace(day=28) + timedelta(days=4)
    return next_month - timedelta(days=next_month.day)

def monthlyBeddays(admission, discharge, fy):
    startFy = datetime.strptime('01-Apr-'+fy[:2], '%d-%b-%y')
    endFy = datetime.strptime('01-Apr-'+fy[2:], '%d-%b-%y')

    admissionDate = datetime.strptime(admission, '%d-%b-%Y')
    dischargeDate = datetime.strptime(discharge, '%d-%b-%Y')


    monthDates = {'Jan':0,'Feb':0,'Mar':0,'Apr':0,'May':0,'Jun':0,'Jul':0,'Aug':0,'Sep':0,'Oct':0,'Nov':0,'Dec':0}

    # if admitted after end of fy or discharged before beginning of fy, zero days counted
    if admissionDate > endFy or dischargeDate < startFy:
        return monthDates

    if admissionDate < startFy:
        # Jump ahead to start at the first day of fy if admission was prior to the beginning of fy
        now = startFy
    else:
        # If admission happened at or after the first day of fy, we begin counting from the admission date
        now = admissionDate

    while True:
        month = datetime.strftime(now,'%b')
        lastDateOfMonth = lastDateOfThisMonth(now)
        if now >= endFy:
            # If now is greater or equal to the first day of the next fy (endFy), we don't care about any of the following dates within the adm/dis date range
            break
        if month == datetime.strftime(dischargeDate,'%b') and datetime.strftime(now, '%Y') == datetime.strftime(dischargeDate, '%Y') and now >= startFy:
            # If we reach the discharge month, we count this month and we're done
            monthDates[month] = (dischargeDate - now).days # not adding one since in your example it seemed like you did not want to count the dischargeDate (Mar:4)
            break
        elif now < startFy:
            # If now is less than the first day of this fy (startFy), we move on from this month to the next month until we reach this fy
            pass
        else:
            # We are within this fy and have not reached the discharge month yet
            monthDates[month] = (lastDateOfMonth - now).days + 1
            month = datetime.strftime(now, '%b')
        now = lastDateOfMonth + timedelta(days=1) # advance to the 1st of the next month

    return monthDates

# Passes all six scenarios

# Scenario #1: admitted before fy, discharged before  fy (didn't stay at all during fy)
print(monthlyBeddays("01-Jan-2018", "30-Mar-2018", '1819')) # {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 0, 'May': 0, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

# Scenario #2: admitted before fy, discharged during fy
print(monthlyBeddays("01-Jan-2018", "30-May-2018", '1819')) # {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 30, 'May': 29, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

# Scenario #3: admitted during fy, discharged during fy
print(monthlyBeddays("15-Apr-2018", "30-May-2018", '1819')) # {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 16, 'May': 29, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

# Scenario #4: admitted during fy, discharged after fy
print(monthlyBeddays("15-Apr-2018", "30-May-2019", '1819')) # {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 16, 'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31, 'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

# Scenario #5: admitted before fy, discharged after fy (stayed the whole fy)
print(monthlyBeddays("15-Mar-2018", "30-May-2019", '1819')) # {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30, 'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31, 'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

# Scenario #6: admitted near the end of fy, discharged after fy (only the last 17 days of March fall within fy)
print(monthlyBeddays("15-Mar-2018", "30-May-2019", '1718')) # {'Jan': 0, 'Feb': 0, 'Mar': 17, 'Apr': 0, 'May': 0, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}
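
For comparison, here is a minimal sketch that avoids walking the calendar altogether: for each month of the financial year it measures the overlap between the stay and that month. It uses only the standard library and, like the question's table, does not count the discharge day. The names `beddays_by_month` and `FY_MONTHS` are hypothetical, and years are assumed to be 20xx.

```python
from datetime import date

# (abbreviation, calendar month number) in financial-year order.
FY_MONTHS = (('Apr', 4), ('May', 5), ('Jun', 6), ('Jul', 7),
             ('Aug', 8), ('Sep', 9), ('Oct', 10), ('Nov', 11),
             ('Dec', 12), ('Jan', 1), ('Feb', 2), ('Mar', 3))

def beddays_by_month(admission, discharge, fy):
    """Days of the stay [admission, discharge) in each month of `fy`.

    `admission` and `discharge` are datetime.date objects;
    `fy` is a code such as 1718 (assumed to mean 2017-04-01 to 2018-03-31).
    """
    fy = str(fy)
    start_year = 2000 + int(fy[:2])
    result = {}
    for name, month in FY_MONTHS:
        # Jan-Mar fall in the calendar year after the FY's starting year.
        year = start_year + (month < 4)
        month_start = date(year, month, 1)
        # First day of the following month (rolls over after December).
        next_start = date(year + (month == 12), month % 12 + 1, 1)
        overlap = (min(discharge, next_start) - max(admission, month_start)).days
        # A negative overlap means the stay misses this month entirely.
        result[name] = max(overlap, 0)
    return result

print(beddays_by_month(date(2017, 1, 1), date(2017, 6, 5), 1718))
# {'Apr': 30, 'May': 31, 'Jun': 4, 'Jul': 0, 'Aug': 0, 'Sep': 0,
#  'Oct': 0, 'Nov': 0, 'Dec': 0, 'Jan': 0, 'Feb': 0, 'Mar': 0}
```

Taking the difference of two dates handles month lengths and leap years without a lookup table, and the loop always runs exactly twelve times regardless of the length of the stay.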
Endyd
0

Here is my proposed solution. From what I understand, you want the number of days in each month between the two given dates. I haven't formatted the months (I have left them as numbers), but it should be easy enough for you to do that.

from datetime import date
from calendar import monthrange
from dateutil.relativedelta import *

#start and end dates
d0 = date(2008, 8, 18)
d1 = date(2008, 12, 26)
delta = d1 - d0
delta_days = delta.days #number of days between the two dates

#we create a copy of the start date so we can use it to iterate (so as to not to lose the initial date)
curr_d = d0
while(1):
    #we iterate over each month until we have no days left

    #if there are more days in delta_days than in the month
    #the number of days in the current month is the maximum number of days in that month
    if delta_days > monthrange(curr_d.year, curr_d.month)[1]:
        number_of_days_in_curr_month = monthrange(curr_d.year, curr_d.month)[1]
        delta_days -= monthrange(curr_d.year, curr_d.month)[1]

    #the delta_days is smaller than the maximum days in the current month
    #the number of days in the current month is thus == to delta_days
    #we exit the while loop here
    else:
        number_of_days_in_curr_month = delta_days
        print('month number: ' + str(curr_d.month) + ', year: ' + str(curr_d.year) + ', days: ' + str(number_of_days_in_curr_month) )
        break
    print('month number: ' + str(curr_d.month) + ', year: ' + str(curr_d.year) + ', days: ' + str(number_of_days_in_curr_month) )

    #we increment the current month
    curr_d = curr_d + relativedelta(months=+1)
Bruno Robert
0

Thanks for all the great answers. I tried implementing some of them back into SPSS, but it quickly becomes really complicated and unwieldy trying to pass values between the two...

I did come up with a tidy function for parsing SPSS date variables into Python datetime objects:

from datetime import datetime, timedelta

def SPSS_to_Python_date(date):
    spss_start_date = datetime(1582, 10, 14)
    return (spss_start_date + timedelta(seconds = date))
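
A short usage sketch, assuming (as the function above does) that the SPSS value is a count of seconds since 14 October 1582. The reverse helper `python_to_spss_date` is hypothetical and is included only to make the round trip visible.

```python
from datetime import datetime, timedelta

# Start date used by the conversion function above.
SPSS_EPOCH = datetime(1582, 10, 14)

def spss_to_python_date(seconds):
    """Convert an SPSS date value (seconds since the epoch) to a datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)

def python_to_spss_date(dt):
    """Hypothetical inverse: datetime to seconds since the SPSS epoch."""
    return (dt - SPSS_EPOCH).total_seconds()

admission = datetime(2017, 1, 1)
raw = python_to_spss_date(admission)

print(spss_to_python_date(raw) == admission)  # True
print(spss_to_python_date(86400))             # 1582-10-15 00:00:00
print(spss_to_python_date(raw).date())        # 2017-01-01
```

Calling `.date()` on the result gives a `datetime.date`, which is what the day-counting functions in the other answers expect.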

As to the main problem, after a bit of thought I managed to simplify (I think) and improve the robustness of my original solution.

Define !BedDaysPerMonth (Month_abbr = !Tokens(1) 
    /AdmissionVar = !Default(keydate1_dateformat) !Tokens(1)
    /DischargeVar = !Default(keydate2_dateformat) !Tokens(1)
    /DelayedDischarge = !Default(0) !Tokens(1))

 * Compute the month number from the name abbreviation.
Compute #MonthNum = xdate.Month(Number(!Quote(!Concat(!Month_abbr, "-00")), MOYR6)).

 * Find out which year we need e.g. for FY 1718: Apr - Dec = 2017, Jan - Mar = 2018.
Do if (#MonthNum >= 4).
    Compute #Year = !Concat("20", !substr(!Unquote(!Eval(!FY)), 1, 2)).
Else.
    Compute #Year = !Concat("20", !substr(!Unquote(!Eval(!FY)), 3, 2)).
End if.

 * Now we have the year work out the start and end dates for the month.
Compute #StartOfMonth = Date.DMY(1, #MonthNum, #Year).
Compute #EndOfMonth = Date.DMY(1, #MonthNum + 1, #Year) - time.days(1).

 * Set the names of the variable for this month e.g. April_beddays.
 * And then create the variable.
!Let !BedDays = !Concat(!Month_abbr, "_beddays").
Numeric !BedDays (F2.0).

 * Go through all possibilities to decide how many days to be allocated.
Do if !AdmissionVar LE #StartOfMonth.
    Do if !DischargeVar GT #EndOfMonth.
        * They were in hospital throughout this month.
        * This will be the maximum number of days in the month.
        Compute !BedDays = DateDiff(#EndOfMonth, #StartOfMonth, "days") + 1.
    Else if !DischargeVar LE #StartOfMonth.
        * The whole record occurred before the month began.
        Compute !BedDays = 0.
    Else.
        * They were discharged at some point in the month.
        Compute !BedDays = DateDiff(!DischargeVar, #StartOfMonth, "days").
    End If.
 * If we're here they were admitted during the month.
Else if !AdmissionVar LE #EndOfMonth.
    Do if !DischargeVar GT #EndOfMonth.
        Compute !BedDays = DateDiff(#EndOfMonth, !AdmissionVar, "days") + 1.
    Else.
        * Admitted and discharged within this month.
        Compute !BedDays = DateDiff(!DischargeVar, !AdmissionVar, "days").
    End If.
Else.
    * They were admitted in a future month.
    Compute !BedDays = 0.
End If.

 * If we are looking at Delayed Discharge records, we should count the last day and not the first.
 * We achieve this by taking a day from the first month and adding it to the last.
!If (!DelayedDischarge = 1) !Then
    Do if xdate.Month(!AdmissionVar) = xdate.Month(date.MOYR(#MonthNum, #Year))
        and xdate.Year(!AdmissionVar) =  #Year.
        Compute !BedDays = !BedDays - 1.
    End if.

    Do if xdate.Month(!DischargeVar) = xdate.Month(date.MOYR(#MonthNum, #Year))
        and xdate.Year(!DischargeVar) =  #Year.
        Compute !BedDays = !BedDays + 1.
    End if.
!ifEnd.

 * Tidy up the variable.
Variable Width !Beddays (5).
Variable Level !Beddays (Scale).

!EndDefine.

It could then (optionally) be run using the following bit of Python.

from calendar import month_name
import spss

#Loop through the months by number in FY order
for month in (4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3):
   #To show what is happening print some stuff to the screen
   print(month, month_name[month])

   #Set up the syntax
   syntax = "!BedDaysPerMonth Month_abbr = " + month_name[month][:3]

   #print the syntax to the screen
   print(syntax)

   #run the syntax
   spss.Submit(syntax)
Moohan