2

Please help!

Ideally, I would really like to solve this using formulas only - not VBA or anything I consider 'fancy'.

I work for a program that awards bonuses for continuous engagement. We have three (sometimes more) engagement time periods that could overlap and/or could have spaces of no engagement. The magic figure is 84 days of continuous engagement. We have been manually reviewing each line (hundreds of lines) to see if the time periods add up to 84 days of continuous engagement, with no periods of inactivity.

In the link there is a pic of a summary of what we work with. Row 3 for example, doesn't have 84 days in any of the 3 time periods, but the first 2 time periods combined includes 120 consecutive days. The dates will not appear in date order - e.g. early engagements may be listed in period 3.

enter image description here

Really looking forward to your advice.

Annie

Darren Bartrup-Cook
  • 18,362
  • 1
  • 23
  • 45
Rhiannon
  • 23
  • 3
  • What are the formulas you already tried? – Capt.Krusty Jul 26 '21 at 06:19
  • I don't really know where to start. I was thinking a nested IF statement using some kind of Maximum minus Minimum on the dates, but because the dates are not in date order across the three periods this won't work :-( – Rhiannon Jul 26 '21 at 06:23
  • There are two main ways of doing these - (1) the gap and island one https://stackoverflow.com/questions/53572815/duration-and-idle-time-for-a-server-from-continuous-dates/53579235#53579235 (2) making an array from all of the days between the earliest start date and latest finish date and crossing off the ones that don't fall within a used time period. Can you say what the earliest possible start date and latest possible finish date could be (e.g. are they all in 2021?) – Tom Sharpe Jul 26 '21 at 07:21
  • Do you have Excel 365? – Tom Sharpe Jul 26 '21 at 07:49
  • Hi Tom - yes, I do have Excel 365. Thanks so much for your advice. I'll give it a go this morning to see how it goes. So grateful to you! – Rhiannon Jul 26 '21 at 23:04

2 Answers2

3

Unfortunately Gap and Island seems to be a non-starter, because I don't think you can use it without either VBA or a lot of helper columns, plus the start dates need to be in order. It's a pity, because the longest continuous time on task (AKA largest island) drops out of the VBA version very easily and arguably it's easier to understand than the array formula versions below see this.

Moving on to option 2, if you have Excel 365, you can Use Sequence to generate a list of dates in a certain range, then check that each of them falls in one of the periods of engagement like this:

=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
SUM(--(period1+period2+period3>0)))

assuming that Z1 and Z2 contain the start and end of the range of dates that you're interested in (I've used 1/1/21 and 31/7/21).

If you don't have Excel 365, you can used the Row function to generate the list of dates instead. I suggest using the Name Manager to create a named range Dates:

=INDEX(Sheet1!$A:$A,Sheet1!$Z$1):INDEX(Sheet1!$A:$A,Sheet1!$Z$2)

enter image description here

Then the formula is:

= SUM(--(((ROW(Dates)>=A3) * (ROW(Dates)<=C3)  +( ROW(Dates)>=E3) * (ROW(Dates)<=G3) + (ROW(Dates)>=I3) * (ROW(Dates)<=K3))>0))

You will probably have to enter this using CtrlShiftEnter or use Sumproduct instead of Sum.

enter image description here

EDIT

As @Qualia has perceptively noted, you want the longest time of continuous engagement. This can be found by applying Frequency to the first formula:

=LET(array,SEQUENCE(Z$2-Z$1+1,1,Z$1),
period1,(array>=A3)*(array<=C3),
period2,(array>=E3)*(array<=G3),
period3,(array>=I3)*(array<=K3),
onDays,period1+period2+period3>0,
MAX(FREQUENCY(IF(onDays,array),IF(NOT(onDays),array)))
)

and the non_365 version becomes

=MAX(FREQUENCY(IF((ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3),ROW(Dates)),
IF( NOT(  (ROW(Dates)>=A3)*(ROW(Dates)<=C3)+(ROW(Dates)>=E3)*(ROW(Dates)<=G3)+(ROW(Dates)>=I3)*(ROW(Dates)<=K3) ),ROW(Dates))))

enter image description here

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
  • 1
    If I understand the question correctly, then the user is looking for the **contiguous** engagement. Your formula seems to sum up all days, even if there is a break between the periods. E.g. In row 1, the contiguous section would go from 2021-02-01 to 2021-06-01. – Qualia Communications Jul 26 '21 at 09:04
  • Yes you're right, well spotted. Can still use the same method, probably wrap it in Frequency to get the longest contiguous section. – Tom Sharpe Jul 26 '21 at 09:30
  • Hi Tom - sorry if this is a dumb question, but I just tried your formula and I can't get it to work. Your screen shot is EXACTLY what I am looking for - both the total engagement and the continuous engagement. Does it matter that my data is formatted into a table? – Rhiannon Jul 27 '21 at 01:43
  • Can you tell me which version you've tried (Excel 365 or alternative version) plz, and what happens when you try it (is there an error message?) – Tom Sharpe Jul 27 '21 at 06:47
  • Hi Tom, the version I am using (sorry if I've given the wrong info) is Microsoft Office Professional Plus 2019. I get a #NAME message within the cell. I have a screenshot, but not sure where I can post it in here. – Rhiannon Jul 28 '21 at 00:38
  • I wonder if you've successfully created the 'Dates' named range using the Name manager before entering the formula? Anyway, you should be able to upload the screenshot using Imgur and then copy a link to it either into your question or into a comment. – Tom Sharpe Jul 28 '21 at 06:54
3

@TomSharpe has shown you a method of solving this with formulas. You would have to modify it if you had more than three time periods.

Not sure if you would consider a Power Query solution to be "too fancy", but it does allow for an unlimited number of time periods, laid out as you show in the sample.

With PQ, we

  • construct lists of all the consecutive dates for each pair of start/end
  • combine the lists for each row, removing the duplicates
  • apply a gap and island technique to the resulting date lists for each row
  • count the number of entries for each "island" and return the maximum

Please note: I counted both the start and the end date. In your days columns, you did not (except for one instance). If you want to count both, leave the code as is; if you don't we can make a minor modification

To use Power Query

  • Create a table which excludes that first row of merged cells
  • Rename the table columns in the format I show in the screenshot, since each column header in a table must have a different name.
  • Select some cell in that Data Table
  • Data => Get&Transform => from Table/Range
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2
  • Paste the M Code below in place of what you see
  • Change the Table name in line 2 back to what was generated originally.
  • Read the comments and explore the Applied Steps to better understand the algorithm

M Code
code edited to Sort the date lists to handle certain cases

let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start P1", type datetime}, {"Comment1", type text}, {"End P1", type datetime}, {"Days 1", Int64.Type}, {"Start P2", type datetime}, {"Comment2", type text}, {"End P2", type datetime}, {"Days 2", Int64.Type}, {"Start P3", type datetime}, {"Comment3", type text}, {"End P3", type datetime}, {"Days 3", Int64.Type}}),

//set data types for columns 1/5/9... and 3/7/11/... as date
dtTypes = List.Transform(List.Alternate(Table.ColumnNames(#"Changed Type"),1,1,1), each {_,Date.Type}),
typed = Table.TransformColumnTypes(#"Changed Type",dtTypes),

//add Index column to define row numbers
rowNums = Table.AddIndexColumn(typed,"rowNum",0,1),

//Unpivot except for rowNum column
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(rowNums, {"rowNum"}, "Attribute", "Value"),

//split the attribute column to filter on Start/End => just the dates
//then filter and remove the attributes columns
    #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Attribute.1", "Attribute.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Attribute.2"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Attribute.1] = "End" or [Attribute.1] = "Start")),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Attribute.1"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns1",{{"Value", type date}, {"rowNum", Int64.Type}}),

//group by row number
//generate date list from each pair of dates
//combine into a single list of dates with no overlapped date ranges for each row
    #"Grouped Rows" = Table.Group(#"Changed Type2", {"rowNum"}, {
        {"dateList", (t)=> List.Sort(
            List.Distinct(
                List.Combine(
                    List.Generate(
                        ()=>[dtList=List.Dates(
                                t[Value]{0},
                                Duration.TotalDays(t[Value]{1}-t[Value]{0})+1 ,
                                #duration(1,0,0,0)),idx=0],
                        each [idx] < Table.RowCount(t),
                        each [dtList=List.Dates(
                                    t[Value]{[idx]+2},
                                    Duration.TotalDays(t[Value]{[idx]+3}-t[Value]{[idx]+2})+1,
                                    #duration(1,0,0,0)),
                                idx=[idx]+2],
                        each [dtList]))))}
            }),

//determine Islands and Gaps
    #"Expanded dateList" = Table.ExpandListColumn(#"Grouped Rows", "dateList"),

//Duplicate the date column and turn it into integers
    #"Duplicated Column" = Table.DuplicateColumn(#"Expanded dateList", "dateList", "dateList - Copy"),
    #"Changed Type3" = Table.TransformColumnTypes(#"Duplicated Column",{{"dateList - Copy", Int64.Type}}),

//add an Index column
//Then subtract the index from the integer date
// if the dates are consecutive the resultant ID column will => the same value, else it will jump
    #"Added Index" = Table.AddIndexColumn(#"Changed Type3", "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "ID", each [#"dateList - Copy"]-[Index]),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"dateList - Copy", "Index"}),

//Group by the date ID column and a Count will => the consecutive days
    #"Grouped Rows1" = Table.Group(#"Removed Columns2", {"rowNum", "ID"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Removed Columns3" = Table.RemoveColumns(#"Grouped Rows1",{"ID"}),

//Group by the Row number and return the Maximum Consecutive days
    #"Grouped Rows2" = Table.Group(#"Removed Columns3", {"rowNum"}, {{"Max Consecutive Days", each List.Max([Count]), type number}}),

//combine the Consecutive Days column with original table
    result = Table.Join(rowNums,"rowNum",#"Grouped Rows2","rowNum"),
    #"Removed Columns4" = Table.RemoveColumns(result,{"rowNum"})
in
    #"Removed Columns4"

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Wow, @Ron, you are certainly the go-to guy for Power Query. I hadn't thought of this till now, but I imagine it wouldn't be too hard to implement gap-and-island (my favourite method for this type of problem, obviously) in Power Query. The method originally comes from SQL. I think you would need the equivalent of a Lag function to do it. – Tom Sharpe Jul 26 '21 at 17:36
  • @TomSharpe This is my first time looking at this kind of problem, so I imagine there are more efficient methods. But I just turn the three date ranges into a list of unique dates; then merge them and determine how many sub groups (based on consecutive dates without a gap) are in the merged list; then take the count of each of those sub-groups. Would that be similar to the gap and island analysis? Hmm, I should probably sort the list also. – Ron Rosenfeld Jul 26 '21 at 17:49
  • I'm struggling to find the original reference - this is one fairly early reference, but I think there is a simpler explanation of it somewhere. http://blogs.solidq.com/en/sqlserver/packing-intervals/ More difficult than I remember it being! – Tom Sharpe Jul 26 '21 at 19:23
  • @TomSharpe Complicated, yes. `(=_=)` Here's a [link to an algorithm](https://apexinsights.net/blog/gaps-and-islands-in-power-query) similar to what I did. – Ron Rosenfeld Jul 26 '21 at 21:26
  • Hi Ron - thanks for this! I have never used Power Query before, so really appreciated your step by step instructions :-) I am getting an error of 'The 'increment' argument is out of range. Any ideas why this is happening? – Rhiannon Jul 27 '21 at 02:16
  • @Rhiannon probably some difference between your actual data and the data sample you presented, that I didn't account for. Without more information, hard for me to be more specific – Ron Rosenfeld Jul 27 '21 at 02:36
  • @Rhiannon You can get that error message if an End Date occurs **before** a start date. That would seem to be a data input problem. How do you want to handle that? – Ron Rosenfeld Jul 27 '21 at 10:14
  • @RonRosenfeld - its a manual check of any negative values in our days column that will determine if any of our end dates are before our start dates. I just did some restructuring of my table and IT WORKS!!!! Thank you so so so so much! – Rhiannon Jul 28 '21 at 00:44