-1

Using C# LINQ, I would like to be able to turn the following:

    [
     { 
       Id: 1, 
       StartDate: '2021-03-10', 
       EndDate: '2021-03-21', 
       Quantity: 1
     },
     { 
       Id: 2, 
       StartDate: '2021-03-10', 
       EndDate: '2021-03-21', 
       Quantity: 1
     },
     {
       Id: 3, 
       StartDate: '2021-03-10', 
       EndDate: '2021-03-23', 
       Quantity: 2
     },
     {
       Id: 4, 
       StartDate: '2021-03-10', 
       EndDate: '2021-03-25', 
       Quantity: 1
      }
    ]

Into this:

    [
      { 
        StartDate: '2021-03-10', 
        EndDate: '2021-03-21', 
        Quantity: 5, 
        Ids: [1, 2, 3, 4] 
      },
      { 
        StartDate: '2021-03-22', 
        EndDate: '2021-03-23', 
        Quantity: 3, 
        Ids: [3, 4] 
      },      { 
        StartDate: '2021-03-24', 
        EndDate: '2021-03-25', 
        Quantity: 1, 
        Ids: [4] 
      }
    ]

In this scenario:

  1. EndDate may change per entry on the input but StartDate will always be the same.
  2. It is possible that two entries may have the same EndDate. In this case, the results will aggregate, with the results showing one entry for those dates and a summed quantity.

Desired solution:

The LINQ would group by unique date range, sum the quantity value and include an array of ids, indicating which date ranges have been covered.

Help is much appreciated.

This question is a step in the right direction but doesn't take care of indicating the date range as demonstrated. Select multiple fields group by and sum

Sparked
  • 844
  • 10
  • 25

4 Answers4

2

These are a couple of LINQ extension methods that are based on the APL scan operator, extended. scan is like Aggregate, but returns the intermediate results. These are a variation that take pairs of items and then processes them.

public static class IEnumerableExt {
    // TRes firstResFn(T firstValue)
    // TRes combineFn(T PrevValue, T CurValue)
    // returns firstResFn(items.First()) then ScanByPairs(items, combineFn)
    public static IEnumerable<TRes> ScanByPairs<T, TRes>(this IEnumerable<T> items, Func<T, TRes> firstResFn, Func<T, T, TRes> combineFn) {
        using (var itemsEnum = items.GetEnumerator())
            if (itemsEnum.MoveNext()) {
                var prev = itemsEnum.Current;

                yield return firstResFn(prev);
                while (itemsEnum.MoveNext())
                    yield return combineFn(prev, prev = itemsEnum.Current);
            }
    }

    // THelper helperSeedFn(T FirstValue)
    // TRes resSeedFn(T FirstValue)
    // (THelper Helper, TRes Res) combineFn((THelper Helper, TRes PrevRes), T CurValue)
    // returns resSeedFn, combineFn,...
    public static IEnumerable<TRes> ScanToPairsWithHelper<T, THelper, TRes>(this IEnumerable<T> items, Func<T, THelper> helperSeedFn, Func<T, TRes> resSeedFn, Func<(THelper Helper, TRes PrevRes), T, (THelper Helper, TRes Res)> combineFn) {
        using (var itemsEnum = items.GetEnumerator())
            if (itemsEnum.MoveNext()) {
                var seed = (Helper: helperSeedFn(itemsEnum.Current), Res: resSeedFn(itemsEnum.Current));

                while (itemsEnum.MoveNext()) {
                    yield return seed.Res;
                    seed = combineFn(seed, itemsEnum.Current);
                }
                yield return seed.Res;
            }
    }

}

First, create an intermediate list that combines any identical date ranges:

var int1 = src.GroupBy(s => s.EndDate)
              .Select(sg => new {
                  Ids = sg.Select(s => s.Id).ToList(),
                  StartDate = sg.First().StartDate,
                  EndDate = sg.Key,
                  Quantity = sg.Sum(s => s.Quantity)
              });

Then use ScanByPairs to compute the new start dates so ranges don't overlap (this assumes EndDate is in increasing order):

var int2 = int1.ScanByPairs(s => new { s.Ids, s.Quantity, s.StartDate, s.EndDate },
                            (prev, next) => new { next.Ids, next.Quantity, StartDate = prev.EndDate.AddDays(1), next.EndDate });

Finally, process the previous result in reverse, aggregating the quantities and IDs as you go:

var ans = int2.Reverse()
              .ScanToPairsWithHelper(first => new { first.Ids, first.Quantity },
                                     first => new { first.Ids, first.Quantity, first.StartDate, first.EndDate },
                                     (helpernext, prev) => (new { Ids = helpernext.Helper.Ids.Concat(prev.Ids).ToList(), Quantity = helpernext.Helper.Quantity + prev.Quantity },
                                                            new { Ids = helpernext.Helper.Ids.Concat(prev.Ids).ToList(), Quantity = helpernext.Helper.Quantity + prev.Quantity, prev.StartDate, prev.EndDate }))
              .Reverse();
NetMage
  • 26,163
  • 3
  • 34
  • 55
  • Nice solution, but `GroupBy()` will unnecessarily look at the whole input to build the groups. MoreLINQ's [`GroupAdjacent()`](https://github.com/morelinq/MoreLINQ#groupadjacent) will immediately yield a group as soon as it hits an item with a different key. – Good Night Nerd Pride Dec 14 '20 at 19:21
  • *will unnecessarily create an internal lookup to build the groups. – Good Night Nerd Pride Dec 14 '20 at 19:29
  • 1
    @GoodNightNerdPride I don't like MoreLINQ's attitude toward conflicts with .Net (Core), and prefer a lighter weight solution when it isn't too difficult. Nice idea on the `GroupAdjacent` though I don't like the way they return a `List` instead of an `IGrouping`, so it isn't really `GroupBy` compatible. I have something similar called `GroupByRuns` but it returns an `IGrouping`, which means it is somewhat heavy weight as it requires a class to implement `IGrouping` since the framework `Grouping` class isn't publicly usable. – NetMage Dec 14 '20 at 19:54
  • Yeah those conflicts are really annoying. Importing only the needed function helps though. They even went through the trouble to create nuget packages for individual functions, in case the whole package is too much baggage. – Good Night Nerd Pride Dec 14 '20 at 20:04
2

So given this class:

public class Entry
{
    public int Id { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public int Quantity { get; set; }
}

And this starting point:

Entry[] entries = JsonConvert.DeserializeObject<Entry[]>(jsonText);

I first get a list of the distinct dates involved:

DateTime[] dates =
    entries
        .SelectMany(x => new [] { x.StartDate, x.EndDate })
        .Distinct()
        .OrderBy(x => x)
        .ToArray();

Now I can query to split each Entry into the set of distinct date ranges:

var query =
    from e in entries
    let splitters =
        dates
            .Where(x => x >= e.StartDate)
            .Where(x => x <= e.EndDate)
            .ToArray()
    from s in
        splitters
            .Skip(1)
            .Zip(
                splitters,
                (s1, s0) => new Entry()
                {
                    Id = e.Id,
                    StartDate = s0,
                    EndDate = s1,
                    Quantity = e.Quantity
                })
    group new { s.Id, s.Quantity } by new { s.StartDate, s.EndDate } into gss
    select new
    {
        gss.Key.StartDate,
        gss.Key.EndDate,
        Quantity = gss.Sum(gs => gs.Quantity),
        Ids = gss.Select(gs => gs.Id).ToArray(),
    };

That gives me:

results

Or:

[
    {
        "StartDate": "2021-03-10T00:00:00",
        "EndDate": "2021-03-21T00:00:00",
        "Quantity": 5,
        "Ids": [
            1,
            2,
            3,
            4
        ]
    },
    {
        "StartDate": "2021-03-21T00:00:00",
        "EndDate": "2021-03-23T00:00:00",
        "Quantity": 3,
        "Ids": [
            3,
            4
        ]
    },
    {
        "StartDate": "2021-03-23T00:00:00",
        "EndDate": "2021-03-25T00:00:00",
        "Quantity": 1,
        "Ids": [
            4
        ]
    }
]

If I'm allowed to use Microsoft's "System.Interactive" library then it can be done like this:

var query =
    from e in entries
    from s in
        dates
            .Where(x => x >= e.StartDate)
            .Where(x => x <= e.EndDate)
            .Buffer(2, 1)
            .SkipLast(1)
            .Select(x => new Entry()
            {
                Id = e.Id,
                StartDate = x[0],
                EndDate = x[1],
                Quantity = e.Quantity
            })
    group new { s.Id, s.Quantity } by new { s.StartDate, s.EndDate } into gss
    select new
    {
        gss.Key.StartDate,
        gss.Key.EndDate,
        Quantity = gss.Sum(gs => gs.Quantity),
        Ids = gss.Select(gs => gs.Id).ToArray(),
    };
Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • That's very impressive. The only place where it doesn't match my expected output is the start date on results after result one. Result 2, Expected: 2021-03-22. Received: 2021-03-21. The new start date should be the day after the day after the previous end date, not the same. – Sparked Dec 15 '20 at 09:44
  • 1
    I'm marking this as the correct answer since other than the tiny quibble noted above, it answers the question exactly and is clever and clearly explained and it deserves to be at the top. – Sparked Dec 23 '20 at 09:03
1

This is rather tough to do with LINQ. I recommend using MoreLINQ for that.

// these helpers make things much easier
using static MoreLinq.Extensions.GroupAdjacentExtension;
using static MoreLinq.Extensions.ScanRightExtension;
using static MoreLinq.Extensions.WindowRightExtension;

// ...

var output = input
    // aggregate identical end dates
    .GroupAdjacent(x => x.EndDate)
    .Select(xs => xs.Aggregate(
        new { Ids = Enumerable.Empty<int>(), Quantity = 0, StartDate = default(DateTime), EndDate = default(DateTime) },
        (acc, curr) => new {
            Ids = acc.Ids.Append(curr.Id),
            Quantity = acc.Quantity + curr.Quantity,
            curr.StartDate,
            curr.EndDate
        }))
    
    // cut off overlapping start dates
    .WindowRight(2)
    .Select(win => win.Count == 1
        ? win[0]
        : new {
            win[1].Ids,
            win[1].Quantity,
            StartDate = win[0].EndDate.AddDays(1),
            win[1].EndDate
        })
    
    // accumulate IDs and quantities
    .ScanRight((curr, prev) => new {
        Ids = curr.Ids.Concat(prev.Ids),
        Quantity = curr.Quantity + prev.Quantity,
        curr.StartDate,
        curr.EndDate
    });

This doesn't iterate multiple times and efficiently streams everything. But its a lot of code.

Working example: https://dotnetfiddle.net/YxDQEX

The main trick is to iterate the input backwards. This let's us easily accumulate the IDs and quantities, because they always grow when scanning the list from back to front. E.g. the second-to-last item's quantity will be its own quantity plus the quantity of the last item; the third-to-last item's quantity will be its own plus the quantity of the second-to-last item; etc.

ScanRight() does the backwards iteration for us. It is similar to Aggregate(), but it yields the intermediate values of the accumulate as well, thus returning an IEnumerable<TAccumulate> rather than a single TAccumulate.

It might be possible to get a cleaner version of this using a traditional foreach loop, but I kind of doubt it. With a foreach you would have to do everything at the same time (i.e. in the same loop body) in order to do it efficiently and that will probably become unreadable as well (considering all the state that has to be managed).

With the LINQ version you at least have three disparate steps that you could extract using extension functions:

public static IEnumerable<T> MergeEqualEndDates<T>(this IEnumerable<T> xs) => xs
    .GroupAdjacent(x => x.EndDate)
    .Select(xs => xs.Aggregate(
        new { Ids = Enumerable.Empty<int>(), Quantity = 0, StartDate = default(DateTime), EndDate = default(DateTime) },
        (acc, curr) => new {
            Ids = acc.Ids.Append(curr.Id),
            Quantity = acc.Quantity + curr.Quantity,
            curr.StartDate,
            curr.EndDate
        }));

Which could leave you with:

var output = input
    .MergeEqualEndDates()
    .TrimStartDates()
    .AccumulateIdsAndQuantities();
Good Night Nerd Pride
  • 8,245
  • 4
  • 49
  • 65
1

Thank you to @GoodNightNerdPride and @NetMage for their help.

This is the version I came up with. I cheated and used a for-loop. It is less elegant but I do find it easier to read.

Below is the main extention method (which borrows from @NetMage) and beneath that is an Xunit file I used to test the whole thing.

Extention Method

public static List<Response> SumBasedUponOverlappingDates(this List<Request> requests) {

        var requestsWithGroupedIds = requests
                        .GroupBy(x => x.EndDate)
                        .Select(x => new {
                            Ids = x.Select(y => y.Id).ToArray(),
                            StartDate = x.First().StartDate,
                            EndDate = x.Key,
                            Quantity = x.Sum(y => y.Quantity)
                        })
                        .OrderBy(x => x.EndDate)
                        .ToList();

        var responses = new List<Response>();
        for (int i = 0; i < requestsWithGroupedIds.Count(); i++) {

            var req = requestsWithGroupedIds[i];

            var overlappingEntries = requestsWithGroupedIds
                .Where(x => x.StartDate <= req.StartDate && x.EndDate >= req.EndDate)
                .ToList();

            var resp = new Response {
                Ids = overlappingEntries.SelectMany(x => x.Ids.Select(y => y)).OrderBy(x => x).ToArray(),
                Quantity = overlappingEntries.Sum(x => x.Quantity),
                StartDate = (i == 0) ? req.StartDate : requestsWithGroupedIds[i - 1].EndDate.AddDays(1),
                EndDate = req.EndDate
            };

            responses.Add(resp);
        }
        return responses;
    }

XUnit Code

using System;
using System.Linq;
using Xunit;
using System.Collections.Generic;

namespace StackOverflow.Tests {

    public class GroupingAndAggregationTests {

        [Fact]
        void ShouldAggregateDuplicateDataIntoSingle() {

            var requests = new List<Request>() {
                new Request{Id = 9, StartDate = new DateTime(2021, 2, 20), EndDate = new DateTime(2021,3, 5), Quantity = 6 },
                new Request{Id = 345, StartDate = new DateTime(2021, 2, 20), EndDate = new DateTime(2021,3, 5), Quantity = 29 }
            };

            var responses = requests.SumBasedUponOverlappingDates();

            Assert.Equal(1, responses.Count());

            var expectedResponse =
                new Response() {
                    StartDate = new DateTime(2021, 2, 20),
                    EndDate = new DateTime(2021, 3, 5),
                    Quantity = 35,
                    Ids = new int[] { 9, 345 }
                };
            var actualResponse = responses[0];
            Assert.True(actualResponse.IsEqual(expectedResponse));
        }

        [Fact]
        void ShouldAggregateMultipleBasedOnOverlappingDates() {

            var requests = new List<Request>() {
                new Request{Id = 1, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 21), Quantity = 1 },
                new Request{Id = 2, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 21), Quantity = 1 },
                new Request{Id = 3, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 23), Quantity = 2 },
                new Request{Id = 4, StartDate = new DateTime(2021,3,10), EndDate = new DateTime(2021,3, 25), Quantity = 1 }
            };

            var responses = requests.SumBasedUponOverlappingDates();

            Assert.Equal(3, responses.Count());

            var expecedResponse1 =
                new Response() {
                    StartDate = new DateTime(2021, 3, 10),
                    EndDate = new DateTime(2021, 3, 21),
                    Quantity = 5,
                    Ids = new int[] { 1, 2, 3, 4 }
                };
            var actualResponse1 = responses[0];
            Assert.True(actualResponse1.IsEqual(expecedResponse1));

            var expectedResponse2 =
                new Response() {
                    StartDate = new DateTime(2021, 3, 22),
                    EndDate = new DateTime(2021, 3, 23),
                    Quantity = 3,
                    Ids = new int[] { 3, 4 }
                };
            var actualResponse2 = responses[1];
            Assert.True(actualResponse2.IsEqual(expectedResponse2));

            var expectedResponse3 =
                new Response() {
                    StartDate = new DateTime(2021, 3, 24),
                    EndDate = new DateTime(2021, 3, 25),
                    Quantity = 1,
                    Ids = new int[] { 4 }
                };
            var actualResponse3 = responses[2];
            Assert.True(actualResponse3.IsEqual(expectedResponse3));


        }


    }

    public class Request {
        public int Id { get; set; }
        public DateTime StartDate { get; set; }
        public DateTime EndDate { get; set; }
        public int Quantity { get; set; }
    }

    public class Response {
        public DateTime StartDate { get; set; }
        public DateTime EndDate { get; set; }
        public int Quantity { get; set; }
        public int[] Ids { get; set; }

        public bool IsEqual(Response resp)
            =>
            StartDate == resp.StartDate &&
            EndDate == resp.EndDate &&
            Quantity == resp.Quantity &&
            Ids.OrderBy(x => x).SequenceEqual(resp.Ids.OrderBy(x => x));
    }

    public static class ExtentionMethods {

        public static List<Response> SumBasedUponOverlappingDates(this List<Request> requests) {

            var requestsWithGroupedIds = requests
                            .GroupBy(x => x.EndDate)
                            .Select(x => new {
                                Ids = x.Select(y => y.Id).ToArray(),
                                StartDate = x.First().StartDate,
                                EndDate = x.Key,
                                Quantity = x.Sum(y => y.Quantity)
                            })
                            .OrderBy(x => x.EndDate)
                            .ToList();

            var responses = new List<Response>();
            for (int i = 0; i < requestsWithGroupedIds.Count(); i++) {

                var req = requestsWithGroupedIds[i];

                var overlappingEntries = requestsWithGroupedIds
                    .Where(x => x.StartDate <= req.StartDate && x.EndDate >= req.EndDate)
                    .ToList();

                var resp = new Response {
                    Ids = overlappingEntries.SelectMany(x => x.Ids.Select(y => y)).OrderBy(x => x).ToArray(),
                    Quantity = overlappingEntries.Sum(x => x.Quantity),
                    StartDate = (i == 0) ? req.StartDate : requestsWithGroupedIds[i - 1].EndDate.AddDays(1),
                    EndDate = req.EndDate
                };

                responses.Add(resp);
            }
            return responses;
        }
    }

}
Sparked
  • 844
  • 10
  • 25