48

I'm trying to understand how LINQ can be used to group data by intervals of time; and then ideally aggregate each group.

Finding numerous examples with explicit date ranges, I'm trying to group by periods such as 5-minutes, 1-hour, 1-day.

For example, I have a class that wraps a DateTime with a value:

public class Sample
{
     public DateTime timestamp;
     public double value;
}

These observations are contained as a series in a List collection:

List<Sample> series;

So, to group by hourly periods of time and aggregate value by average, I'm trying to do something like:

var grouped = from s in series
              group s by new TimeSpan(1, 0, 0) into g
              select new { timestamp = g.Key, value = g.Average(s => s.value };

This is fundamentally flawed, as it groups the TimeSpan itself. I can't understand how to use the TimeSpan (or any data type representing an interval) in the query.

Jason Sturges
  • 15,855
  • 14
  • 59
  • 80

9 Answers9

57

You could round the time stamp to the next boundary (i.e. down to the closest 5 minute boundary in the past) and use that as your grouping:

var groups = series.GroupBy(x =>
{
    var stamp = x.timestamp;
    stamp = stamp.AddMinutes(-(stamp.Minute % 5));
    stamp = stamp.AddMilliseconds(-stamp.Millisecond - 1000 * stamp.Second);
    return stamp;
})
.Select(g => new { TimeStamp = g.Key, Value = g.Average(s => s.value) })
.ToList();

Above achieves that by using a modified time stamp in the grouping, which sets the minutes to the previous 5 minute boundary and removes the seconds and milliseconds. The same approach of course can be used for other time periods, i.e. hours and days.

Edit:

Based on this made up sample input:

var series = new List<Sample>();
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(3) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(4) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(5) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(6) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(7) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(15) });

3 groups were produced for me, one with grouping timestamp 3:05, one with 3:10 and one with 3:20 pm (your results may vary based on current time).

BrokenGlass
  • 158,293
  • 28
  • 286
  • 335
  • What's a difference between your new time span and available timespans for items? you just changed the bias. – Lrrr Jan 13 '12 at 19:54
  • @AliAmiri: it groups items that fall into the same 5 minute interval into the same group by returning the same timestamp for all of those items - wasn't that what OP intended? – BrokenGlass Jan 13 '12 at 19:56
  • I don't think so. You just move them to 5 minute before (also i don't know OP want's do what you tried to show or not). – Lrrr Jan 13 '12 at 19:59
  • 1
    @AliAmiri: I tested this and it works fine for grouping - note that it uses a % (modulo) and its NOT just moving the sample to 5 minutes before. – BrokenGlass Jan 13 '12 at 20:04
  • Ops I missed the percent:D I see it as minus:| – Lrrr Jan 13 '12 at 20:06
  • Perfect! Exactly what I was looking for! Thank you! Henk Holterman also provided exactly what I was looking for. – Jason Sturges Jan 13 '12 at 20:22
  • 2
    Thanks (+1), but have found a problem with this, I was finding I had DateTimes that had the same Year, Month, Minute, Second and Millisecond, but had different Ticks. N.B. They came from DateTimeOffsets pulled by EF from a DB. Because of this (sub millisecond differences) I'd suggest using new DateTime() to ensure your DateTimes are indeed considered equal. see http://stackoverflow.com/a/27234349/661584 – MemeDeveloper Dec 01 '14 at 18:35
  • This is genius. Worked perfectly for me trying to group by 15 minutes. I feel a bit stupid though as I'm struggling to group by every 10 seconds, I'm sure it's just a few changes but can't for the life of me work it out - I've got grouping by a minute working, clearly need another coffee to work it out! – jezorama Mar 24 '21 at 09:12
  • Yeah I'm stuck - if anyone can help that would be much appreciated. Trying to do the same but with a grouping of every 10 seconds; i.e. 12h:01m:10s, 12h:01m:20s and so on. So in a 1 hour period I'd expect to see 360 groups of 10 seconds. Many thanks. – jezorama Mar 24 '21 at 09:50
11

I'm very late to the game on this one, but I came accross this while searching for something else, and I thought i had a better way.

series.GroupBy (s => s.timestamp.Ticks / TimeSpan.FromHours(1).Ticks)
        .Select (s => new {
            series = s
            ,timestamp = s.First ().timestamp
            ,average = s.Average (x => x.value )
        }).Dump();

Here is a sample linqpad program so you can validate and test

void Main()
{
    List<Sample> series = new List<Sample>();

    Random random = new Random(DateTime.Now.Millisecond);
    for (DateTime i = DateTime.Now.AddDays(-5); i < DateTime.Now; i += TimeSpan.FromMinutes(1))
    {
        series.Add(new UserQuery.Sample(){ timestamp = i, value = random.NextDouble() * 100 });
    }
    //series.Dump();
    series.GroupBy (s => s.timestamp.Ticks / TimeSpan.FromHours(1).Ticks)
        .Select (s => new {
            series = s
            ,timestamp = s.First ().timestamp
            ,average = s.Average (x => x.value )
        }).Dump();
}

// Define other methods and classes here
public class Sample
{
     public DateTime timestamp;
     public double value;
}
Duane McKinney
  • 397
  • 3
  • 5
2

I'd suggest using new DateTime() to avoid any issues with sub millisecond differences

var versionsGroupedByRoundedTimeAndAuthor = db.Versions.GroupBy(g => 
new
{
                UserID = g.Author.ID,
                Time = RoundUp(g.Timestamp, TimeSpan.FromMinutes(2))
});

With

  private DateTime RoundUp(DateTime dt, TimeSpan d)
        {
            return new DateTime(((dt.Ticks + d.Ticks - 1) / d.Ticks) * d.Ticks);
        }

N.B. I am here grouping by Author.ID as well as the rounded TimeStamp.

RoundUp function taken from @dtb answer here https://stackoverflow.com/a/7029464/661584

Read about how equality down to the millisecond doesn't always mean equality here Why does this unit test fail when testing DateTime equality?

Community
  • 1
  • 1
MemeDeveloper
  • 6,457
  • 2
  • 42
  • 58
2

For grouping by hour you need to group by the hour part of your timestamp which could be done as so:

var groups = from s in series
  let groupKey = new DateTime(s.timestamp.Year, s.timestamp.Month, s.timestamp.Day, s.timestamp.Hour, 0, 0)
  group s by groupKey into g select new
                                      {
                                        TimeStamp = g.Key,
                                        Value = g.Average(a=>a.value)
                                      };
Michael
  • 21
  • 1
1

I improved on BrokenGlass's answer by making it more generic and added safeguards. With his current answer, if you chose an interval of 9, it will not do what you'd expect. The same goes for any number 60 is not divisible by. For this example, I'm using 9 and starting at midnight (0:00).

  • Everything from 0:00 to 0:08.999 will be put into a group of 0:00 as you'd expect. It will keep doing this until you get to the grouping that starts at 0:54.
  • At 0:54, it will only group things from 0:54 to 0:59.999 instead of going up to 01:03.999.

For me, this is a massive issue.

I'm not sure how to fix that, but you can add safeguards.
Changes:

  1. Any minute where 60 % [interval] equals 0 will be an acceptable interval. The if statements below safeguard this.
  2. Hour intervals work as well.

            double minIntervalAsDouble = Convert.ToDouble(minInterval);
            if (minIntervalAsDouble <= 0)
            {
                string message = "minInterval must be a positive number, exiting";
                Log.getInstance().Info(message);
                throw new Exception(message);
            }
            else if (minIntervalAsDouble < 60.0 && 60.0 % minIntervalAsDouble != 0)
            {
                string message = "60 must be divisible by minInterval...exiting";
                Log.getInstance().Info(message);
                throw new Exception(message);
            }
            else if (minIntervalAsDouble >= 60.0 && (24.0 % (minIntervalAsDouble / 60.0)) != 0 && (24.0 % (minIntervalAsDouble / 60.0) != 24.0))
            {
                //hour part must be divisible...
                string message = "If minInterval is greater than 60, 24 must be divisible by minInterval/60 (hour value)...exiting";
                Log.getInstance().Info(message);
                throw new Exception(message);
            }
            var groups = datas.GroupBy(x =>
            {
                if (minInterval < 60)
                {
                    var stamp = x.Created;
                    stamp = stamp.AddMinutes(-(stamp.Minute % minInterval));
                    stamp = stamp.AddMilliseconds(-stamp.Millisecond);
                    stamp = stamp.AddSeconds(-stamp.Second);
                    return stamp;
                }
                else
                {
                    var stamp = x.Created;
                    int hourValue = minInterval / 60;
                    stamp = stamp.AddHours(-(stamp.Hour % hourValue));
                    stamp = stamp.AddMilliseconds(-stamp.Millisecond);
                    stamp = stamp.AddSeconds(-stamp.Second);
                    stamp = stamp.AddMinutes(-stamp.Minute);
                    return stamp;
                }
            }).Select(o => new
            {
                o.Key,
                min = o.Min(f=>f.Created),
                max = o.Max(f=>f.Created),
                o
            }).ToList();
    

Put whatever you'd like in the select statement! I put in min/max because it was easier to test it.

Migit
  • 63
  • 7
0

Even though I am really late, here are my 2 cents:

I wanted to Round() the time values down AND up in 5 minute intervals:

10:31 --> 10:30
10:33 --> 10:35
10:36 --> 10:35

This can be achieved by converting to TimeSpan.Tick and converting back to DateTime and using Math.Round():

public DateTime GetShiftedTimeStamp(DateTime timeStamp, int minutes)
{
    return
        new DateTime(
            Convert.ToInt64(
                Math.Round(timeStamp.Ticks / (decimal)TimeSpan.FromMinutes(minutes).Ticks, 0, MidpointRounding.AwayFromZero)
                    * TimeSpan.FromMinutes(minutes).Ticks));
}

The shiftedTimeStamp can be used in linq grouping as shown above.

Jan
  • 11
  • 1
0

I know this doesn't directly answer the question, but I was googling around looking for a very similar solution to aggregate candle data for stocks / crypto currencies from a smaller minute period to a higher minute period (5, 10, 15, 30). You can't simply go back from the current minute taking X at a time, as the timestamps for the aggregated periods won't be consistent. You also have to watch out that there's enough data at the start and end of the list to populate a full candlestick of the larger period. Given that, the solution I came up with was as follows. (It assumes that the candles for the smaller period, as indicated by rawPeriod, are sorted by ascending Timestamp.)

public class Candle
{
    public long Id { get; set; }
    public Period Period { get; set; }
    public DateTime Timestamp { get; set; }
    public double High { get; set; }
    public double Low { get; set; }
    public double Open { get; set; }
    public double Close { get; set; }
    public double BuyVolume { get; set; }
    public double SellVolume { get; set; }
}

public enum Period
{
    Minute = 1,
    FiveMinutes = 5,
    QuarterOfAnHour = 15,
    HalfAnHour = 30
}

    private List<Candle> AggregateCandlesIntoRequestedTimePeriod(Period rawPeriod, Period requestedPeriod, List<Candle> candles)
    {
        if (rawPeriod != requestedPeriod)
        {
            int rawPeriodDivisor = (int) requestedPeriod;
            candles = candles
                        .GroupBy(g => new { TimeBoundary = new DateTime(g.Timestamp.Year, g.Timestamp.Month, g.Timestamp.Day, g.Timestamp.Hour, (g.Timestamp.Minute / rawPeriodDivisor) * rawPeriodDivisor , 0) })
                        .Where(g => g.Count() == rawPeriodDivisor )
                        .Select(s => new Candle
                        {
                            Period = requestedPeriod,
                            Timestamp = s.Key.TimeBoundary,
                            High = s.Max(z => z.High),
                            Low = s.Min(z => z.Low),
                            Open = s.First().Open,
                            Close = s.Last().Close,
                            BuyVolume = s.Sum(z => z.BuyVolume),
                            SellVolume = s.Sum(z => z.SellVolume),
                        })
                        .OrderBy(o => o.Timestamp)
                        .ToList();
        }

        return candles;
    }
vipes
  • 922
  • 1
  • 9
  • 17
0

A generalised solution:

    static IEnumerable<IGrouping<DateRange, T>> GroupBy<T>(this IOrderedEnumerable<T> enumerable, TimeSpan timeSpan, Func<T, DateTime> predicate)
    {
        Grouping<T> grouping = null;
        foreach (var (a, dt) in from b in enumerable select (b, predicate.Invoke(b)))
        {
            if (grouping == null || dt > grouping.Key.End)
                yield return grouping = new Grouping<T>(new DateRange(dt, dt + timeSpan), a);
            else
                grouping.Add(a);
        }
    }

    class Grouping<T> : IGrouping<DateRange, T>
    {

        readonly List<T> elements = new List<T>();

        public DateRange Key { get; }

        public Grouping(DateRange key) => Key = key;

        public Grouping(DateRange key, T element) : this(key) => Add(element);

        public void Add(T element) => elements.Add(element);

        public IEnumerator<T> GetEnumerator()=> this.elements.GetEnumerator();

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

    class DateRange
    {
    
        public DateRange(DateTime start, DateTime end)
        {
            this.Start = start;
            this.End = end;
        }

        public DateTime Start { get; set; }
        public DateTime End { get; set; }
    }

Test based on question (using AutoFixture library)

     void Test()
    {
        var many = new Fixture().CreateMany<Sample>(100);

        var groups = many.OrderBy(a => a.timestamp).GroupBy(TimeSpan.FromDays(365), a => a.timestamp).Select(a => a.Average(b => b.value)).ToArray();

    }

    public class Sample
    {
        public DateTime timestamp;
        public double value;
    }
Declan Taylor
  • 408
  • 6
  • 8
0

If the BrokenGlass version does not work, then you may also have ticks in the values. Here is a working version:

        var groups = series.GroupBy(x =>
        {
            var stamp = x.timestamp;
            stamp = stamp.AddMinutes(-(stamp.Minute % 5));
            stamp = stamp.AddTicks(-(stamp.Ticks % TimeSpan.TicksPerMinute));
            return stamp;
        })
        .Select(g => new { TimeStamp = g.Key, Value = g.Average(s => s.value) })
        .ToList();