0

I have a table with the following relevant columns:

  • Machine [Key]
  • Sample
  • Experiment
  • DateCompleted
  • ... some test values on which I perform some calculation

Structure

Machine - Sample - Experiment - Completed - ...
m1      - s1     - e1         - <date>    - ...
m1      - s1     - e2         - <date>    - ...
m1      - s2     - e1         - <date>    - ...
....
m2      - s3     - e1         - <date>    - ...
....

On each machine, multiple experiments can be performed on a single sample.

My goal is to identify 5 latest -distinct- samples of each machine, and get all associated entries (entire rows, including all experiments per this sample).

I can't seem to find the next step once I group by Machine and order by DateCompleted descending.

I guess some sort of "DistinctBy (x => x.Sample)" is needed, but didn't manage to work it out.

Example: (say ordered by date descending already)

Machine - Sample - Experiment
m1 - s1 - e1 * 
m1 - s1 - e2 *
m1 - s2 - e1 *
m1 - s2 - e2 *
m1 - s3 - e1 *
m1 - s4 - e1 *
m1 - s4 - e2 *
m1 - s5 - e1 *
m1 - s6 - e1
m1 - s6 - e2
...

I'd need all the lines marked "*" as output of my query - for each machine so.

I am struggling with writing even the SQL statement for this. If you know how would you write this in SQL, post it, I might get something out of it too.

EDIT:

Ok I tried going once again from bottom-up, and my first attempt was to identify the latest 5 samples.

The following query, with fixed value, works

var samples = (from c in db.Experiments
           where c.Machine == "m1" 
           orderby c.Completed descending
           select c.Sample).ToList().Distinct().Take(5)

I needed to add the "ToList()" because Distinct() messes it up otherwise.

Now, when I include this into another query, where I want the result per each machine - it does not work - it won't order it by descending Completed dates, but keeps the "random" order.

Why is that so?

var last5samples = (from t in db.Experiments
                    group t by new { t.Machine } into g
                    select new
                    {
                        Machine = g.Key.Machine,
                        Samples = (from c in db.Experiments
                                   where c.Machine == g.Key.Machine 
                                   orderby c.Completed descending
                                   select c.Sample).ToList().Distinct().Take(5)
                    });

EDIT 2:

Tried yet another approach, to enlist indeed what I need - all rows containing last 5 samples. This is quite slow, but "works", sort of, except for orderby descending not working.

I went with the "IN" approach first, but then saw that with LINQ I need to inverse the logic, and this is what I came up with:

var last5samples = from t in db.Experiments
                   where (from c in db.Experiments
                          where c.Machine == t.Machine
                          orderby c.Completed descending
                          select c.Sample).ToList().Distinct().Take(5)
                   .Contains(t.Sample)
                   select t;

My main issue now is how to make orderby descending work as expected.

I don't see why it works when it is a solo query, and when a subquery, it doesn't.

dbu
  • 29
  • 4

3 Answers3

0

LINQ doesn't have DistinctBy functionality, but fortunately there's a library called morelinq that allows you to do it.

var results = query.DistinctBy(x => x.Sample).ToList();
markpsmith
  • 4,860
  • 2
  • 33
  • 62
0
using System;
using System.Linq;
using System.Collections.Generic;


public class Program
{
    public void Main()
    {

        var list = new List<Test>();
        list.Add(new Test { Machine = "m1", Sample = "s1", Experiment = "e1", DateCompleted = DateTime.Now.AddDays(-2) });
        list.Add(new Test { Machine = "m1", Sample = "s1", Experiment = "e1", DateCompleted = DateTime.Now.AddDays(-1) });
        list.Add(new Test { Machine = "m1", Sample = "s1", Experiment = "e1", DateCompleted = DateTime.Now });
        list.Add(new Test { Machine = "m2", Sample = "s1", Experiment = "e1", DateCompleted = DateTime.Now.AddDays(-2) });
        list.Add(new Test { Machine = "m2", Sample = "s1", Experiment = "e1", DateCompleted = DateTime.Now.AddDays(-1) });
        list.Add(new Test { Machine = "m2", Sample = "s1", Experiment = "e1", DateCompleted = DateTime.Now.AddHours(-1) });

        var q = from s in list
                group s by new { s.Machine, s.Sample }
                    into gs
                    select new
                    {
                        Machine = gs.Key.Machine
                        ,
                        Sample = gs.Key.Sample
                        ,
                        Rows = gs.OrderByDescending(f => f.DateCompleted).Take(5)
                    };
        foreach (var p in q.ToList())
        {
            Console.WriteLine("{0} {1} {2}", p.Machine, p.Sample, p.Rows.Count());
        }
    }
}



public class Test
{
    public string Machine { get; set; }
    public string Sample { get; set; }
    public string Experiment { get; set; }
    public DateTime DateCompleted { get; set; }
}
hazimdikenli
  • 5,709
  • 8
  • 37
  • 67
  • I am not sure if you needed to group by machine + experiment , or just by experiment. – hazimdikenli Dec 03 '14 at 11:40
  • I don't think this does what I need. See the added example in the original post. I need, for each machine, all the rows that belong to the latest 5 samples. – dbu Dec 03 '14 at 12:20
  • @dbu you can play with this code, change the grouping etc. I have also changed the query to take the top 5 samples per machine and sample. – hazimdikenli Dec 03 '14 at 12:28
0

OK this is how I solved it.

After trying whatever I could find, and not understanding why distinct + orderby works in one case ("solo" query) and not another (subquery), I went on to do it "my way".

It also works fast enough (faster than LINQ I wrote latest, in EDIT 2).

Here it is:

  1. I get all distinct machines
  2. For each machine I fetch its latest 5 samples
  3. For each machine I fetch the rows containing latest 5 samples

    foreach (var m in machines)
    {
    
    var last5Samples = (from t in db.Experiments
                    where t.Machine = m.Machine
                    orderby t.Completed descending
                    select t.Sample).ToList().Distinct().Take(5);
    
    var expsForLast5Samples = from t in db.Experiments
                          where last5Samples.Contains(t.Sample)
                          select t;
    }
    
dbu
  • 29
  • 4