I have three lists that share common IDs. I need to group the objects in one list by name and pull data for each group from the other two. Here is an example to make it clearer.

table for groupNames:

| Id | Name    |
|----|---------|
| 1  | Hello   |
| 2  | Hello   |
| 3  | Hey     |
| 4  | Dude    |
| 5  | Dude    |

table for countId:

| Id | whatever |
|----|----------|
| 1  | test0    |
| 1  | test1    |
| 2  | test2    |
| 3  | test3    |
| 3  | test4    |

table for lastTime:

| Id | timestamp  |
|----|------------|
| 1  | 1636585230 |
| 1  | 1636585250 |
| 2  | 1636585240 |
| 3  | 1636585231 |
| 3  | 1636585230 |
| 5  | 1636585330 |

and I expect the result to be a list like this:

| Name    | whateverCnt | lastTimestamp |
|---------|-------------|---------------|
| Hello   | 3           | 1636585250    |
| Hey     | 2           | 1636585231    |
| Dude    | 0           | 1636585330    |

So far I have something like this, but it doesn't work:

            return groupNames
              .GroupBy(x => x.Name)
              .Select(x =>
              {
                  return new myElem
                  {
                      Name = x.Name,
                      lastTimestamp = new DateTimeOffset(lastTime.Where(a => groupNames.Where(d => d.Name == x.Key).Select(d => d.Id).Contains(a.Id)).Max(m => m.timestamp)).ToUnixTimeMilliseconds(),
                      whateverCnt = countId.Where(q => (groupNames.Where(d => d.Name == x.Key).Select(d => d.Id)).ToList().Contains(q.Id)).Count()
                    };
              })
             .ToList();

Many thanks for any advice.

ProgrammingLlama
  • Why mandate LINQ? – Caius Jard Dec 15 '21 at 13:56
  • As far as I know, it has better performance. This is only an example; my lists are much bigger and different from the ones I used here. https://stackoverflow.com/a/47262860/12999914 – Jiří Poštulka Dec 15 '21 at 14:00
  • Odd; I'm of the opinion that LINQ solutions are often less performant than a direct, but more wordy, alternative. If your lists truly are "much bigger" I think I'd consider carefully which parts I deferred (hah) to LINQ – Caius Jard Dec 15 '21 at 14:02
  • LINQ has better or worse performance depending on how you use it and the specific problem you're tackling. – ProgrammingLlama Dec 15 '21 at 14:02
  • When you say "table" what do you mean? Give an example of how this data is housed in the memory of your running program – Caius Jard Dec 15 '21 at 14:04
  • You're not fetching this from a database right? no `IQueryable` or linq 2 sql? – sommmen Dec 15 '21 at 14:39
  • I'm getting this data from a database, but with a different method. I'm not using LINQ to SQL and the lists are not IQueryable – Jiří Poštulka Dec 16 '21 at 07:43

3 Answers


In your example, the safest approach would be to build a list of the result objects and simply use LINQ to query the other collections for the same ID.

So something like

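public IEnumerable<SomeObject> MergeListsById(
  IEnumerable<GroupNames> groupNames,
  IEnumerable<CountId> countIds,
  IEnumerable<LastTime> lastTimes)
{
  // List<T> rather than IEnumerable<T>, because only List<T> exposes Add
  var mergedList = new List<SomeObject>();

  // IEnumerable<T> has no ForEach method, so use a plain foreach loop.
  // This produces one entry per groupNames row, not one per Name.
  foreach (var gn in groupNames)
  {
    mergedList.Add(new SomeObject {
      Name = gn.Name,
      // number of countIds rows that share this Id
      whateverCnt = countIds.Count(ci => ci.Id == gn.Id),
      // latest timestamp for this Id, or null when there is none
      // (assumes timestamp is a long and lastTimeStamp is a long?)
      lastTimeStamp = lastTimes.Where(lt => lt.Id == gn.Id).Max(lt => (long?)lt.timestamp)
    });
  }

  return mergedList;
}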

Try it in a Fiddle or a throwaway project and tweak it to your needs. A solution in pure LINQ is probably not desirable here, for the sake of readability and maintainability.

And yes, as the comments say, do carefully consider whether LINQ is your best option here. While it works, it does not always perform better than a "simple" foreach. LINQ's main selling point is, and always has been, short one-line query statements that stay readable.

thebugsdontwork

I think I'd mostly skip LINQ for this:

class Thing{
  public string Name {get;set;}
  public int Count {get;set;}
  public long LastTimestamp {get;set;}
}

...

var ids = new Dictionary<int, string>();       // id -> name
var result = new Dictionary<string, Thing>();  // name -> aggregated result
foreach(var g in groupNames) {
  ids[g.Id] = g.Name;
  result[g.Name] = new Thing { Name = g.Name };
}

// every row in countId bumps the count of the name its id maps to
foreach(var c in countId)
  result[ids[c.Id]].Count++;

// keep the largest timestamp seen for each name
foreach(var l in lastTime){
  var t = result[ids[l.Id]];
  if(t.LastTimestamp < l.timestamp) t.LastTimestamp = l.timestamp;
}

We start off by making two dictionaries (you could use ToDictionary for this). If groupNames is already a dictionary that maps id:name, you can skip making the ids dictionary and just use groupNames directly. This gives us fast lookup from ID to Name, but we actually want to collect results into a name:something mapping, so we make one of those too. Doing result[name] = thing always succeeds, even if we've seen the name before. You could save some object creation by adding a ContainsKey check here if you want.
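As a rough illustration of the ToDictionary and ContainsKey variants mentioned above, here is a minimal sketch. The tuple-based `groupNames` list and the `Program` wrapper are assumptions made only to keep the sample self-contained, and `ToDictionary` throws if an id appears twice, so it presumes ids are unique in groupNames.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Thing
{
    public string Name { get; set; }
    public int Count { get; set; }
    public long LastTimestamp { get; set; }
}

class Program
{
    static void Main()
    {
        // Hypothetical in-memory shape of groupNames (an assumption).
        var groupNames = new List<(int Id, string Name)>
        {
            (1, "Hello"), (2, "Hello"), (3, "Hey"), (4, "Dude"), (5, "Dude"),
        };

        // id -> name in one call; throws ArgumentException on a duplicate id.
        Dictionary<int, string> ids = groupNames.ToDictionary(g => g.Id, g => g.Name);

        // name -> result; the ContainsKey check avoids creating a new Thing
        // for a name that has already been seen.
        var result = new Dictionary<string, Thing>();
        foreach (var g in groupNames)
        {
            if (!result.ContainsKey(g.Name))
                result[g.Name] = new Thing { Name = g.Name };
        }

        Console.WriteLine(ids.Count);    // 5 ids
        Console.WriteLine(result.Count); // 3 distinct names
    }
}
```

The counting and max-timestamp loops then work against `ids` and `result` exactly as before.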

Then all we need to do is enumerate our other N collections, building the result. The result we want is accessed via result[ids[some_id_value_here]], and it always exists as long as the id space of groupNames is complete (that is, we never have an id in the counts that we do not have in groupNames).

For counts, we don't care for any of the other data; just the presence of the id is enough to increment the count

For dates, it's a simple max algorithm: "if the known max is less than the new value, make the known max the new value". If you know your dates list is sorted ascending, you can skip that if too.

Caius Jard

Well, having

  List<(int id, string name)> groupNames = new List<(int id, string name)>() {
    ( 1, "Hello"),
    ( 2, "Hello"),
    ( 3, "Hey"),
    ( 4, "Dude"),
    ( 5, "Dude"),
  };

  List<(int id, string comments)> countId = new List<(int id, string comments)>() {
    ( 1  , "test0"),
    ( 1  , "test1"),
    ( 2  , "test2"),
    ( 3  , "test3"),
    ( 3  , "test4"),
  };

  List<(int id, int time)> lastTime = new List<(int id, int time)>() {
    ( 1  , 1636585230 ),
    ( 1  , 1636585250 ),
    ( 2  , 1636585240 ),
    ( 3  , 1636585231 ),
    ( 3  , 1636585230 ),
    ( 5  , 1636585330 ),
  };

you can, technically, use the LINQ query below:

var result = groupNames
  .GroupBy(item => item.name, item => item.id)
  .Select(group => (Name          : group.Key,
                    whateverCnt   : group
                      .Sum(id => countId.Count(item => item.id == id)),
                    lastTimestamp : lastTime
                      .Where(item => group.Any(g => g == item.id))
                      .Max(item => item.time)));

Let's have a look:

Console.Write(string.Join(Environment.NewLine, result));

Outcome:

(Hello, 3, 1636585250)
(Hey, 2, 1636585231)
(Dude, 0, 1636585330)

But be careful: List<T> (I mean countId and lastTime) is not an efficient data structure here. In the LINQ query we have to scan both lists in full for every group in order to get Sum and Max. If countId and lastTime are long, turn them (by grouping) into a Dictionary<int, T> with id as the Key.
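A minimal sketch of that pre-grouping, assuming the same tuple lists as above: `countById` and `maxTimeById` are names introduced here for illustration, and returning null for a group that has no timestamps is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var groupNames = new List<(int id, string name)> {
            (1, "Hello"), (2, "Hello"), (3, "Hey"), (4, "Dude"), (5, "Dude"),
        };
        var countId = new List<(int id, string comments)> {
            (1, "test0"), (1, "test1"), (2, "test2"), (3, "test3"), (3, "test4"),
        };
        var lastTime = new List<(int id, int time)> {
            (1, 1636585230), (1, 1636585250), (2, 1636585240),
            (3, 1636585231), (3, 1636585230), (5, 1636585330),
        };

        // Aggregate each list once per id, so later lookups are dictionary hits
        // instead of full scans.
        Dictionary<int, int> countById = countId
            .GroupBy(item => item.id)
            .ToDictionary(g => g.Key, g => g.Count());
        Dictionary<int, int> maxTimeById = lastTime
            .GroupBy(item => item.id)
            .ToDictionary(g => g.Key, g => g.Max(item => item.time));

        var result = groupNames
            .GroupBy(item => item.name, item => item.id)
            .Select(group => (
                Name: group.Key,
                // ids without any countId rows contribute 0
                whateverCnt: group.Sum(id => countById.TryGetValue(id, out var c) ? c : 0),
                // null when no id in the group has a timestamp
                lastTimestamp: group
                    .Where(id => maxTimeById.ContainsKey(id))
                    .Select(id => (int?)maxTimeById[id])
                    .Max()));

        Console.WriteLine(string.Join(Environment.NewLine, result));
    }
}
```

For the sample data this should print the same three tuples as the list-scanning query, while touching each row of countId and lastTime only once.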

Dmitry Bychenko