
I have several lists `List1` … `List10`, each of type `List<(int item1, int item2)>` and each containing 10 elements.

The lists are combined into a single `List<List<(int item1, int item2)>> listofLists`.

How do I sum all of the lists by the same index, to get a single `List<(int item1, int item2)>`, using LINQ?

I wrote a method with two loops to do it:

List<(int length, int count)> result = new();

int column = 0;                

while (column < averages[0].Count) // averages is list of lists
{
    (int length, int count) cumulative = (0, 0);

    for (int i = 0; i < averages.Count; i++)
    {
        cumulative.length += averages[i].ElementAt(column).length;
        cumulative.count += averages[i].ElementAt(column).count;
    }
    result.Add(cumulative);
    column++;
}

But wouldn't LINQ be better and faster here?

Olivier Jacot-Descombes
Humble Newbie
    Why do you think LINQ makes your code better or even faster? It's just another *syntax* for querying, no magic "make this fast" button. From my perspective your code seems pretty straightforward (despite all the tuple stuff, but that's fairly opinion-based). Why overcomplicate things? – MakePeaceGreatAgain Feb 27 '23 at 16:13
  • I wouldn't worry about "better and faster" too much. For the most part, just focus on readability. If using LINQ makes your code easier to understand, then great, use that. If it's easy to understand this way, then there's no need to change it. Of course performance does matter in some capacity, but in most cases it's not worth sacrificing readability for a very slight performance boost (no, LINQ does not inherently boost performance, I'm just speaking generally here). – Jesse Feb 27 '23 at 17:41
    @MakePeaceGreatAgain Used properly, linq very often helps reduce memory. Sadly, it's often used improperly (by pointlessly calling `.ToList()` when it isn't needed). It can help performance in other ways, too, (making it easier to get earlier loop exits is one example), but that is questionable because for those cases it was always also possible to do just as well w/ imperative code, and there are cases when linq could possibly make things worse. Part of the point of linq is helping you use good habits here, so I'd argue it's a strong net win on average, but comparing apples to apples is hard. – Joel Coehoorn Feb 27 '23 at 18:38
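The early-exit point mentioned above can be sketched like this (illustrative code, not from the thread):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<(int length, int count)> row = new() { (1, 10), (0, 20), (3, 30) };

// Any() returns as soon as the predicate matches, so the remaining
// elements are never examined -- the same early exit an imperative
// loop gets from a `break` statement.
bool hasZeroLength = row.Any(t => t.length == 0);
Console.WriteLine(hasZeroLength); // True
```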

4 Answers


We can do this as a one-liner:

var result = averages.Select(lst => lst.Aggregate((a, i) => (a.Item1 + i.Item1, a.Item2 + i.Item2)));

But since nested lambdas can be hard to read, for sanity's sake we should spread it out a bit:

var result = averages.Select(lst =>
    lst.Aggregate((a, c) => (a.Item1 + c.Item1, a.Item2 + c.Item2))
);

See it work here:

https://dotnetfiddle.net/PvF4Pw

Note this does not produce a list: it's an enumerable, which for both performance and semantic reasons is usually BETTER. But if you really need a `List` (again: you can probably get by without one and be better off), you can put `.ToList()` at the end of the line:

var result = averages.Select(lst =>
    lst.Aggregate((a, i) => (a.Item1 + i.Item1, a.Item2 + i.Item2))
).ToList();
Joel Coehoorn
  • "enumerable, which for both performance and semantic reasons is usually BETTER" Eeeehm, generally no. Enumerable isn't about performance at all. I know what you *want* to say (deferred execution), but you didn't, so this is generally not true. And there's no general way of determining a "better semantics". – MakePeaceGreatAgain Feb 28 '23 at 07:41
  • @MakePeaceGreatAgain Also memory use. Done right, you can generally use it to keep one record at a time instead of the entire set. – Joel Coehoorn Feb 28 '23 at 16:20
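That memory point can be sketched as follows (illustrative code, not from the answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An iterator yields values on demand; nothing is stored up front.
static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++) yield return i; // conceptually unbounded
}

// Each value flows through Select one at a time, and First stops the
// enumeration at the first match -- no intermediate list is ever built.
// Calling .ToList() on Naturals() instead would never terminate.
int firstSquareOver1000 = Naturals().Select(n => n * n).First(sq => sq > 1000);
Console.WriteLine(firstSquareOver1000); // 1024
```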

I suggest removing the duplicate `ElementAt` calls; that makes your code faster:

List<(int length, int count)> result = new();
int column = 0;

while (column < averages[0].Count) // averages is list of lists
{
    (int length, int count) cumulative = (0, 0);

    for (int i = 0; i < averages.Count; i++)
    {
        var elm = averages[i].ElementAt(column);
        cumulative.length += elm.length;
        cumulative.count += elm.count;
    }
    result.Add(cumulative);
    column++;
}

The code will be more readable and maintainable if you use LINQ:

List<(int item1, int item2)> averages1 = new List<(int item1, int item2)>();//10 elements
List<List<(int length, int count)>> averages = new();

List<Tuple<int, int>> result = averages
                                .Select(x => 
                                    new Tuple<int, int>(
                                        x.Select(y => y.length).Sum(), 
                                        x.Select(y => y.count).Sum())
                                 ).ToList();

Some might think it would be much faster with parallel programming:

ConcurrentBag<Tuple<int, int>> result= new ConcurrentBag<Tuple<int, int>>();
averages
    .AsParallel()
    .ForAll(x =>  result.Add(
        new Tuple<int, int>(
            x.Select(y => y.length).Sum(), 
            x.Select(y => y.count).Sum())
     ));

Testing with a large dataset, it seems the OP's code is fastest:

[Screenshot of the benchmark timings]

using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Linq;

List<List<(int length, int count)>> averages = new();
for (int i = 0; i < 100000; i++)
{
    var list = new List<(int, int)> { (0, 0), (1, 10), (2, 20), (3, 30), (4, 40) };
    averages.Add(list);
}

var stage = DateTime.Now;
//OP's code
List<(int length, int count)> result = new();
int column = 0;

while (column < averages[0].Count) // averages is list of lists
{
    (int length, int count) cumulative = (0, 0);

    for (int i = 0; i < averages.Count; i++)
    {
        var elm = averages[i].ElementAt(column);
        cumulative.length += elm.length;
        cumulative.count += elm.count;
    }
    result.Add(cumulative);
    column++;
}
Console.WriteLine("OP's code: " + (DateTime.Now - stage).TotalMilliseconds);
stage = DateTime.Now;

//LINQ rewrite
List<Tuple<int, int>> result2 = averages
                                .Select(x =>
                                    new Tuple<int, int>(
                                        x.Select(y => y.length).Sum(),
                                        x.Select(y => y.count).Sum())
                                 ).ToList();
Console.WriteLine("LINQ rewrite: " + (DateTime.Now - stage).TotalMilliseconds);

stage = DateTime.Now;

//LINQ+PP
ConcurrentBag<Tuple<int, int>> result1 = new ConcurrentBag<Tuple<int, int>>();
averages
    .AsParallel()
    .ForAll(x => result1.Add(
        new Tuple<int, int>(
            x.Select(y => y.length).Sum(),
            x.Select(y => y.count).Sum())
        ));
Console.WriteLine("LINQ & PP: " + (DateTime.Now - stage).TotalMilliseconds);
Nitin Sawant
    For 10 lists of 10 elements I doubt parallel programming will be faster. However it'd be something to try with much more data – IEatBagels Feb 27 '23 at 16:50
  • If used correctly, I think a combination of LINQ's Aggregate and parallel programming might be faster than normal loops https://stackoverflow.com/a/74116089/223752 – Nitin Sawant Feb 27 '23 at 17:22
  • Curious to compare this with my answer, which only has to iterate through each list once (this answer does it twice), but at the cost of composing an entire new tuple for each item as it accumulates. – Joel Coehoorn Feb 27 '23 at 18:43
  • @NitinSawant I agree, with more data. Parallel programming has a pretty big overhead. For such a small amount of data, I wouldn't be surprised to see this overhead is bigger than the performance gain – IEatBagels Feb 27 '23 at 18:58
  • @Nitin Sawant and @Joel Coehoorn Thank you and all the others who answered and helped me. It is a pity I can only select one answer though – Humble Newbie Feb 27 '23 at 19:06

The shortest and most readable version with LINQ:

var result = averages
    .SelectMany(item => item.ToDictionary(keyValuePair => keyValuePair.Item1, keyValuePair => keyValuePair.Item2))
    .GroupBy(kvp => kvp.Key, (key, kvps) => (key, kvps.Sum(kvp => kvp.Value)))
    .ToList();

Here we convert each inner list to a dictionary, run a usual GroupBy with Sum to produce the desired totals, and convert back to a List to get your desired type of List<(int, int)>. (Note this relies on the Item1 values being unique within each inner list, since they are used as dictionary keys.)

Below you can find a fully working sample:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Delist
{
    public class Program
    {
        public static void Main()
        {
            List<List<(int, int)>> averages = new();
            for (int i = 0; i < 10; i++) {
                var list = new List<(int,int)>{(0,0),(1,10),(2,20),(3,30),(4,40)};
                averages.Add(list); 
            }
            
            var result = averages
                .SelectMany(item => item.ToDictionary(keyValuePair => keyValuePair.Item1, keyValuePair => keyValuePair.Item2))
                .GroupBy(kvp => kvp.Key, (key, kvps) => (key, kvps.Sum(kvp => kvp.Value)))
                .ToList();
            
            Console.WriteLine(result[2]);
        }
    }
}
Just Shadow

LINQ does not seem to have an advantage here. It is not straightforward to get a column-wise total with LINQ, and it does not make the code faster (it actually makes it slower).
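For reference, a column-wise total is still expressible in LINQ, though arguably no clearer than the loops. A sketch (the sample data is made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var averages = new List<List<(int length, int count)>>
{
    new() { (1, 10), (2, 20) },
    new() { (3, 30), (4, 40) },
};

// One Range index per column; Aggregate folds that column across all rows.
var columnTotals = Enumerable
    .Range(0, averages[0].Count)
    .Select(col => averages.Aggregate(
        (length: 0, count: 0),
        (acc, row) => (acc.length + row[col].length,
                       acc.count + row[col].count)))
    .ToList();

Console.WriteLine(string.Join(", ", columnTotals)); // (4, 40), (6, 60)
```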

You are using a while loop and a for loop to generate indexes. Both types of loop are legitimate; however, either use two while loops or two for loops. There is no apparent reason to switch from one type to the other.

Lists are usually used when a variable size is required. But here, you have a fixed size. Since the nested lists represent a matrix, why not use a 2D array?

I would declare the averages like this:

const int Rows = 10, Columns = 10;

var averages = new (int length, int count)[Rows, Columns];

If you always have a square matrix, you can also replace the two constants by `const int N = 10;`.

Here is my solution. It uses more descriptive variable names.

var columnTotals = new (int length, int count)[Columns];
for (int column = 0; column < Columns; column++) {
    int totalLength = 0, totalCount = 0;
    for (int row = 0; row < Rows; row++) {
        var (length, count) = averages[row, column];
        totalLength += length;
        totalCount += count;
    }
    columnTotals[column] = (totalLength, totalCount);
}

The first code line in the inner loop deconstructs an `averages` value into two local variables. This indexes the array only once and simplifies the subsequent statements.

Olivier Jacot-Descombes
  • Thank you. I have a question though: you said not to use two different types of loop. Is this solely for code clarity, or are there some additional nuances? – Humble Newbie Mar 23 '23 at 23:09