19

I have the following structure

List<string[]> sList = new List<string[]>() {
    new[] { "x", "xxx", "xxxx" },  //1,3,4
    new[] { "x", "xx", "xx" },     //1,2,2
    new[] { "xxxxxx", "xx", "xx" } //6,2,2
};

and I need to determine the maximum string.length of the items by column

In this example the expected result should be:

List<int> Result = new List<int>() { 6, 3, 4 };

Is there a easy Linq-approach?

my effort (working but not using Linq):

List<int> Result = new List<int>();
foreach (string[] line in Table)
{
    for (int i = 0; i < line.Length; i++)
    {
        if (Result.Count > i)
        {
            if (Result[i] < line[i].Length)
            {
                Result[i] = line[i].Length;
            }
        }
        else
        {
            Result.Insert(i, line[i].Length);
        }
    }
}

the number of rows/columns is dynamic but each row has the same number of columns.

Mike G
  • 4,232
  • 9
  • 40
  • 66
Byyo
  • 2,163
  • 4
  • 21
  • 35

7 Answers7

21

This is the best that I could come up with:

List<int> Result = sList.First().Select((x, i) => sList.Max(y => y[i].Length)).ToList();

Bonus points for one line?

Explanation: Since you said that they all have the same number of elements, take the first row and loop through that. Use the index to get that element from each of the other rows getting the length and then the maximum of that.

TomDoesCode
  • 3,580
  • 2
  • 18
  • 34
  • 2
    thanks, great solution i wish i could accept both - but tim's is more fault-tolerant – Byyo Sep 23 '15 at 05:47
20

One approach:

int maxColumns = sList.Max(arr => arr.Length);
List<int> Result = Enumerable.Range(0, maxColumns)
    .Select(i => sList.Max(arr => (arr.ElementAtOrDefault(i) ?? "").Length))
    .ToList();

You want the max-length per column. The columns are the array indices. So you need a way to look at the arrays per index. Therefore i've used Enumerable.Range(0, maxColumns). Then i'm using ElementAtOrDefault to handle the case that the array doesn't contain so many "columns"(not needed as i'm explaining below). That returns null for reference types like string. I replace them with the null-coalescing operator with "" which yields 0 as length.

Since you've mentioned that "each row has the same number of columns" you can make it a little bit more readable:

List<int> Result = Enumerable.Range(0, sList.First().Length)
    .Select(i => sList.Max(arr => arr[i].Length))
    .ToList();
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
6

Here's an approach using Aggregate:

var maxLengths = sList.Select(strings => strings.Select(s => s.Length))
                      .Aggregate((prev, current) => prev.Zip(current, Math.Max));

This works with the assumption that each collection has the same number of items.

How It works:

  • First we take only the lengths of the string, so conceptually we are working with IEnumerable<IEnumerable<int>>, so our collection looks like this:

    {{1, 3, 4},
     {1, 2, 2},
     {6, 2, 2}}
    

    (this is not accurate - due to deferred execution we never actually have this collection - the lengths are calculated for each row as needed.)

  • Next, we use Aggregate to combine the vectors:

    • We have {1, 3, 4} and {1, 2, 2}.
    • Zip the two collections: {{1, 2}, {3, 2}, {4, 2}}.
    • Apply Math.Max on each pair, and get {2, 3, 4}.
  • Repeat for {2, 3, 4} and {6, 2, 2} and receive {6, 3, 4}.

Pros:

  • Works for a single row.
  • Only iterate over the collection once.
  • Does not make any assumption on the concrete implementation (List of Arrays) - can work with any IEnumerable<IEnumerable<string>>, without relying on [i] or ElementAtOrDefault.
  • You've asked for a Linq solution.

Cons:

  • Somewhat complicated.
  • If the rows were in different lengths Zip would not behave as expected (return the shortest length rather than the longest).
Community
  • 1
  • 1
Kobi
  • 135,331
  • 41
  • 252
  • 292
4

Assuming that all your "rows" have the same number of items ("columns"):

var rowLength = sList[0].Length;
var result = Enumerable.Range(0, rowLength)
    .Select(index => sList.Select(row => row[index].Length).Max())
    .ToList();
Konamiman
  • 49,681
  • 17
  • 108
  • 138
4

Surprisingly, there aren't any good(efficient) answers

1)Don't use LINQ. Try a simple,readable procedural solution
2)If for some reason you want to use LINQ,at least make it efficient

Below you see the results of my sample code

enter image description here

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApplication9
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            Random rand = new Random();
            int maxColumnLength = 10;
            int rowNumber = 100000;
            int colNumber = 300;
            List<string[]> input = new List<string[]>();
            for (int i = 0; i < rowNumber; i++)
            {
                string[] row = new string[colNumber];
                for (int j = 0; j < colNumber; j++)

                    row[j] = new string('x', rand.Next(maxColumnLength));

                input.Add(row);
            }

            var result = Time(SimpleIteration, input, "Simple Iteration");

            var result2 = Time(SimpleIterationWithLinq, input, "Simple Iteration LINQ");

            var result3 = Time(AcceptedAnswer, input, "Accepted Answer");

            var result4 = Time(TomDoesCode, input, "TomDoesCode");

            //var result5 = Time(Kobi, input, "Kobi"); //StackOverflow

            var result6 = Time(Konamiman, input, "Konamiman");

            var result7 = Time(RahulSingh, input, "RahulSingh");

            System.Console.ReadLine();
        }

        private static List<int> SimpleIteration(List<string[]> input)
        {
            int[] maxPerColumn = new int[input.First().Length];
            for (int i = 0; i < maxPerColumn.Length; i++)
                maxPerColumn[i] = -1;

            foreach (var row in input)
            {
                for (int i = 0; i < row.Length; i++)
                    if (maxPerColumn[i] < row[i].Length)
                        maxPerColumn[i] = row[i].Length;
            }

            return maxPerColumn.ToList();
        }

        private static List<int> SimpleIterationWithLinq(List<string[]> input)
        {
            return input.Aggregate(new int[input.First().Length], (maxPerColumn, row) =>
             {
                 for (int i = 0; i < row.Length; i++)
                     if (maxPerColumn[i] < row[i].Length)
                         maxPerColumn[i] = row[i].Length;

                 return maxPerColumn;
             }).ToList();
        }

        private static List<int> AcceptedAnswer(List<string[]> input)
        {
            return Enumerable.Range(0, input.First().Length)
                             .Select(i => input.Max(arr => arr[i].Length))
                             .ToList();
        }

        private static List<int> TomDoesCode(List<string[]> input)
        {
            return input.First().Select((x, i) => input.Max(y => y[i].Length)).ToList();
        }

        private static List<int> Kobi(List<string[]> input)
        {
            return input.Select(strings => strings.Select(s => s.Length))
                      .Aggregate((prev, current) => prev.Zip(current, Math.Max)).ToList();
        }

        private static List<int> Konamiman(List<string[]> input)
        {
            var rowLength = input[0].Length;
            return Enumerable.Range(0, rowLength)
                .Select(index => input.Select(row => row[index].Length).Max())
                .ToList();
        }

        private static List<int> RahulSingh(List<string[]> input)
        {
            int arrayLength = input.First().Length;
            return input.SelectMany(x => x)
                                     .Select((v, i) => new { Value = v, Index = i % arrayLength })
                                     .GroupBy(x => x.Index)
                                     .Select(x => x.Max(z => z.Value.Length))
                                     .ToList();
        }

        private static List<int> Time(Func<List<string[]>, List<int>> act, List<string[]> input, string methodName)
        {
            Stopwatch s = Stopwatch.StartNew();
            var result = act(input);
            s.Stop();
            Console.WriteLine(methodName.PadRight(25) + ":" + s.ElapsedMilliseconds + " ms");
            return result;
        }
    }
}
George Vovos
  • 7,563
  • 2
  • 22
  • 45
2

This is what I have:-

  int arrayLength = sList.First().Length;
  List<int> result = sList.SelectMany(x => x)
                           .Select((v, i) => new { Value = v, Index = i % arrayLength  })
                           .GroupBy(x => x.Index)
                           .Select(x => x.Max(z => z.Value.Length))
                           .ToList();

Explanation:- Flatten your list, project it's value and index (Index is calculated based on columns length), group by index and select the value with maximum length.

Rahul Singh
  • 21,585
  • 6
  • 41
  • 56
  • You could have done sList.SelectMany(list => list .Select((s, index) => new{ Column = index, s.Length })) .GroupBy(g => g.Column) .Select(g => new { Column = g.Key, Length = g.Max(x => x.Length) }) – Phil Apr 06 '16 at 14:10
1

The general method to aggregate results is the Aggregate extension method.

tl;rd for the simplest case:

    slist.Select(row => row.Select(str => str.Length))
         .Aggregate(l, r) => l.Zip(r, Math.Max);

First lets get the data in the form we want: A list of lengths

var llist = slist.Select(row => row.Select(str => str.Length));

Now we have a list of lists of length. Much easier to work with.

Aggregate works by keeping some running total, and running a function of which the parameters are the aggregate and the next row, yielding a new aggregate. You can optionally provide a seed value. It looks like this:

List<Int> seed = ???
Func<IEnumerable<Int>, IEnumerable<Int>, IEnumerable<Int>> accumulator  = ???
llist.Aggregate(seed, accumulator);

From here on, I assume that the number of "columns" is variable.

The seed value is new List<int>(). If there are 0 rows, that's your result.

List<Int> seed = new List<Int>();
Func<List<Int>, List<Int>, List<Int>> accumulator  = ???
llist.Aggregate(seed, accumulator);

For each row, the new aggregate is the max of each element of the aggregate and the next row.

This happens to be exactly what the .Zip method is for if the items are guaranteed to be of the same length: https://msdn.microsoft.com/en-us/library/vstudio/dd267698(v=vs.100).aspx

giving

private IEnumerable<Int> accumulate(aggregate, row){
  aggregate.Zip(row, (i1, i2) => Math.max(i1, i2));
}

unfortunately we don't have that guarantee, so we'll have to provide our own ZipAll. The rules are: If it an item exits in both, take the max. If it only exists in one, take that one. This is a bit of a potatoe; ZipAll should really be a library function.

  IEnumerable<T> ZipAll(IEnumerable<T> left, IENumerable<T> right, Func<T, T, T> selector) {
   var le = left.GetEnumerator;
   var re = right.GetEnumerator;

   while(le.MoveNext()){
     if (re.MoveNext()) {
       yield return selector(le.Current, re.Current);
     } else {
       yield return le.Current;
     }
   }

   while(re.MoveNext()) yield return re.Current;
}

(1)

That gives us

List<Int> seed = new List<Int>();
Func<List<Int>, List<Int>, List<Int>> accumulator = (l, r) => ZipAll(l, r, Math.Max);
llist.Aggregate(seed, accumulator);

If you inline everything(accept our custom ZipAll function) that becomes

llist.Aggregate(new List<Int>(), (l, r) => ZipAll(l, r, Math.Max));

If you have at least one row, you can leave out the seed (the first row becomes the seed) to get

llist.Aggregate((l, r) => ZipAll(l, r, Math.Max));

In case the number of columns is constant, we can use the build in Zip

llist.Aggregate((l, r) => l.Zip(r, Math.Max));


(1)Outside the scope of this question, a more general overload is
IEnumerable<TResult, TLeft, TRight> ZipAll(IEnumerable<TLeft> left, IENumerable<TRight> right, Func<TResult, TLeft, TRight> selector, Func<TResult, TLeft> leftSelector, Func<Tresult, TRight> rightselector) {
   var le = left.GetEnumerator;
   var re = right.GetEnumerator;

   while(le.MoveNext()){
     if (re.MoveNext()) {
       yield return selector(le.Current, re.Current);
     } else {
       yield return leftSelector(le.Current);
     }
   }

   while(re.MoveNext()) yield return rightSelector(re.Current);
}

With this we could write our earlier ZipAll in terms of this one with the left and right projections as identity function as

IEnumerable<T> ZipAll(this IEnumerable<T> left, IENumerable<T> right, Func<T, T, T> selector) {
    return ZipAll(left, right, selector, id => id, id => id);
 }
Martijn
  • 11,964
  • 12
  • 50
  • 96
  • The OP did say "each row has the same number of columns", so I don't think you need `ZipAll` - `Zip` should work. You also have multiple syntax errors like missing semicolons and parentheses. Aside from that, the solution is identical to my answer, you only explained it in another way. – Kobi Sep 24 '15 at 07:34
  • @Kobi thanks for the note, I did a quick check of the syntax errors, I think I got most. I'll double check later. Having different people explain the same thing in different ways is IMO the strength of SO. – Martijn Sep 24 '15 at 13:10