
I've got a List<string> Names with 700,000 names in it. How can I join every 500 strings (using the separator ",") and add the results to a new List<string> ABC?

So I want one List<string> that holds 1,400 joined strings.

ABC[0] = the first 500 names, ABC[1] = the next 500 names, and so on.

Little Fox

5 Answers


Here is how you can do it with LINQ:

var result =
    Names
        .Select((item, index) => new {Item = item, Index = index})
        .GroupBy(x => x.Index / 500)
        .Select(g => string.Join(",", g.Select(x => x.Item)))
        .ToList();

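First, for each item, you select the item itself along with its index. Then you group the items by index / 500. Because this is integer division, indices 0 to 499 get key 0, indices 500 to 999 get key 1, and so on, so every consecutive block of 500 items ends up in the same group.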

Then you use string.Join to join the 500 strings in each group together.
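To see the index-division trick on a small input, here is a minimal sketch that wraps the same query in a helper with a configurable block size (GroupJoinDemo and JoinInGroups are hypothetical names, not part of the answer). When the input length is not a multiple of the block size, the last group is simply shorter.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the GroupBy(index / blockSize) approach above.
public static class GroupJoinDemo
{
    public static List<string> JoinInGroups(IEnumerable<string> names, int blockSize)
    {
        return names
            // Pair every item with its position in the sequence.
            .Select((item, index) => new { Item = item, Index = index })
            // Integer division: indices 0..blockSize-1 get key 0, the next blockSize get key 1, etc.
            .GroupBy(x => x.Index / blockSize)
            // Join the items of each group back into one comma-separated string.
            .Select(g => string.Join(",", g.Select(x => x.Item)))
            .ToList();
    }
}
```

For example, JoinInGroups(new[] { "a", "b", "c", "d", "e", "f", "g" }, 3) gives "a,b,c", "d,e,f" and "g", because GroupBy yields groups in the order their keys first appear and keeps each group's items in source order.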

Yacoub Massad
  • I would personally avoid this approach for large lists, since it makes an additional copy of all the strings. – Matthew Watson Jan 28 '16 at 12:38
  • @MatthewWatson, I am not really sure about this. LINQ should be efficient. To be sure, in cases of large data, I always measure the time and memory consumption of such queries, and if there are issues I look for other approaches. – Yacoub Massad Jan 28 '16 at 12:41
  • I just meant in terms of memory. It makes a copy of the entire list into a dictionary before returning the result. I guess it will be fast though, since GroupBy() is O(N). (I did some timings and this is roughly 4 times slower than using a dedicated batcher.) – Matthew Watson Jan 28 '16 at 12:46
  • @MatthewWatson, you mean because of `GroupBy`, right? I think you probably have a point. OP should measure memory consumption and see if it is acceptable for them. Most of the times this is not a problem. – Yacoub Massad Jan 28 '16 at 12:49
  • It's the 700,000 strings that were worrying me. :) – Matthew Watson Jan 28 '16 at 12:50

With MoreLINQ Batch (or any other batch implementation):

using System.Linq;
using MoreLinq; // from the morelinq NuGet package

var abc = Names.Batch(500).Select(x => String.Join(",", x)).ToList();

NOTE: GroupBy is not a streaming operator (and neither is ToList). That means all 700k strings have to be enumerated, a key computed for each item, and every item stored in an internal group before the first group is returned, which costs time and memory. Batching is streaming: it does not store all items internally, only the current batch. So with batching, if you do not convert the results to a list, you can process the batches one by one as they are produced and save memory.
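To make the streaming point concrete, here is a minimal hand-rolled batcher. It is a sketch, not MoreLINQ's actual implementation, and BatchSketch and StreamingBatch are hypothetical names. It keeps only the batch currently being filled, so asking for the first batch reads just the first 500 source items.

```csharp
using System;
using System.Collections.Generic;

public static class BatchSketch
{
    // Validates eagerly, then defers to the lazy iterator below.
    public static IEnumerable<List<T>> StreamingBatch<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return BatchIterator(source, size);
    }

    // Buffers only the current batch and yields it as soon as it is full.
    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size); // fresh buffer; the yielded list is never reused
            }
        }
        if (batch.Count > 0)
            yield return batch; // final, possibly shorter batch
    }
}
```

Joining then looks like the MoreLINQ version: Names.StreamingBatch(500).Select(b => string.Join(",", b)).ToList(). On .NET 6 and later, the built-in Enumerable.Chunk(500) also streams its input, yielding arrays instead of lists.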

Sergey Berezovskiy
  • Damn, I just wanted to answer with a loop attempt, but that LINQ is just too elegant to pass up here. That's a beautiful one liner. – Sossenbinder Jan 28 '16 at 12:25
  • "It stores only current batch." Yep, but note that the code I posted doesn't even store the current batch ;) (Although it's not threadsafe). It's marginally faster than MoreLINQ's implementation, by about 15% – Matthew Watson Jan 28 '16 at 12:52
  • @MatthewWatson yeah, with `ToList()` at the end the enumeration will work... but if one of the partitions is not enumerated fully... :) Though in this particular case it will work; it seems the OP wants all items anyway. – Sergey Berezovskiy Jan 28 '16 at 12:55
  • Good point about fully enumerating it. – Matthew Watson Jan 28 '16 at 12:59

If you don't want to use a separate library, you can use a simple extension method to partition a sequence into subsequences of a given size:

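using System.Collections.Generic;

public static class EnumerableExt
{
    // Splits input into consecutive partitions of up to blockSize items.
    // All partitions share one underlying enumerator, so each partition must be
    // fully enumerated, in order, before the next one is requested.
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> input, int blockSize)
    {
        // The using block disposes the enumerator once the outer sequence is finished.
        using (var enumerator = input.GetEnumerator())
        {
            while (enumerator.MoveNext())
                yield return nextPartition(enumerator, blockSize);
        }
    }

    // Yields the current item, then up to blockSize - 1 more items from the shared enumerator.
    static IEnumerable<T> nextPartition<T>(IEnumerator<T> enumerator, int blockSize)
    {
        do
            yield return enumerator.Current;
        while (--blockSize > 0 && enumerator.MoveNext());
    }
}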

Then you can use it like so:

using System.Linq;

// Create some sample strings.
var strings = Enumerable.Range(1, 10000).Select(x => x.ToString()).ToList();

var result = strings.Partition(500).Select(block => string.Join(",", block)).ToList();

This approach does not make a copy of the input list; it walks it once with a single enumerator.
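Because every partition shares one enumerator, each partition has to be consumed completely, and in order, before the next one is requested (the point raised in the comments on the MoreLINQ answer). Here is a minimal sketch contrasting the safe pattern with a partially read partition; PartitionPitfall, JoinAll and FirstOfEach are hypothetical names, and Partition repeats the logic of the extension method above as a plain static method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PartitionPitfall
{
    // Same logic as EnumerableExt.Partition: all partitions share one enumerator.
    public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> input, int blockSize)
    {
        using (var enumerator = input.GetEnumerator())
        {
            while (enumerator.MoveNext())
                yield return NextPartition(enumerator, blockSize);
        }
    }

    static IEnumerable<T> NextPartition<T>(IEnumerator<T> enumerator, int blockSize)
    {
        do
            yield return enumerator.Current;
        while (--blockSize > 0 && enumerator.MoveNext());
    }

    // Safe: string.Join reads each partition to the end before the next one is requested.
    public static List<string> JoinAll(IEnumerable<string> items, int blockSize) =>
        Partition(items, blockSize).Select(block => string.Join(",", block)).ToList();

    // Unsafe: only the first item of each partition is read, so the outer loop
    // resumes from wherever the shared enumerator was left, not at the next block.
    public static List<string> FirstOfEach(IEnumerable<string> items, int blockSize) =>
        Partition(items, blockSize).Select(block => block.First()).ToList();
}
```

With the ten strings "1" to "10" and a block size of 3, JoinAll returns four strings ("1,2,3", "4,5,6", "7,8,9", "10"), while FirstOfEach returns ten values instead of the four block starts, because each partially read partition leaves the shared enumerator only one step further along.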

Matthew Watson

The shortest way is to use the LINQ chunking implementation from an existing SO answer:

List<string> ABC = Names.Select((x, i) => new { x, i })
                        .GroupBy(xi => xi.i / 500, xi => xi.x)
                        .Select(g => string.Join(",", g))
                        .ToList();
Vadim Martynov

Something like this:

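using System;
using System.Collections.Generic;

public static void Main()
{
    string[] strs = new string[] { "aaaa", "bbb", "ccc", "ddd", "eeee", "fff", "ggg", "hhhh", "iiiii", "JJJJ" };
    List<string> res = new List<string>();
    for (int i = 0; i < strs.Length; i += 5)
    {
        // Math.Min keeps the last block in range when strs.Length is not a multiple of 5.
        res.Add(string.Join(",", strs, i, Math.Min(5, strs.Length - i)));
    }
    res.ForEach(F => Console.WriteLine(F));
}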

Just change 5 to 500 and strs to your array. The string.Join(string, string[], int, int) overload needs an array, so for a List<string> such as Names, pass Names.ToArray().
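If Names stays a List<string>, a sketch of the same loop can skip converting the whole list to an array by joining one GetRange slice at a time (LoopJoin and JoinInBlocks are hypothetical names). Each GetRange call allocates a small list holding just the references in that block.

```csharp
using System;
using System.Collections.Generic;

public static class LoopJoin
{
    // Joins consecutive blocks of blockSize strings from a List<string>.
    public static List<string> JoinInBlocks(List<string> names, int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        // Pre-size the result: ceiling(names.Count / blockSize) entries.
        var result = new List<string>((names.Count + blockSize - 1) / blockSize);
        for (int i = 0; i < names.Count; i += blockSize)
        {
            int count = Math.Min(blockSize, names.Count - i); // last block may be shorter
            result.Add(string.Join(",", names.GetRange(i, count)));
        }
        return result;
    }
}
```

With 700,000 names and a block size of 500, this yields the 1,400 strings the question asks for.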

Or Yaniv