
Maybe a basic question, but let us say I have a string that is 2000 characters long, and I need to split it into chunks of at most 512 characters each.

Is there a nice way to do this, such as a loop?

Abel
janhartmann
  • Are you sure you need 512 **char** chunks? Because that is different from 512 **bytes** which is a more common constraint. – H H Oct 27 '09 at 16:44
  • @Henk: On the other hand, splitting *text* into chunks based on *bytes* would be pretty odd - the results would depend on the encoding. – Jon Skeet Oct 27 '09 at 16:46
  • Jon, yes, a common problem when re-assembling the text again. But some I/O channels operate in 512 byte blocks. – H H Oct 27 '09 at 16:49
  • @Jon and @Henk: the `string` in C# is defined to contain UTF-16 characters internally, so encoding is not relevant in memory. Once you write it to disk (or elsewhere), encoding becomes relevant and influences the stored byte size. – Abel Oct 27 '09 at 17:22
  • Abel, I know and so does Jon. I was asking meep to confirm at what level the condition applies. 512 is a much rounder number for bytes than for chars. – H H Oct 27 '09 at 17:38
  • Ah, sorry, of course (I reacted to "would depend on the encoding"; I see now what you meant). – Abel Oct 27 '09 at 17:49
  • possible duplicate of [Splitting a string into chunks of a certain size](http://stackoverflow.com/questions/1450774/splitting-a-string-into-chunks-of-a-certain-size) – Chris Mar 03 '15 at 16:10

8 Answers

21

Something like this:

private IList<string> SplitIntoChunks(string text, int chunkSize)
{
    List<string> chunks = new List<string>();
    int offset = 0;
    while (offset < text.Length)
    {
        int size = Math.Min(chunkSize, text.Length - offset);
        chunks.Add(text.Substring(offset, size));
        offset += size;
    }
    return chunks;
}

Or just to iterate over:

private IEnumerable<string> SplitIntoChunks(string text, int chunkSize)
{
    int offset = 0;
    while (offset < text.Length)
    {
        int size = Math.Min(chunkSize, text.Length - offset);
        yield return text.Substring(offset, size);
        offset += size;
    }
}

Note that this splits into chunks of UTF-16 code units, which isn't quite the same as splitting into chunks of Unicode code points, which in turn may not be the same as splitting into chunks of glyphs.
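To illustrate the note above, here is a minimal sketch of a variant that avoids cutting a chunk between the two halves of a surrogate pair. The class name `SurrogateSafeChunker` is hypothetical and not part of the answer. It only handles the code-unit versus code-point case (it does not keep combining sequences or glyphs together), and it assumes `chunkSize` is at least 2 so that every step makes progress.

```csharp
using System;
using System.Collections.Generic;

static class SurrogateSafeChunker
{
    // Hypothetical helper: same loop as SplitIntoChunks, but a chunk is
    // shortened by one code unit if it would otherwise end between a high
    // surrogate and the low surrogate that follows it.
    public static IEnumerable<string> SplitIntoChunks(string text, int chunkSize)
    {
        if (chunkSize < 2)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int offset = 0;
        while (offset < text.Length)
        {
            int size = Math.Min(chunkSize, text.Length - offset);
            int end = offset + size;

            // Only relevant when more text follows this chunk.
            if (end < text.Length
                && char.IsHighSurrogate(text[end - 1])
                && char.IsLowSurrogate(text[end]))
            {
                size--;
            }

            yield return text.Substring(offset, size);
            offset += size;
        }
    }
}
```

As a consequence, some chunks may be one code unit shorter than `chunkSize`, which still satisfies a "max 512 characters" requirement.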

Jon Skeet
  • This algorithm (and its _compatibility_ with Unicode) has also been discussed on Code Review: [Split a string into chunks of the same length](http://codereview.stackexchange.com/a/111925/13424). – Adriano Repetti Nov 27 '15 at 08:52
  • @AdrianoRepetti: Thanks - I've added a small note on the answer as well. – Jon Skeet Nov 27 '15 at 08:54
3

Using Jon's implementation and the yield keyword:

IEnumerable<string> Chunks(string text, int chunkSize)
{
    for (int offset = 0; offset < text.Length; offset += chunkSize)
    {
        int size = Math.Min(chunkSize, text.Length - offset);
        yield return text.Substring(offset, size);
    }
}
Community
Stan R.
3

Although this question already has an accepted answer, here is a short version using regular expressions. Purists may not like it (understandably), but when you need a quick solution and are comfortable with regexes, this can be it. Performance is rather good, surprisingly:

string [] split = Regex.Split(yourString, @"(?<=\G.{512})");

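What does it do? It uses a positive lookbehind and \G, which remembers the position where the previous match ended. It will also catch the last bit, even if the length isn't divisible by 512. Note that `.` does not match a newline unless RegexOptions.Singleline is used.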
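For comparison, a related regex approach collects matches instead of splitting. This is a different technique from the lookbehind above, shown only as a sketch. The class name `RegexChunker` is hypothetical, and `RegexOptions.Singleline` is passed on the assumption that line breaks should stay inside the chunks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexChunker
{
    // Hypothetical helper: each match of ".{1,chunkSize}" is one chunk.
    // The greedy quantifier takes as many characters as it can, so only
    // the last chunk can be shorter than chunkSize.
    public static List<string> SplitIntoChunks(string text, int chunkSize)
    {
        string pattern = ".{1," + chunkSize + "}";

        // Singleline makes '.' match '\n' as well.
        return Regex.Matches(text, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

With this sketch, `RegexChunker.SplitIntoChunks("abcdefghij", 4)` returns "abcd", "efgh" and "ij", and an empty input returns an empty list.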

Abel
1
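static IEnumerable<string> Split(string str, int chunkSize)
{
    int len = str.Length;
    // Round up so that a trailing partial chunk is not dropped.
    int count = (len + chunkSize - 1) / chunkSize;
    // The last chunk may be shorter than chunkSize, so clamp its length.
    return Enumerable.Range(0, count)
        .Select(i => str.Substring(i * chunkSize, Math.Min(chunkSize, len - i * chunkSize)));
}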

source: Splitting a string into chunks of a certain size

Community
Chris Ballance
1

I will dare to provide a more LINQified version of Jon's solution, based on the fact that the string type implements IEnumerable<char>:

private IList<string> SplitIntoChunks(string text, int chunkSize)
{
    var chunks = new List<string>();
    int offset = 0;
    while(offset < text.Length) {
        chunks.Add(new string(text.Skip(offset).Take(chunkSize).ToArray()));
        offset += chunkSize;
    }
    return chunks;
}
Konamiman
  • I did consider that - particularly as MoreLINQ provides a nice Partition method for this sort of thing. However, the efficiency of this would be absolutely horrible :( – Jon Skeet Oct 27 '09 at 16:48
  • btw String, does not have an extension method for "Skip" you would have to do ToCharArray first. – Stan R. Oct 27 '09 at 16:55
  • and I know it implements *IEnumerable* which makes it that much more baffling... – Stan R. Oct 27 '09 at 16:58
  • @Stan: the C# VS team hard-coded the string dropdown helpers: it is an exceptional case where you do not see the Framework-provided extension methods. They found that clearer. The VB team decided the opposite: there you do see the `IEnumerable` extension methods in the dropdown helper. – Abel Oct 27 '09 at 17:53
  • ahh guys that explains it, thanks...and i was baffled by this for a few minutes hahaa – Stan R. Oct 27 '09 at 18:56
1

Most of the answers share the same flaw: given an empty text, they yield nothing. I would expect to get back at least that empty string (the same behaviour as a split on a char that is not in the string, which gives back one item: the given string).

So we should always loop at least once (based on Jon's code):

IEnumerable<string> SplitIntoChunks (string text, int chunkSize)
{
    int offset = 0;
    do
    {
        int size = Math.Min (chunkSize, text.Length - offset);
        yield return text.Substring (offset, size);
        offset += size;
    } while (offset < text.Length);
}

Or using a for loop (edit: after toying with this a little more, I found a better way to handle the case where chunkSize is greater than the text length):

IEnumerable<string> SplitIntoChunks (string text, int chunkSize)
{
    if (text.Length <= chunkSize)
        yield return text;
    else
    {
        var chunkCount = text.Length / chunkSize;
        var remainingSize = text.Length % chunkSize;

        for (var offset = 0; offset < chunkCount; ++offset)
            yield return text.Substring (offset * chunkSize, chunkSize);

        // yield remaining text if any
        if (remainingSize != 0)
            yield return text.Substring (chunkCount * chunkSize, remainingSize);
    }
}

That could also be used with the do/while loop ;)
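The difference in behaviour for empty input can be sketched as follows. The class and method names are hypothetical, and the two methods are condensed copies of the check-first and do/while loops discussed in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EmptyInputDemo
{
    // Checks the condition before the first iteration:
    // an empty string produces no items at all.
    public static IEnumerable<string> CheckFirstVariant(string text, int chunkSize)
    {
        for (int offset = 0; offset < text.Length; offset += chunkSize)
            yield return text.Substring(offset, Math.Min(chunkSize, text.Length - offset));
    }

    // Checks the condition after each iteration:
    // an empty string produces exactly one (empty) item.
    public static IEnumerable<string> DoWhileVariant(string text, int chunkSize)
    {
        int offset = 0;
        do
        {
            int size = Math.Min(chunkSize, text.Length - offset);
            yield return text.Substring(offset, size);
            offset += size;
        } while (offset < text.Length);
    }
}
```

For `""` and a chunk size of 512, `CheckFirstVariant` yields 0 items and `DoWhileVariant` yields 1, which matches `"".Split(',')` returning an array with one empty string. For non-empty input both variants produce the same chunks.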

Sehnsucht
0

Generic extension method:

using System;
using System.Collections.Generic;
using System.Linq;

public static class IEnumerableExtensions
{
  public static IEnumerable<IEnumerable<T>> SplitToChunks<T> (this IEnumerable<T> coll, int chunkSize)
  {
    int skipCount = 0;
    while (coll.Skip (skipCount).Take (chunkSize) is IEnumerable<T> part && part.Any ())
    {
      skipCount += chunkSize;
      yield return part;
    }
  }
}

class Program
{
  static void Main (string[] args)
  {
    var col = Enumerable.Range(1,1<<10);
    var chunks = col.SplitToChunks(8);

    foreach (var c in chunks.Take (200))
    {
      Console.WriteLine (string.Join (" ", c.Select (n => n.ToString ("X4"))));
    }

    Console.WriteLine ();
    Console.WriteLine ();

    "Split this text into parts that are fifteen characters in length, surrounding each part with single quotes and output each into the console on separate lines."
      .SplitToChunks (15)
      .Select(p => $"'{string.Concat(p)}'")
      .ToList ()
      .ForEach (p => Console.WriteLine (p));

    Console.ReadLine ();
  }
}
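As a point of comparison, and assuming .NET 6 or later is available, the built-in `Enumerable.Chunk` method covers the same ground without a hand-written extension. The wrapper class `BuiltInChunkDemo` below is hypothetical and only serves to show the call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BuiltInChunkDemo
{
    // Assumption: the project targets .NET 6 or later, where
    // Enumerable.Chunk(source, size) exists and returns arrays.
    public static List<string> SplitIntoChunks(string text, int chunkSize)
    {
        return text.Chunk(chunkSize)                   // IEnumerable<char[]>
                   .Select(chars => new string(chars)) // back to string
                   .ToList();
    }
}
```

Unlike the Skip/Take loop, this enumerates the source only once. Like the other char-based approaches here, it counts UTF-16 code units.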
Patrick Artner
-1

Something like?

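// Step through the string in increments of the wanted chunk length.
for (int i = 0; i < text.Length; i += wantedCharLength)
{
    // The last chunk may be shorter than wantedCharLength.
    string chunk = text.Substring(i, Math.Min(wantedCharLength, text.Length - i));
}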
Foxfire