
I have a string array of about 20,000,000 values, and I need to convert it to a single string.

I've tried:

    string data = "";
    foreach (var i in tm)
    {
        data = data + i;
    }

But that takes far too long.

Does anyone know a faster way?

Oguz Ozgul
  • Why do you need one single `string` that large? – juharr Dec 01 '15 at 19:58
  • I'm working on making a zipper for school – Serhat Reis Dec 01 '15 at 19:59
  • Why do you need a string that large, what are you trying to accomplish? – David Pine Dec 01 '15 at 20:01
  • You might want to look at this [related post](http://stackoverflow.com/questions/21078/whats-the-best-string-concatenation-method-using-c). – Axel Kemper Dec 01 '15 at 20:03
  • Performance is bad because `data = data + i` is a [Shlemiel the painter's algorithm](http://www.joelonsoftware.com/articles/fog0000000319.html) – Filburt Dec 01 '15 at 20:04
  • I'm not sure what all you need to do for your zipper, but there must be a way to accomplish it without creating a gigantic string. – juharr Dec 01 '15 at 20:05
  • for such a large concatenation you need to look at other options, like streaming to disk for instance – KiwiPiet Dec 01 '15 at 20:07
  • Just tell us if this is for real. If it is, tell us what the source of these millions of strings is. Why load them into memory in the first place? Compress directly from the source, if that is what you mean by a zipper – Oguz Ozgul Dec 01 '15 at 20:23

5 Answers


Try `StringBuilder` (in the `System.Text` namespace):

    StringBuilder sb = new StringBuilder();
    foreach (var i in tm)
    {
        sb.Append(i);
    }

To get the resulting string, call `ToString()`:

    string result = sb.ToString();

Hatted Rooster
  • Thanks, but I get the following error: An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll – Serhat Reis Dec 01 '15 at 19:57
  • @SerhatReis Be aware that even in an x64 environment [there is a 2 GB limit](http://stackoverflow.com/a/6107351/205233). – Filburt Dec 01 '15 at 20:00
  • @Filburt the very next (and much more recent) answer on the link you provided states that the 2GB limit doesn't apply to x64 as of .NET 4.5... – DrewJordan Dec 01 '15 at 20:03
  • @DrewJordan The limit still applies but can be disabled. – Filburt Dec 01 '15 at 20:20
  • @DrewJordan The 2GB limit is for objects. In the latest x64 frameworks it is relaxed, but strings can still only hold at most `int.MaxValue` characters. Strings use `int` internally for lengths so they can't ever exceed 2^31-1 characters in length. – Corey Dec 01 '15 at 23:17

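The answer is going to depend on the size of the output string and the amount of memory you have available and usable. The hard limit on string length appears to be 2^31-1 (`int.MaxValue`) characters, since lengths are stored as `int`; at two bytes per character that would occupy about 4 GB. The practical limit is lower, because the runtime's 2 GB cap on object size still applies to strings even where it has been relaxed for arrays. Whether you can actually allocate that much depends on your framework version, process bitness and available memory. If you're going to produce a larger output, you can't put it into a single string anyway.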
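A quick way to find out which side of that limit you are on is to add up the lengths before building anything. This is only a minimal sketch, assuming the array is called `tm` as in the question; `FitsInOneString` is a hypothetical helper name, and passing the check does not guarantee that the allocation itself will succeed.

```csharp
using System;

static class LengthCheck
{
    // Sums the lengths as long so the total cannot overflow int,
    // then reports whether the result could fit in a single string.
    public static bool FitsInOneString(string[] parts, out long totalChars)
    {
        totalChars = 0;
        foreach (string part in parts)
        {
            if (part != null) totalChars += part.Length;
        }
        return totalChars <= int.MaxValue;
    }

    static void Main()
    {
        string[] tm = { "abc", "de", null, "f" }; // stand-in for the real array
        long total;
        bool fits = FitsInOneString(tm, out total);
        Console.WriteLine("{0} chars, fits: {1}", total, fits);
    }
}
```

If the check fails, no in-memory approach will help and the output has to go to a stream or file instead.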

You've already discovered that naive concatenation is going to be tragically slow. The problem is that every pass through the loop creates a new string, copying everything accumulated so far, and then discards it on the next iteration. The total work therefore grows with the square of the number of pieces. It also fills up memory pretty quickly, forcing the garbage collector to work overtime finding old strings to clear out, not to mention the memory fragmentation and all that stuff that modern programmers don't pay much attention to.
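To put a rough number on that, here is a back-of-the-envelope sketch that counts characters copied rather than measuring time. The piece length of 1 is an assumption for illustration, and `NaiveCopies` and `SinglePassCopies` are hypothetical helper names. With 20,000,000 one-character pieces the naive loop copies on the order of 2 × 10^14 characters, against 2 × 10^7 for a single pass.

```csharp
using System;

static class CopyCost
{
    // Characters copied when appending n pieces of length len one at a time
    // with `data = data + piece`: step k builds a new string of k * len chars.
    public static long NaiveCopies(long n, long len)
    {
        return len * n * (n + 1) / 2;
    }

    // Characters copied when each piece is written once into a buffer.
    public static long SinglePassCopies(long n, long len)
    {
        return n * len;
    }

    static void Main()
    {
        long n = 20000000; // array size from the question
        long len = 1;      // assumed piece length, for illustration only
        Console.WriteLine(NaiveCopies(n, len));
        Console.WriteLine(SinglePassCopies(n, len));
    }
}
```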

A `StringBuilder` is a reasonable solution. Internally it allocates blocks of characters that it then stitches together at the end using pointers and memory copies. That saves a lot of hassle and is quite speedy.

As for String.Join... it uses a StringBuilder. So does String.Concat although it is certainly quicker when not inserting separator characters.

For simplicity I would use String.Concat and be done with it.

But then I'm not much for simplicity.

Here's an untested and possibly horribly slow answer using LINQ. When I get time I'll test it and see how it performs, but for now:

    string result = new String(lines.SelectMany(l => (IEnumerable<char>)l).ToArray());

Obviously there is a potential overflow here since the ToArray call can potentially create an array larger than the String constructor can handle. Try it out and see if it's as quick as String.Concat.
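For comparison, the `String.Concat` route recommended above might look like the sketch below, next to a `StringBuilder` whose capacity is reserved up front. It assumes the input is a `string[]`; `WithConcat` and `WithPresizedBuilder` are hypothetical names, and neither variant avoids the overall limit on string size.

```csharp
using System;
using System.Text;

static class JoinAll
{
    // Lets the framework do the work in one call.
    public static string WithConcat(string[] parts)
    {
        return string.Concat(parts);
    }

    // Same result with a StringBuilder whose capacity is reserved up front,
    // so it does not have to grow while appending.
    public static string WithPresizedBuilder(string[] parts)
    {
        long total = 0;
        foreach (string part in parts)
        {
            if (part != null) total += part.Length;
        }
        // checked: throws OverflowException if the total exceeds int.MaxValue
        var sb = new StringBuilder(checked((int)total));
        foreach (string part in parts)
        {
            sb.Append(part); // appending null adds nothing
        }
        return sb.ToString();
    }

    static void Main()
    {
        string[] tm = { "ab", "cd", "ef" }; // stand-in for the real array
        Console.WriteLine(WithConcat(tm) == WithPresizedBuilder(tm));
    }
}
```

Which of the two is faster on 20,000,000 entries is something to measure rather than assume.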

Corey

You can do it in LINQ, like so:

    string data = tm.Aggregate("", (current, i) => current + i);

Or you can use the `string.Join` method:

    string data = string.Join("", tm);

David Haxton
  • The `Aggregate` you have is basically the same thing and will have the same horrible performance. The `Join` is OK, but `string.Concat` would make more sense. – juharr Dec 01 '15 at 20:07
  • I am really curious about the metrics of each solution. I don't expect much from linq myself this way, and probably will calculate the metrics for each to settle my curiosity. – Oguz Ozgul Dec 01 '15 at 20:26

I can't check it right now, but I'm curious how this option would perform:

    var data = String.Join(string.Empty, tm);

Is `Join` optimized to skip the separator when it is `String.Empty`?

InBetween
  • `Join` doesn't appear to make any special allowances for empty separators. They have a `Concat` method for that case, so I guess they didn't think it was necessary to test for it in `Join`. – Corey Dec 01 '15 at 21:21

Unfortunately, for data this big, in-memory methods are likely to fail, and they will be a real headache for the GC. Instead, create a file and write every string to it, like this:

    using (StreamWriter sw = new StreamWriter("some_file_to_write.txt"))
    {
        // Write each piece straight to the file; no giant string is built.
        for (int i = 0; i < tm.Length; i++)
            sw.Write(tm[i]);
    }

Try to avoid using `var` in this performance-demanding approach. Correction: `var` does not affect performance; `dynamic` does.

Mert Gülsoy
  • Why not use `var`? It is resolved at compile time so should have zero impact on performance. – Corey Dec 01 '15 at 21:20
  • @Corey Thanks for asking. It was my mistake to mix `var` with `dynamic`. I've corrected it. – Mert Gülsoy Dec 02 '15 at 08:14
  • All good. I figured it was either a mistake or something I wasn't aware of. Was actually hoping it was the latter :) – Corey Dec 02 '15 at 11:55