177

I've never stumbled across this before, but I have now and am surprised that I can't find a really easy way to convert an IEnumerable<char> to a string.

The best way I can think of is string str = new string(myEnumerable.ToArray());, but, to me, it seems like this would create a new char[], and then create a new string from that, which seems expensive.

I would've thought this would be common functionality built into the .NET framework somewhere. Is there a simpler way to do this?

For those interested, the reason I'd like to use this is to use LINQ to filter strings:

string allowedString = new string(inputString.Where(c => allowedChars.Contains(c)).ToArray());
Connell
  • Strange, I asked myself the same thing a few minutes ago: http://stackoverflow.com/questions/11653119/checking-for-and-removing-any-characters-in-a-string/11653272#11653272 – Tim Schmelter Jul 25 '12 at 16:22
  • How weird! I did do a search for a similar question and was surprised I couldn't find any. I could indeed use that solution too though! – Connell Jul 25 '12 at 16:26
  • Yes, that might be more efficient. But you have a white- instead of a black-list. So you need `inputString.Intersect(allowedChars)` instead. – Tim Schmelter Jul 25 '12 at 16:55
  • 1
    Just out of curiosity is allowedChars a `HashSet`? I have learned [first hand](http://stackoverflow.com/questions/5261858/how-to-replace-characters-in-a-array-quickly) how it can give you a performance boost. It cut the time down from 34 seconds to process a file to 4. – Scott Chamberlain Jul 03 '13 at 22:42
  • 1
    @Scott nope, it was a compile-time constant `string`. Wow, that's one hell of a performance boost though. I'll remember to try that out next time ;) – Connell Jul 10 '13 at 13:57
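
The HashSet suggestion from the comments above can be sketched roughly as follows. This is only an illustration: the names FilterDemo, AllowedChars, AllowedSet and KeepAllowed are hypothetical, and the whitelist contents are made up. Whether it actually helps depends on how large the whitelist is, so measure before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FilterDemo
{
    // Hypothetical whitelist; in the question it was a compile-time constant string.
    const string AllowedChars = "abcdefghijklmnopqrstuvwxyz";

    // Built once, so each Contains call is a hash lookup instead of a scan of the string.
    static readonly HashSet<char> AllowedSet = new HashSet<char>(AllowedChars);

    public static string KeepAllowed(string input)
    {
        // Same shape as the question's code, with the set doing the membership test.
        return new string(input.Where(c => AllowedSet.Contains(c)).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(KeepAllowed("h3ll0 w0rld")); // prints "hllwrld"
    }
}
```

For a short whitelist the difference may well be negligible; the gain reported in the comment came from a much heavier workload.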

6 Answers

183

You can use String.Concat().

var allowedString = String.Concat(
    inputString.Where(c => allowedChars.Contains(c))
);

Caveat: This approach has performance implications. String.Concat doesn't special-case sequences of characters, so it behaves as if every character were converted to a string and then concatenated, as the documentation describes (and that is what it actually does). It gives you a built-in way to accomplish the task, but it could be done better.

I don't think any implementation within the framework special-cases char, so you'll have to write your own. A loop appending characters to a StringBuilder is easy enough to write.


Here are some benchmarks I ran on a dev machine; the numbers look about right.

1,000,000 iterations over a 300-character sequence, 32-bit release build:

ToArrayString:        00:00:03.1695463
Concat:               00:00:07.2518054
StringBuilderChars:   00:00:03.1335455
StringBuilderStrings: 00:00:06.4618266

static readonly IEnumerable<char> seq = Enumerable.Repeat('a', 300);

static string ToArrayString(IEnumerable<char> charSequence)
{
    return new String(charSequence.ToArray());
}

static string Concat(IEnumerable<char> charSequence)
{
    return String.Concat(charSequence);
}

static string StringBuilderChars(IEnumerable<char> charSequence)
{
    var sb = new StringBuilder();
    foreach (var c in charSequence)
    {
        sb.Append(c);
    }
    return sb.ToString();
}

static string StringBuilderStrings(IEnumerable<char> charSequence)
{
    var sb = new StringBuilder();
    foreach (var c in charSequence)
    {
        sb.Append(c.ToString());
    }
    return sb.ToString();
}
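
The timing loop itself is not shown above. A minimal sketch of a harness that could produce figures in this format, assuming System.Diagnostics.Stopwatch and a single warm-up call (both are assumptions, as are the names Bench and Time; this is not necessarily how the numbers above were measured):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Bench
{
    // Same test input as above: 300 copies of 'a', produced lazily.
    static readonly IEnumerable<char> Seq = Enumerable.Repeat('a', 300);

    // Runs `convert` over the shared sequence `iterations` times and returns the elapsed time.
    public static TimeSpan Time(Func<IEnumerable<char>, string> convert, int iterations)
    {
        convert(Seq); // warm-up call so JIT compilation is not part of the measurement

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(Seq);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        const int iterations = 1000000;
        Console.WriteLine("ToArrayString: " + Time(s => new string(s.ToArray()), iterations));
        Console.WriteLine("Concat:        " + Time(s => string.Concat(s), iterations));
    }
}
```

Absolute timings will differ by machine and runtime, so only the ratios between methods are worth comparing.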
Jeff Mercado
  • 7
    ... which probably uses a StringBuilder internally, which in turn uses a dynamically growing char[] internally, from which the final `string` is created. Doesn't seem like much difference to `new string(.ToArray())`. – dtb Jul 25 '12 at 16:21
  • It does indeed use a stringbuilder internally. – Chris Jul 25 '12 at 16:25
  • 1
    Seeing as how a string is a fixed array of characters, you can't avoid condensing an enumerable down into one in order to construct it. Either that happens in your own code, or somewhere inside the framework. – MikeP Jul 25 '12 at 16:30
  • 1
    The difference is that a `string` needs to be immutable, so when it accepts a `char[]` from an outside source it needs to copy it so that changes won't be reflected in the new string. If the `char[]` is constructed internally (i.e. from an `IEnumerable` passed in) then no copy needs to be made. Passing the `IEnumerable` doesn't prevent the conversion to the array, it prevents the copy of that array. – Servy Jul 25 '12 at 16:36
  • 2
    @Servy I understand what you say. But if it is true that it uses a `StringBuilder`, and _if_ it uses `sb.ToString()` eventually on that `StringBuilder` instance, then `sb.ToString()` might also copy the data. Because in general a `StringBuilder` can live on (and be mutated) after `.ToString()` has been called on it. But I agree they could have made tricks that prevented the final copy, for example if `StringBuilder` had a non-public method `ToStringWithoutCopy`. – Jeppe Stig Nielsen Feb 08 '13 at 19:04
  • 1
    Can you post the performance results of using `.Aggregate()`? For example, `new char[] {}.Aggregate("", (s, c) => s+c)`. Also `new char[] {}.Aggregate(new StringBuilder(), (sb, c) => sb.Append(c)).ToString()`. – Pluto Oct 17 '14 at 19:22
  • @Pluto I don't have access to a computer to check, but it should be proportional to the equivalent loop. Just slower overall. Linq gives you a self-contained solution at the cost of performance. – Jeff Mercado Oct 18 '14 at 00:27
  • 1
    Another example of how there is too much worry about performance because it is unlikely your actual production code will build a string from a char array a million times. The actual impact of performing one concatenation is 1 millionth of 7 seconds, which is more than acceptable. I would rather clean readability vs worrying about the performance impact of creating a string on a modern computer. – ChrisCW Feb 06 '15 at 16:45
  • The link to the reference source is broken. – Logerfo Jun 21 '17 at 15:19
  • See my update, as of .Net Core 2.1, `Concat` is best all round. – Jodrell May 31 '18 at 12:19
95

Edited for the release of .NET Core 2.1

Repeating the test on .NET Core 2.1, I get results like this:

1000000 iterations of "Concat" took 842ms.

1000000 iterations of "new String" took 1009ms.

1000000 iterations of "sb" took 902ms.

In short, if you are using .NET Core 2.1 or later, Concat is king.

See MS blog post for more details.


I originally made this the subject of another question, but it has increasingly become a direct answer to this one.

I've done some performance testing of three simple methods of converting an IEnumerable<char> to a string. The methods are:

new string

return new string(charSequence.ToArray());

Concat

return string.Concat(charSequence);

StringBuilder

var sb = new StringBuilder();
foreach (var c in charSequence)
{
    sb.Append(c);
}

return sb.ToString();

In my testing, which is detailed in the linked question, 1000000 iterations over "Some reasonably small test data" give results like this:

1000000 iterations of "Concat" took 1597ms.

1000000 iterations of "new string" took 869ms.

1000000 iterations of "StringBuilder" took 748ms.

This suggests to me that there is no good reason to use string.Concat for this task. If you want simplicity, use the new string approach; if you want performance, use the StringBuilder.

One caveat to my assertion: in practice all of these methods work fine, and this could all be over-optimization.

Jodrell
  • Following https://github.com/dotnet/coreclr/pull/14298 I suspect this might need revisiting – Jodrell May 31 '18 at 11:41
  • 1
    Credit to Stephen Toub for the improvement https://social.msdn.microsoft.com/profile/Stephen+Toub+-+MSFT @user:479403 – Jodrell May 31 '18 at 12:31
26

As of .NET 4, many string methods accept an IEnumerable<T> argument.

string.Concat(myEnumerable);
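
A slightly fuller sketch of that call, for context. The names ConcatDemo and LettersOnly are hypothetical, and filtering by char.IsLetter merely stands in for whatever produces the sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ConcatDemo
{
    public static string LettersOnly(string input)
    {
        // Lazily filtered sequence; nothing is materialized yet.
        IEnumerable<char> letters = input.Where(c => char.IsLetter(c));

        // With an IEnumerable<char> argument this binds to the generic
        // String.Concat<T>(IEnumerable<T>) overload, with T = char.
        return string.Concat(letters);
    }

    static void Main()
    {
        Console.WriteLine(LettersOnly("a1b2c3")); // prints "abc"
    }
}
```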
MikeP
11

Another possibility is using

string.Join("", myEnumerable);

I did not measure the performance.
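
For illustration, a small sketch showing that string.Join with an empty separator gives the same result as string.Concat for a character sequence. The names JoinDemo and JoinChars are hypothetical. Note that the argument must be typed as IEnumerable<char>: passing a plain string would bind to the params string[] overload instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinDemo
{
    // Joins the characters with the given separator via String.Join<T>(string, IEnumerable<T>).
    public static string JoinChars(string separator, IEnumerable<char> characters)
    {
        return string.Join(separator, characters);
    }

    static void Main()
    {
        IEnumerable<char> seq = "abc".Reverse();

        Console.WriteLine(JoinChars("", seq));                        // prints "cba"
        Console.WriteLine(JoinChars("-", seq));                       // prints "c-b-a"
        Console.WriteLine(JoinChars("", seq) == string.Concat(seq));  // prints "True"
    }
}
```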

nitrogenycs
11

Here is a more succinct version of the StringBuilder answer:

return charSequence.Aggregate(new StringBuilder(), (seed, c) => seed.Append(c)).ToString();

I timed this with the same tests Jeff Mercado used. Across 1,000,000 iterations on the same 300-character sequence (32-bit release build), it was 1 second slower than the more explicit version:

static string StringBuilderChars(IEnumerable<char> charSequence)
{
    var sb = new StringBuilder();
    foreach (var c in charSequence)
    {
        sb.Append(c);
    }
    return sb.ToString();
}

So if you're a fan of accumulators then here you go.

Adam Smith
10

My data contradicts the results Jodrell posted. First, have a look at the extension methods I use:

public static string AsStringConcat(this IEnumerable<char> characters)
{
    return String.Concat(characters);
}

public static string AsStringNew(this IEnumerable<char> characters)
{
    return new String(characters.ToArray());
}

public static string AsStringSb(this IEnumerable<char> characters)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in characters)
    {
        sb.Append(c);
    }
    return sb.ToString();
}

My results

With

  • STRLEN = 31
  • ITERATIONS = 1000000

Input

  • ((IEnumerable<char>)RandomString(STRLEN)).Reverse()

Results

  • Concat: 1x
  • New: 3x
  • StringBuilder: 3x

Input

  • ((IEnumerable<char>)RandomString(STRLEN)).Take((int)ITERATIONS/2)

Results

  • Concat: 1x
  • New: 7x
  • StringBuilder: 7x

Input

  • ((IEnumerable<char>)RandomString(STRLEN)) (this is just an upcast)

Results

  • Concat: 0 ms
  • New: 2000 ms
  • StringBuilder: 2000 ms
  • Downcast: 0 ms

I ran this on an Intel i5 760 targeting .NET Framework 3.5.

hBGl
  • 1
    For what it's worth, my tests targeted .NET 4.0 and were run against a release build, from the command line, without a debugger attached. Try your tests with a purer sequence rather than a cast. Something like `Enumerable.Range(65, 26).Select(i => (char)i);` should avoid the chance of an optimized shortcut. – Jodrell Oct 28 '13 at 09:39
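
To make the distinction in that last comment concrete, here is a small sketch (UpcastDemo and IsReallyString are hypothetical names). It only shows that an upcast string is still a string at run time while a generated sequence is not. It does not demonstrate whether any particular framework method takes a shortcut for such input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UpcastDemo
{
    // True when the object behind the interface reference is actually a System.String.
    public static bool IsReallyString(IEnumerable<char> characters)
    {
        return characters is string;
    }

    static void Main()
    {
        // The cast changes only the static type; the runtime type is still string.
        IEnumerable<char> upcast = "hello";

        // A "purer" sequence as suggested in the comment: 'A' through 'Z', produced lazily.
        IEnumerable<char> generated = Enumerable.Range(65, 26).Select(i => (char)i);

        Console.WriteLine(IsReallyString(upcast));          // prints "True"
        Console.WriteLine(IsReallyString(generated));       // prints "False"
        Console.WriteLine(new string(generated.ToArray())); // prints "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    }
}
```

A benchmark that feeds only upcast strings therefore measures a different situation from one that feeds lazily generated sequences, which may explain part of the disagreement between the two sets of results.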