54

Why isn't it possible to use fluent language on string?

For example:

var x = "asdf1234";
var y = new string(x.TakeWhile(char.IsLetter).ToArray());

Isn't there a better way to convert IEnumerable<char> to string?

Here is a test I've made:

class Program
{
  static string input = "asdf1234";
  static void Main()
  {
    Console.WriteLine("1000 times:");
    RunTest(1000, input);
    Console.WriteLine("10000 times:");
    RunTest(10000,input);
    Console.WriteLine("100000 times:");
    RunTest(100000, input);
    Console.WriteLine("100000 times:");
    RunTest(100000, "ffff57467");


    Console.ReadKey();

  }

  static void RunTest( int times, string input)
  {

    Stopwatch sw = new Stopwatch();

    sw.Start();
    for (int i = 0; i < times; i++)
    {
      string output = new string(input.TakeWhile(char.IsLetter).ToArray());
    }
    sw.Stop();
    var first = sw.ElapsedTicks;

    sw.Restart();
    for (int i = 0; i < times; i++)
    {
      string output = Regex.Match(input, @"^[A-Z]+", 
        RegexOptions.IgnoreCase).Value;
    }
    sw.Stop();
    var second = sw.ElapsedTicks;

    var regex = new Regex(@"^[A-Z]+", 
      RegexOptions.IgnoreCase);
    sw.Restart();
    for (int i = 0; i < times; i++)
    {
      var output = regex.Match(input).Value;
    }
    sw.Stop();
    var third = sw.ElapsedTicks;

    double percent = (first + second + third) / 100;
    double p1 = ( first / percent)/  100;
    double p2 = (second / percent )/100;
    double p3 = (third / percent  )/100;


    Console.WriteLine("TakeWhile took {0} ({1:P2}).,", first, p1);
    Console.WriteLine("Regex took {0}, ({1:P2})." , second,p2);
    Console.WriteLine("Preinstantiated Regex took {0}, ({1:P2}).", third,p3);
    Console.WriteLine();
  }
}

Result:

1000 times:
TakeWhile took 11217 (62.32%).,
Regex took 5044, (28.02%).
Preinstantiated Regex took 1741, (9.67%).

10000 times:
TakeWhile took 9210 (14.78%).,
Regex took 32461, (52.10%).
Preinstantiated Regex took 20669, (33.18%).

100000 times:
TakeWhile took 74945 (13.10%).,
Regex took 324520, (56.70%).
Preinstantiated Regex took 172913, (30.21%).

100000 times:
TakeWhile took 74511 (13.77%).,
Regex took 297760, (55.03%).
Preinstantiated Regex took 168911, (31.22%).

Conclusion: I'm unsure which to prefer. I think I'm going to go with TakeWhile, which is the slowest only on the first run.

Anyway, my question is whether there is any way to optimize the step that turns the result of TakeWhile back into a string.

Morten Kristensen
Shimmy Weitzhandler
  • Please explain what you mean by "best": Fastest? Least memory-hungry? Easiest to understand? – LukeH Nov 12 '11 at 23:40
  • @LukeH I've already made my decision on what to choose: fastest. My question is whether there is a nicer way than `new string(x.TakeWhile(p).ToArray())` – Shimmy Weitzhandler Nov 13 '11 at 00:08
  • @LukeH: Might want to undelete your solution: It is faster than mine by a very large margin – BrokenGlass Nov 13 '11 at 00:26
  • All of these answers beg the question - why hasn't IEnumerable.ToString() been overridden in System.Linq.Enumerable – Dave Oct 02 '17 at 12:12
  • @Dave, you can't override a base function with an extension method. However, what [I would want](https://github.com/dotnet/corefx/issues/24395) to see is an overload of the `string` constructor that takes an `IEnumerable`. – Shimmy Weitzhandler Oct 03 '17 at 13:19

8 Answers

54

How about this to convert IEnumerable<char> to string:

string.Concat(x.TakeWhile(char.IsLetter));
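
For completeness, here is a minimal self-contained sketch of the one-liner above. The class and method names (`ConcatExample`, `LeadingLetters`) are made up for illustration and are not part of the framework. As far as I can tell, the call binds to the generic `string.Concat<T>(IEnumerable<T>)` overload, which is why it needs .NET 4.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatExample
{
    // Hypothetical helper name, used only for this illustration.
    public static string LeadingLetters(string input)
    {
        // TakeWhile yields a lazy IEnumerable<char> that stops at the first non-letter.
        IEnumerable<char> letters = input.TakeWhile(char.IsLetter);

        // string.Concat<T>(IEnumerable<T>) appends the string form of each element,
        // so no intermediate char[] has to be created by the caller.
        return string.Concat(letters);
    }
}
```

Calling `ConcatExample.LeadingLetters("asdf1234")` returns `"asdf"`.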
Kai G
  • I guess that string.Concat uses a StringBuilder internally. Would be very strange if it didn't. So this solution should also perform really well. – Stefan Paul Noack Apr 16 '13 at 10:17
  • .Net 4.0 only. Even if you write your own .TakeWhile in 3.5 then string.Concat(IEnumerable) doesn't do what you expect. – Dylan Nicholson Dec 06 '13 at 04:04
30

Edited for the release of .Net Core 2.1

Repeating the test for the release of .Net Core 2.1, I get results like this

1000000 iterations of "Concat" took 842ms.

1000000 iterations of "new String" took 1009ms.

1000000 iterations of "sb" took 902ms.

In short, if you are using .Net Core 2.1 or later, Concat is king.


I've made this the subject of another question but more and more, that is becoming a direct answer to this question.

I've done some performance testing of 3 simple methods of converting an IEnumerable<char> to a string. Those methods are:

new string

return new string(charSequence.ToArray());

Concat

return string.Concat(charSequence);

StringBuilder

var sb = new StringBuilder();
foreach (var c in charSequence)
{
    sb.Append(c);
}

return sb.ToString();

In my testing, which is detailed in the linked question, for 1000000 iterations of "Some reasonably small test data" I get results like this:

1000000 iterations of "Concat" took 1597ms.

1000000 iterations of "new string" took 869ms.

1000000 iterations of "StringBuilder" took 748ms.

This suggests to me that there is no good reason to use string.Concat for this task. If you want simplicity, use the new string approach, and if you want performance, use the StringBuilder.

I would caveat my assertion: in practice all these methods work fine, and this could all be over-optimization.
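
A rough sketch of how such a comparison could be timed is below. This is not the harness from the linked question; the class name `CharsToStringTiming` and its members are assumptions made for illustration, and a plain `Stopwatch` loop like this is only indicative (a tool such as BenchmarkDotNet gives more trustworthy numbers).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

// Hypothetical harness; all names here are illustrative only.
public static class CharsToStringTiming
{
    public static string ViaNewString(IEnumerable<char> chars)
        => new string(chars.ToArray());

    public static string ViaConcat(IEnumerable<char> chars)
        => string.Concat(chars);

    public static string ViaStringBuilder(IEnumerable<char> chars)
    {
        var sb = new StringBuilder();
        foreach (var c in chars)
        {
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Runs the conversion `iterations` times and returns the elapsed milliseconds.
    public static long Time(Func<IEnumerable<char>, string> convert,
                            IEnumerable<char> data, int iterations)
    {
        convert(data); // one warm-up call so JIT compilation is not measured

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(data);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, `CharsToStringTiming.Time(CharsToStringTiming.ViaConcat, data, 1000000)` times one of the three approaches. The absolute numbers will depend on the runtime version and the input, as the two sets of results above show.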

Jodrell
  • I would want to sacrifice 121 milliseconds to use `new string` in place of writing three additional lines of code to use `StringBuilder`. #cleanCode. – RBT Aug 08 '17 at 12:42
  • Your `MS Blog Post` link points to your Stack Overflow question instead. – NetMage May 18 '21 at 20:28
15

Assuming that you're looking predominantly for performance, then something like this should be substantially faster than any of your examples:

string x = "asdf1234";
string y = x.LeadingLettersOnly();

// ...

public static class StringExtensions
{
    public static string LeadingLettersOnly(this string source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        if (source.Length == 0)
            return source;

        char[] buffer = new char[source.Length];
        int bufferIndex = 0;

        for (int sourceIndex = 0; sourceIndex < source.Length; sourceIndex++)
        {
            char c = source[sourceIndex];

            if (!char.IsLetter(c))
                break;

            buffer[bufferIndex++] = c;
        }
        return new string(buffer, 0, bufferIndex);
    }
}
Shimmy Weitzhandler
LukeH
  • Hmmm, just noticed that you only need letters from the beginning of the string, in which case I'd expect [BrokenGlass's answer](http://stackoverflow.com/questions/8108313/best-way-to-convert-ienumerablechar-to-string/8108584#8108584) to be the fastest. (Again, I haven't actually benchmarked to confirm.) – LukeH Nov 13 '11 at 00:20
  • +1 Pre-allocating the buffer is probably what makes this faster, but this is just a guess - limited testing shows it's way faster than using `Substring()` – BrokenGlass Nov 13 '11 at 00:35
13

Why isn't it possible to use fluent language on string?

It is possible. You did it in the question itself:

var y = new string(x.TakeWhile(char.IsLetter).ToArray());

Isn't there a better way to convert IEnumerable<char> to string?

(My assumption is:)

The framework does not have such a constructor because strings are immutable, and you'd have to traverse the enumeration twice in order to pre-allocate the memory for the string. This is not always an option, especially if your input is a stream.

The only solution to this is to push to a backing array or StringBuilder first, and reallocate as the input grows. For something as low-level as a string, this probably should be considered too-hidden a mechanism. It also would push perf problems down into the string class by encouraging people to use a mechanism that cannot be as-fast-as-possible.

These problems are solved easily by requiring the user to use the ToArray extension method.

As others have pointed out, you can achieve what you want (perf and expressive code) if you write support code, and wrap that support code in an extension method to get a clean interface.
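
To make the trade-off concrete, here is a sketch of what such support code might look like. It is an assumption on my part, not something the framework provides: the name `AsString` is hypothetical. It pre-allocates only when the length is known without enumerating, and otherwise falls back to a growing `StringBuilder`, which is exactly the hidden reallocation described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharSequenceExtensions
{
    // Hypothetical extension method; the name AsString is illustrative only.
    public static string AsString(this IEnumerable<char> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        // A string is already an IEnumerable<char>; nothing to convert.
        if (source is string s)
            return s;

        // The length is known up front (arrays, List<char>, ...):
        // allocate an exactly-sized buffer and copy once.
        if (source is ICollection<char> collection)
        {
            var buffer = new char[collection.Count];
            collection.CopyTo(buffer, 0);
            return new string(buffer);
        }

        // The length is unknown (lazy LINQ query, stream-backed sequence):
        // enumerate once and let the StringBuilder grow as needed.
        var sb = new StringBuilder();
        foreach (char c in source)
        {
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this in place, `x.TakeWhile(char.IsLetter).AsString()` reads fluently, and the sequence is still only traversed once.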

Merlyn Morgan-Graham
9

You can very often do better performance-wise. But what does that buy you? Unless this really is the bottleneck for your application, and you have measured it to be, I would stick to the Linq TakeWhile() version: it is the most readable and maintainable solution, and that is what counts for most applications.

If you really are looking for raw performance you could do the conversion manually. The following was around a factor of 4+ (depending on input string length) faster than TakeWhile() in my tests, but I wouldn't use it personally unless it was critical:

int j = 0;
for (; j < input.Length; j++)
{
    if (!char.IsLetter(input[j]))
        break;
}
string output = input.Substring(0, j);
BrokenGlass
  • +1. And there's nothing wrong with wrapping this up in a helper method of some kind for re-use. Something like `source.LeadingLettersOnly()` would be more readable than `new string(source.TakeWhile(char.IsLetter).ToArray())`, imo. – LukeH Nov 13 '11 at 00:29
  • @LukeH: Your solution is way faster - please undelete! – BrokenGlass Nov 13 '11 at 00:30
  • The function is supposed to compare a search query to the first chars of many thousands (100000) of strings, so performance is all that matters. – Shimmy Weitzhandler Nov 13 '11 at 00:33
  • @BrokenGlass: Ok, I've undeleted. I still haven't run any benchmarks but I'm surprised that mine outruns yours. I guess yours needs two loops, the explicit one first then another inside `Substring` somewhere (although I'd assume that `Substring` would use some native code to blit the required data as fast as possible.) – LukeH Nov 13 '11 at 00:38
  • @LukeH: That line is more readable, but the supporting code *is not* more readable. I'd have to write many unit tests for the extension method, while the Linq I would probably just code review. – Merlyn Morgan-Graham Nov 13 '11 at 02:37
  • @Merlyn: That's true, but those unit tests only need to be written *once*. Obviously, if I didn't need the performance then I'd go for the LINQ version every time, but the OP stressed that their main requirement is performance. – LukeH Nov 14 '11 at 00:01
7
return new string(foo.Select(x => x).ToArray());
er-sho
Vlad Radu
2

I ran some tests in LINQPad 7 (dotnet 6.0.1) w/ BenchmarkDotNet:

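| Method                 | Mean      | Error    | StdDev   |
|------------------------|-----------|----------|----------|
| StringFromArray        | 76.35 μs  | 1.482 μs | 1.522 μs |
| StringConcat           | 100.93 μs | 0.675 μs | 0.631 μs |
| StringBuilder          | 100.52 μs | 0.963 μs | 0.901 μs |
| StringBuilderAggregate | 116.80 μs | 1.714 μs | 1.519 μs |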

Test code:

void Main() => BenchmarkRunner.Run<CharsToString>();

public class CharsToString {
    private const int N = 10000;
    private readonly char[] data = new char[N];

    public CharsToString() {
        var random = new Random(42);
        for (var i = 0; i < data.Length; i++) {
            data[i] = (char)random.Next(0, 256);
        }
    }

    [Benchmark]
    public string StringFromArray()
        => new string(data.Where(char.IsLetterOrDigit).ToArray());

    [Benchmark]
    public string StringConcat()
        => string.Concat(data.Where(char.IsLetterOrDigit));

    [Benchmark]
    public string StringBuilder() {
        var sb = new StringBuilder();
        
        foreach (var c in data.Where(char.IsLetterOrDigit))
            sb.Append(c);
        
        return sb.ToString();
    }

    [Benchmark]
    public string StringBuilderAggregate() => data
        .Where(char.IsLetterOrDigit)
        .Aggregate(new StringBuilder(), (sb, c) => sb.Append(c))
        .ToString();
}
Good Night Nerd Pride
1

This answer seeks to combine the following aspects of the already excellent answers provided.

  1. Readable
  2. Future proof / easy to refactor
  3. Fast

To do this an extension method on IEnumerable<char> is used.

public static string Join(this IEnumerable<char> chars)
{
#if NETCOREAPP2_1_OR_GREATER
    return String.Concat(chars);
#else
    var sb = new System.Text.StringBuilder();
    foreach (var c in chars)
    {
        sb.Append(c);
    }

    return sb.ToString();
#endif
}

This covers all the bases.

  1. It is very readable:

    var y = x.TakeWhile(char.IsLetter).Join();

  2. If there is a preferred new method in the future all conversions can be updated by changing one block of code.

  3. It selects the currently best-performing implementation for the version of .NET being compiled against.

Joshcodes