
I have a char[] salary which contains data that comes from a string. I want to convert char[] salary to float, but the method I'm trying seems to be extremely slow:

float ff = float.Parse(new string(salary));

According to Visual Studio's Performance Profiler this is taking way too much processing:

[Image: Visual Studio Performance Profiler screenshot]

So I'd like to know if there's a faster way to do this, since performance matters here. The char[] is formatted like so:

[ '1', '3', '2', ',', '2', '9']

It is basically a JSON-style float in which every digit (and the comma) occupies one element of the char[].

EDIT:

I've reformatted the code and it seems like the performance hit is actually in the conversion from char[] to string, not the parsing from string to float.

Washington A. Ramos
  • Is it coming from a user or a system? If it's a system, you can supply the culture, which can speed up float.Parse, e.g. float numFloat = float.Parse(strFloat, System.Globalization.CultureInfo.InvariantCulture); – Glenn Watson Jul 29 '18 at 20:13
  • Just wondering why does it come as a char array and not a string. The profiler value doesn’t give much information without what else is happening here. 30% of a millisecond isn’t much for example. Is it really a performance issue? – Sami Kuhmonen Jul 29 '18 at 20:14
  • Information comes from a .json file. Then, it's read into a byte[], then the section of the byte[] that represents the float is extracted into a char[]. Performance is an issue because I'm dealing with 30 million+ entries. – Washington A. Ramos Jul 29 '18 at 20:16
  • Do you need to parse a very large json file? And you reinvented the wheel with something that reads the file in parts into a byte array and then parses it? Try using a streaming `JsonTextReader`. – Alexander Petrov Jul 29 '18 at 20:37
  • I'm manually parsing it because when I tried JsonTextReader the program would take 50 seconds to run, while right now I'm doing it in 12s~ with a custom parser. The Deserializer method in JsonNet wouldn't even run the program. – Washington A. Ramos Jul 29 '18 at 20:40
  • if you bothered to write a custom json parser why not also write a custom parse method for your char[] format? – Chris Rollins Jul 29 '18 at 20:46
  • Is it known how much `new string(salary)` contributes to the time spent on that line? Converting from `byte[]` → `string`, if possible, without the intermediate `char[]` seems like it would improve performance. – Lance U. Matthews Jul 29 '18 at 20:50
  • Did you measure the time in the compiler _release_ mode? Using the _debug_ mode can lead to huge time differences. – martinstoeckli Jul 29 '18 at 21:28

4 Answers


Since this question has changed from "What's the fastest way to parse a float?" to "What's the fastest way to get a string from a char[]?", I wrote some benchmarks with BenchmarkDotNet to compare the various methods. My finding is that, if you already have a char[], you can't get any faster than just passing it to the string(char[]) constructor like you're already doing.

You say that your input file is "read into a byte[], then the section of the byte[] that represents the float is extracted into a char[]." Since you have the bytes that make up the float text isolated in a byte[], perhaps you can improve performance by skipping the intermediate char[]. Assuming you have something equivalent to...

byte[] floatBytes = new byte[] { 0x31, 0x33, 0x32, 0x2C, 0x32, 0x39 }; // "132,29"

...you could use Encoding.GetString()...

string floatString = Encoding.ASCII.GetString(floatBytes);

...which, in my results, is roughly 1.6 times as fast as passing the result of Encoding.GetChars() to the string(char[]) constructor...

char[] floatChars = Encoding.ASCII.GetChars(floatBytes);
string floatString = new string(floatChars);

You'll find those benchmarks listed last in my results...

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.165 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Max: 2.79GHz) (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2732436 Hz, Resolution=365.9738 ns, Timer=TSC
.NET Core SDK=2.1.202
  [Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
  Core   : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT


                                               Method | Runtime |       Categories |      Mean | Scaled |
----------------------------------------------------- |-------- |----------------- |----------:|-------:|
                         String_Constructor_CharArray |     Clr | char[] => string |  13.51 ns |   1.00 |
                                        String_Concat |     Clr | char[] => string | 192.87 ns |  14.27 |
 StringBuilder_Local_AppendSingleChar_DefaultCapacity |     Clr | char[] => string |  60.74 ns |   4.49 |
   StringBuilder_Local_AppendSingleChar_ExactCapacity |     Clr | char[] => string |  60.26 ns |   4.46 |
   StringBuilder_Local_AppendAllChars_DefaultCapacity |     Clr | char[] => string |  51.27 ns |   3.79 |
     StringBuilder_Local_AppendAllChars_ExactCapacity |     Clr | char[] => string |  49.51 ns |   3.66 |
                 StringBuilder_Field_AppendSingleChar |     Clr | char[] => string |  51.14 ns |   3.78 |
                   StringBuilder_Field_AppendAllChars |     Clr | char[] => string |  32.95 ns |   2.44 |
                                                      |         |                  |           |        |
                       String_Constructor_CharPointer |     Clr |  void* => string |  29.28 ns |   1.00 |
                      String_Constructor_SBytePointer |     Clr |  void* => string |  89.21 ns |   3.05 |
                   UnsafeArrayCopy_String_Constructor |     Clr |  void* => string |  42.82 ns |   1.46 |
                                                      |         |                  |           |        |
                                   Encoding_GetString |     Clr | byte[] => string |  37.33 ns |   1.00 |
                 Encoding_GetChars_String_Constructor |     Clr | byte[] => string |  60.83 ns |   1.63 |
                     SafeArrayCopy_String_Constructor |     Clr | byte[] => string |  27.55 ns |   0.74 |
                                                      |         |                  |           |        |
                         String_Constructor_CharArray |    Core | char[] => string |  13.27 ns |   1.00 |
                                        String_Concat |    Core | char[] => string | 172.17 ns |  12.97 |
 StringBuilder_Local_AppendSingleChar_DefaultCapacity |    Core | char[] => string |  58.68 ns |   4.42 |
   StringBuilder_Local_AppendSingleChar_ExactCapacity |    Core | char[] => string |  59.85 ns |   4.51 |
   StringBuilder_Local_AppendAllChars_DefaultCapacity |    Core | char[] => string |  40.62 ns |   3.06 |
     StringBuilder_Local_AppendAllChars_ExactCapacity |    Core | char[] => string |  43.67 ns |   3.29 |
                 StringBuilder_Field_AppendSingleChar |    Core | char[] => string |  54.49 ns |   4.11 |
                   StringBuilder_Field_AppendAllChars |    Core | char[] => string |  31.05 ns |   2.34 |
                                                      |         |                  |           |        |
                       String_Constructor_CharPointer |    Core |  void* => string |  22.87 ns |   1.00 |
                      String_Constructor_SBytePointer |    Core |  void* => string |  83.11 ns |   3.63 |
                   UnsafeArrayCopy_String_Constructor |    Core |  void* => string |  35.30 ns |   1.54 |
                                                      |         |                  |           |        |
                                   Encoding_GetString |    Core | byte[] => string |  36.19 ns |   1.00 |
                 Encoding_GetChars_String_Constructor |    Core | byte[] => string |  58.99 ns |   1.63 |
                     SafeArrayCopy_String_Constructor |    Core | byte[] => string |  27.81 ns |   0.77 |

...from running this code (requires BenchmarkDotNet assembly and compiling with /unsafe)...

using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using BenchmarkDotNet.Attributes;

namespace StackOverflow_51584129
{
    [CategoriesColumn()]
    [ClrJob()]
    [CoreJob()]
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StringCreationBenchmarks
    {
        private static readonly Encoding InputEncoding = Encoding.ASCII;

        private const string InputString = "132,29";
        private static readonly byte[] InputBytes = InputEncoding.GetBytes(InputString);
        private static readonly char[] InputChars = InputString.ToCharArray();
        private static readonly sbyte[] InputSBytes = InputBytes.Select(Convert.ToSByte).ToArray();

        private GCHandle _inputBytesHandle;
        private GCHandle _inputCharsHandle;
        private GCHandle _inputSBytesHandle;

        private StringBuilder _builder;

        [Benchmark(Baseline = true)]
        [BenchmarkCategory("char[] => string")]
        public string String_Constructor_CharArray()
        {
            return new string(InputChars);
        }

        [Benchmark(Baseline = true)]
        [BenchmarkCategory("void* => string")]
        public unsafe string String_Constructor_CharPointer()
        {
            var pointer = (char*) _inputCharsHandle.AddrOfPinnedObject();

            return new string(pointer);
        }

        [Benchmark()]
        [BenchmarkCategory("void* => string")]
        public unsafe string String_Constructor_SBytePointer()
        {
            var pointer = (sbyte*) _inputSBytesHandle.AddrOfPinnedObject();

            return new string(pointer);
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string String_Concat()
        {
            return string.Concat(InputChars);
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendSingleChar_DefaultCapacity()
        {
            var builder = new StringBuilder();

            foreach (var c in InputChars)
                builder.Append(c);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendSingleChar_ExactCapacity()
        {
            var builder = new StringBuilder(InputChars.Length);

            foreach (var c in InputChars)
                builder.Append(c);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendAllChars_DefaultCapacity()
        {
            var builder = new StringBuilder().Append(InputChars);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Local_AppendAllChars_ExactCapacity()
        {
            var builder = new StringBuilder(InputChars.Length).Append(InputChars);

            return builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Field_AppendSingleChar()
        {
            _builder.Clear();

            foreach (var c in InputChars)
                _builder.Append(c);

            return _builder.ToString();
        }

        [Benchmark()]
        [BenchmarkCategory("char[] => string")]
        public string StringBuilder_Field_AppendAllChars()
        {
            return _builder.Clear().Append(InputChars).ToString();
        }

        [Benchmark(Baseline = true)]
        [BenchmarkCategory("byte[] => string")]
        public string Encoding_GetString()
        {
            return InputEncoding.GetString(InputBytes);
        }

        [Benchmark()]
        [BenchmarkCategory("byte[] => string")]
        public string Encoding_GetChars_String_Constructor()
        {
            var chars = InputEncoding.GetChars(InputBytes);

            return new string(chars);
        }

        [Benchmark()]
        [BenchmarkCategory("byte[] => string")]
        public string SafeArrayCopy_String_Constructor()
        {
            var chars = new char[InputString.Length];

            for (int i = 0; i < InputString.Length; i++)
                chars[i] = Convert.ToChar(InputBytes[i]);

            return new string(chars);
        }

        [Benchmark()]
        [BenchmarkCategory("void* => string")]
        public unsafe string UnsafeArrayCopy_String_Constructor()
        {
            fixed (char* chars = new char[InputString.Length])
            {
                var bytes = (byte*) _inputBytesHandle.AddrOfPinnedObject();

                for (int i = 0; i < InputString.Length; i++)
                    chars[i] = Convert.ToChar(bytes[i]);

                return new string(chars);
            }
        }

        [GlobalSetup(Targets = new[] { nameof(StringBuilder_Field_AppendAllChars), nameof(StringBuilder_Field_AppendSingleChar) })]
        public void SetupStringBuilderField()
        {
            _builder = new StringBuilder();
        }

        [GlobalSetup(Target = nameof(UnsafeArrayCopy_String_Constructor))]
        public void SetupBytesHandle()
        {
            _inputBytesHandle = GCHandle.Alloc(InputBytes, GCHandleType.Pinned);
        }

        [GlobalCleanup(Target = nameof(UnsafeArrayCopy_String_Constructor))]
        public void CleanupBytesHandle()
        {
            _inputBytesHandle.Free();
        }

        [GlobalSetup(Target = nameof(String_Constructor_CharPointer))]
        public void SetupCharsHandle()
        {
            _inputCharsHandle = GCHandle.Alloc(InputChars, GCHandleType.Pinned);
        }

        [GlobalCleanup(Target = nameof(String_Constructor_CharPointer))]
        public void CleanupCharsHandle()
        {
            _inputCharsHandle.Free();
        }

        [GlobalSetup(Target = nameof(String_Constructor_SBytePointer))]
        public void SetupSByteHandle()
        {
            _inputSBytesHandle = GCHandle.Alloc(InputSBytes, GCHandleType.Pinned);
        }

        [GlobalCleanup(Target = nameof(String_Constructor_SBytePointer))]
        public void CleanupSByteHandle()
        {
            _inputSBytesHandle.Free();
        }

        public static void Main(string[] args)
        {
            BenchmarkDotNet.Running.BenchmarkRunner.Run<StringCreationBenchmarks>();
        }
    }
}
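As a minimal sketch of the byte[] → string suggestion above, the Encoding.GetString(byte[], int, int) overload can decode just the slice of a larger buffer that holds the number, so no intermediate char[] is needed. The class and method names, and the comma-decimal NumberFormatInfo, are assumptions made for illustration; this is not part of the benchmark code.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SalaryFieldSketch
{
    // Assumption: the text uses ',' as its decimal separator, as in the question.
    // The group separator is moved to '.' so the two separators never collide.
    private static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    // Decodes 'count' ASCII bytes starting at 'offset' straight into a string
    // (no intermediate char[]), then parses that string as a float.
    public static float ParseField(byte[] buffer, int offset, int count)
    {
        string text = Encoding.ASCII.GetString(buffer, offset, count);

        return float.Parse(text, NumberStyles.AllowDecimalPoint, CommaDecimal);
    }
}
```

With a file buffer in hand, this would be called once per field, e.g. SalaryFieldSketch.ParseField(fileBytes, fieldStart, fieldLength), where fieldStart and fieldLength come from whatever the custom JSON parser already tracks.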
Lance U. Matthews
  • Nice work! Will have to remember this benchmark library. – martinstoeckli Jul 30 '18 at 18:25
  • Thanks. This was my first time trying it, or any benchmark library, really. It's nice that it's pretty simple to get started with and it enables you to write benchmark methods that are nothing but the code to be tested. On the other hand, I did find that it's very powerful (complicated), too, and the documentation is somewhat lacking, so trying to figure out what controls what and how to change things from the defaults really was becoming quite a chore (especially since each run takes a while to see the result of whatever was just changed) so I eventually gave up on fiddling with it. – Lance U. Matthews Jul 30 '18 at 18:46

On the float-parsing side of things, there are some gains to be had based on which overload of float.Parse() you call and what you pass to it. I ran some more benchmarks comparing these overloads (note that I changed the decimal separator character from ',' to '.' just so I could specify CultureInfo.InvariantCulture).

For example, calling an overload that takes an IFormatProvider is good for about a 10% performance increase. Specifying NumberStyles.Float ("lax") for the NumberStyles parameter effects a change in performance of about a percentage point in either direction, and, making some assumptions about our input data, specifying only NumberStyles.AllowDecimalPoint ("strict") nets a few points performance increase. (The float.Parse(string) overload uses NumberStyles.Float | NumberStyles.AllowThousands.)

On the subject of making assumptions about your input data, if you know the text you're working with has certain characteristics (single-byte character encoding, no invalid numbers, no negatives, no exponents, no need to handle NaN or positive/negative infinity, etc.) you might do well to parse from the bytes directly and forgo any unneeded special case handling and error checking. I included a very simple implementation in my benchmarks and it was able to get a float from a byte[] roughly 12x (.NET Core) to 19x (.NET Framework) faster than float.Parse(string) could get a float from a string!

Here are my benchmark results...

BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.165 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Max: 2.79GHz) (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2732436 Hz, Resolution=365.9738 ns, Timer=TSC
.NET Core SDK=2.1.202
  [Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
  Core   : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT


                                                        Method | Runtime |       Mean | Scaled |
-------------------------------------------------------------- |-------- |-----------:|-------:|
                                           float.Parse(string) |     Clr | 145.098 ns |   1.00 |
                        'float.Parse(string, IFormatProvider)' |     Clr | 134.191 ns |   0.92 |
                     'float.Parse(string, NumberStyles) [Lax]' |     Clr | 145.884 ns |   1.01 |
                  'float.Parse(string, NumberStyles) [Strict]' |     Clr | 139.417 ns |   0.96 |
    'float.Parse(string, NumberStyles, IFormatProvider) [Lax]' |     Clr | 133.800 ns |   0.92 |
 'float.Parse(string, NumberStyles, IFormatProvider) [Strict]' |     Clr | 127.413 ns |   0.88 |
                       'Custom byte-to-float parser [Indexer]' |     Clr |   7.657 ns |   0.05 |
                    'Custom byte-to-float parser [Enumerator]' |     Clr | 566.440 ns |   3.90 |
                                                               |         |            |        |
                                           float.Parse(string) |    Core | 154.369 ns |   1.00 |
                        'float.Parse(string, IFormatProvider)' |    Core | 138.668 ns |   0.90 |
                     'float.Parse(string, NumberStyles) [Lax]' |    Core | 155.644 ns |   1.01 |
                  'float.Parse(string, NumberStyles) [Strict]' |    Core | 150.221 ns |   0.97 |
    'float.Parse(string, NumberStyles, IFormatProvider) [Lax]' |    Core | 142.591 ns |   0.92 |
 'float.Parse(string, NumberStyles, IFormatProvider) [Strict]' |    Core | 135.000 ns |   0.87 |
                       'Custom byte-to-float parser [Indexer]' |    Core |  12.673 ns |   0.08 |
                    'Custom byte-to-float parser [Enumerator]' |    Core | 584.236 ns |   3.78 |

...from running this code (requires BenchmarkDotNet assembly)...

using System;
using System.Globalization;
using BenchmarkDotNet.Attributes;

namespace StackOverflow_51584129
{
    [ClrJob()]
    [CoreJob()]
    public class FloatParsingBenchmarks
    {
        private const string InputString = "132.29";
        private static readonly byte[] InputBytes = System.Text.Encoding.ASCII.GetBytes(InputString);

        private static readonly IFormatProvider ParsingFormatProvider = CultureInfo.InvariantCulture;
        private const NumberStyles LaxParsingNumberStyles = NumberStyles.Float;
        private const NumberStyles StrictParsingNumberStyles = NumberStyles.AllowDecimalPoint;
        private const char DecimalSeparator = '.';

        [Benchmark(Baseline = true, Description = "float.Parse(string)")]
        public float SystemFloatParse()
        {
            return float.Parse(InputString);
        }

        [Benchmark(Description = "float.Parse(string, IFormatProvider)")]
        public float SystemFloatParseWithProvider()
        {
            return float.Parse(InputString, CultureInfo.InvariantCulture);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles) [Lax]")]
        public float SystemFloatParseWithLaxNumberStyles()
        {
            return float.Parse(InputString, LaxParsingNumberStyles);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles) [Strict]")]
        public float SystemFloatParseWithStrictNumberStyles()
        {
            return float.Parse(InputString, StrictParsingNumberStyles);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles, IFormatProvider) [Lax]")]
        public float SystemFloatParseWithLaxNumberStylesAndProvider()
        {
            return float.Parse(InputString, LaxParsingNumberStyles, ParsingFormatProvider);
        }

        [Benchmark(Description = "float.Parse(string, NumberStyles, IFormatProvider) [Strict]")]
        public float SystemFloatParseWithStrictNumberStylesAndProvider()
        {
            return float.Parse(InputString, StrictParsingNumberStyles, ParsingFormatProvider);
        }

        [Benchmark(Description = "Custom byte-to-float parser [Indexer]")]
        public float CustomFloatParseByIndexing()
        {
            // FOR DEMONSTRATION PURPOSES ONLY!
            // This code has been written for and only tested with
            // parsing the ASCII string "132.29" in byte form
            var currentIndex = 0;
            var boundaryIndex = InputBytes.Length;
            char currentChar;
            var wholePart = 0;

            while (currentIndex < boundaryIndex && (currentChar = (char) InputBytes[currentIndex++]) != DecimalSeparator)
            {
                var currentDigit = currentChar - '0';

                wholePart = 10 * wholePart + currentDigit;
            }

            var fractionalPart = 0F;
            var nextFractionalDigitScale = 0.1F;

            while (currentIndex < boundaryIndex)
            {
                currentChar = (char) InputBytes[currentIndex++];
                var currentDigit = currentChar - '0';

                fractionalPart += currentDigit * nextFractionalDigitScale;
                nextFractionalDigitScale *= 0.1F;
            }

            return wholePart + fractionalPart;
        }

        [Benchmark(Description = "Custom byte-to-float parser [Enumerator]")]
        public float CustomFloatParseByEnumerating()
        {
            // FOR DEMONSTRATION PURPOSES ONLY!
            // This code has been written for and only tested with
            // parsing the ASCII string "132.29" in byte form
            var wholePart = 0;
            var enumerator = InputBytes.GetEnumerator();

            while (enumerator.MoveNext())
            {
                var currentChar = (char) (byte) enumerator.Current;

                if (currentChar == DecimalSeparator)
                    break;

                var currentDigit = currentChar - '0';
                wholePart = 10 * wholePart + currentDigit;
            }

            var fractionalPart = 0F;
            var nextFractionalDigitScale = 0.1F;

            while (enumerator.MoveNext())
            {
                var currentChar = (char) (byte) enumerator.Current;
                var currentDigit = currentChar - '0';

                fractionalPart += currentDigit * nextFractionalDigitScale;
                nextFractionalDigitScale *= 0.1F;
            }

            return wholePart + fractionalPart;
        }

        public static void Main()
        {
            BenchmarkDotNet.Running.BenchmarkRunner.Run<FloatParsingBenchmarks>();
        }
    }
}
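For the question's actual format (a comma as decimal separator, with the digits sitting inside a larger byte[]), the indexer-based idea above might be adapted roughly as follows. This is an unbenchmarked sketch under stated assumptions, not the code measured in the table: the class name, the offset/count parameters, the optional minus sign, and the choice to accumulate all digits in an integer and scale once at the end are all my own additions for illustration.

```csharp
using System;

public static class ByteFloatParserSketch
{
    // Powers of ten used to scale the accumulated digits back down.
    private static readonly double[] PowersOfTen =
        { 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9 };

    // Hypothetical helper: parses ASCII text such as "132,29" or "-0,5" located at
    // buffer[offset .. offset + count). Assumes valid input: an optional leading '-',
    // digits, at most one separator, no exponent, and at most 9 fractional digits.
    public static float Parse(byte[] buffer, int offset, int count, byte separator = (byte) ',')
    {
        int index = offset;
        int end = offset + count;
        bool negative = false;

        if (index < end && buffer[index] == (byte) '-')
        {
            negative = true;
            index++;
        }

        long digits = 0;          // every digit seen, with the separator ignored
        int fractionalDigits = 0; // how many of those digits followed the separator
        bool inFraction = false;

        for (; index < end; index++)
        {
            byte current = buffer[index];

            if (current == separator)
            {
                inFraction = true;
                continue;
            }

            digits = 10 * digits + (current - '0');
            if (inFraction)
                fractionalDigits++;
        }

        // Scale once at the end instead of repeatedly multiplying by 0.1F.
        double value = digits / PowersOfTen[fractionalDigits];

        return (float) (negative ? -value : value);
    }
}
```

Like the benchmark version, this does no validation at all, so it is only appropriate when the input is known to be well formed.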
Lance U. Matthews
  • One note about your benchmarking: I'm not sure what his target .NET version is. .NET Core can have different characteristics than .NET 4.7.2 in some scenarios. With BenchmarkDotNet you can ask it to run a number of framework versions at once. – Glenn Watson Aug 01 '18 at 16:58
  • @GlennWatson Yeah, I had configured it to benchmark both .NET Framework and Core, but the results for Framework all came back as `NA`. At that point I didn't feel like fooling with it any more (see my comment on my other answer) and, as you noted, I didn't know that the author _wasn't_ using Core, so I left the Framework numbers as an exercise for the author/reader. Perhaps I'll give that another try to fill my answers in with more data. – Lance U. Matthews Aug 01 '18 at 17:21
  • Per [the FAQ](https://benchmarkdotnet.org/articles/faq.html) I just needed to make some minor edits to my project file for the .NET Framework benchmarks to work. I have updated the answer with new benchmark results for both Framework and Core. – Lance U. Matthews Aug 01 '18 at 21:17
  • Nice work. To get sign in, I used: float sgn = 1; while (currentIndex < boundaryIndex && (currentChar = (char)InputBytes[currentIndex++]) != DecimalSeparator) { if (currentChar == '-') sgn *= -1; else { var currentDigit = currentChar - '0'; wholePart = 10 * wholePart + currentDigit; } } ..... and multiply the result by sgn at the end ..... – Goodies Apr 04 '20 at 15:33

After some experiments and the tests from this:

The fastest way to have string from char[] is using new string

One more note, FYI: according to this Microsoft article, in the case of invalid input, TryParse is the fastest way to parse a float. So, think about it.

TryParse is only taking .5% of execution time Parse is taking 18% while Convert is taking 14%
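A minimal sketch of the TryParse pattern, for the case where some input may be invalid: TryParse reports failure through its return value, whereas Parse throws a FormatException. The wrapper name ParseOrNull and the use of the invariant culture are assumptions for illustration.

```csharp
using System.Globalization;

public static class TryParseSketch
{
    // Returns null instead of throwing when the text is not a valid number.
    // Assumption: the text uses '.' as decimal separator (invariant culture).
    public static float? ParseOrNull(string text)
    {
        float value;
        bool ok = float.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value);

        return ok ? value : (float?) null;
    }
}
```

When every input is known to be valid, no exception is ever thrown, so this pattern should not be expected to give the gain quoted above.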

Antoine V
  • Your quote is taken out of context. The sentence before is "Shown below you can see the huge difference between TryParse, Parse and ConvertTo **when you are using bad data**." (emphasis mine) and the sentence after is "The difference is, as we guessed, in the exception handling code." Thus, this just shows that `TryParse` is much faster at returning `false` than `Parse` is at throwing an exception, which is already known. Also, the numbers in the article are for parsing `int`s, not `float`s. – Lance U. Matthews Jul 29 '18 at 20:44
  • If performance is such an issue, building a list only to feed its items to a string builder is a bit of a waste, don't you think? – martinstoeckli Jul 29 '18 at 20:56
  • The experiments are already taken. If don't use StringBuilder, is there a better way to have string from char[]? I don't think so – Antoine V Jul 29 '18 at 20:58
  • how does String.Join compare to StringBuilder in this case? – Chris Rollins Jul 29 '18 at 21:05
  • The [other answer to that same question](https://stackoverflow.com/a/14817945/150605) claims "if you were to actually have a char array even if you are passed it as an IEnumerable it is faster to call the string constructor" and the test results show that the `string` constructor, even with the unnecessary call to `.ToArray()` on a `char[]`, is much faster than using a `StringBuilder`. It's significant that the other question is asking about making a `string` from an `IEnumerable` whereas in this question we have a `char[]`. – Lance U. Matthews Jul 29 '18 at 21:10
  • Check it: https://stackoverflow.com/a/33142321/6230863 , 1000000 iterations of "Concat" took 1597ms. 1000000 iterations of "new string" took 869ms. 1000000 iterations of "StringBuilder" took 748ms. So I always support StringBuilder – Antoine V Jul 29 '18 at 21:11
  • That answer is also talking about creating a `string` from an `IEnumerable`. In this question we don't have an `IEnumerable`, we have something better: a `char[]`, where the length is known up front. (And, yes, of course `char[]` implements `IEnumerable`.) You can't take numbers from testing an interface and present them as holding true for all types that implement that interface. You need to compare `new string(char[])` vs. `StringBuilder.Append(char)` when operating on a **`char[]`**, not an `IEnumerable` that requires a call to the `.ToArray()` extension method. – Lance U. Matthews Jul 29 '18 at 21:35
  • The StringBuilder even accepts an array of char, so `sb.Append(a);` would be faster. But I find it hard to believe that the string constructor shouldn't be optimized. Maybe it's time to do your own tests, considering BACON's objections and running them with compiler mode "release". – martinstoeckli Jul 29 '18 at 21:47
  • Please see my comments on [the linked answer](https://stackoverflow.com/a/14780004/6230863). Though it reaches the correct conclusion, the code is buggy and the results (quoted here) are invalid, plus it's not testing the same scenario, anyways. No offense, but this is why it's important to look at the context of data and where it came from before citing it. `StringBuilder` may very well be the fastest way to get a `string` from a `char[]`, but I have downvoted this answer because three times it and its supporting comments have quoted or linked to misleading information, which is not helpful. – Lance U. Matthews Jul 30 '18 at 00:27

Interesting topic for working out optimization details at home :) good health to you all..

My goal was to convert an ASCII CSV matrix into a float matrix as fast as possible in C#. It turns out that calling string.Split() on each row and converting each term separately also introduces overhead. To overcome this, I modified BACON's solution to parse a whole row of floats at once, and I use it like this:

  var falist = new List<float[]>();
  for (int row=0; row<sRowList.Count; row++)
  {
    var sRow = sRowList[row];
    falist.Add(CustomFloatParseRowByIndexing(nTerms, sRow.ToCharArray(), '.'));
  }

Code for my row parser variant is below. These are benchmark results, converting a 40x31 matrix 1000x:

Benchmark0: Split row and Parse each term to convert to float matrix dT=704 ms

Benchmark1: Split row and TryParse each term to convert to float matrix dT=640 ms

Benchmark2: Split row and CustomFloatParseByIndexing to convert terms to float matrix dT=211 ms

Benchmark3: Use CustomFloatParseRowByIndexing to convert rows to float matrix dT=120 ms

public float[] CustomFloatParseRowByIndexing(int nItems, char[] InputBytes, char DecimalSeparator)
{
    // Convert semicolon-separated floats from InputBytes into nItems float[] result.
    // Constraints are:
    //   - no scientific notation or .x allowed
    //   - every row has exactly nItems values
    //   - semicolon delimiter after each value
    //   - terms 'u'  or 'undef' or 'undefined' allowed for bad values
    //   - minus sign allowed
    //   - leading space allowed
    //   - all terms must comply

    // FOR DEMO PURPOSE ONLY
    // based on BACON on Stackoverflow, modified to read nItems delimited float values
    // https://stackoverflow.com/questions/51584129/convert-a-float-formated-char-to-float

    var currentIndex = 0;
    var boundaryIndex = InputBytes.Length;
    bool termready;
    float[] result = new float[nItems];
    int cItem = 0;
    while (currentIndex < boundaryIndex)
    {
        termready = false;
        if ((char)InputBytes[currentIndex] == ' ') { currentIndex++; continue; }
        char currentChar;
        var wholePart = 0;
        float sgn = 1;

        // Whole part: read digits until the decimal separator or the delimiter.
        while (currentIndex < boundaryIndex && (currentChar = (char)InputBytes[currentIndex++]) != DecimalSeparator)
        {
            if (currentChar == 'u')
            {
                // Bad value: skip to just past the next ';' (without running off
                // the end of the row) and store the sentinel -9999.
                while (currentIndex < boundaryIndex && (char)InputBytes[currentIndex++] != ';') ;
                result[cItem++] = -9999.0f;
                continue;
            }
            else if (currentChar == ' ')
            {
                continue;
            }
            else if (currentChar == ';')
            {
                termready = true;
                break;
            }
            else if (currentChar == '-')
            {
                sgn = -1;
            }
            else
            {
                var currentDigit = currentChar - '0';
                wholePart = 10 * wholePart + currentDigit;
            }
        }

        // Fractional part: read digits until the delimiter.
        var fractionalPart = 0F;
        var nextFractionalDigitScale = 0.1F;
        if (!termready)
        {
            while (currentIndex < boundaryIndex)
            {
                currentChar = (char)InputBytes[currentIndex++];
                if (currentChar == ';')
                {
                    termready = true;
                    break;
                }
                var currentDigit = currentChar - '0';
                fractionalPart += currentDigit * nextFractionalDigitScale;
                nextFractionalDigitScale *= 0.1F;
            }
        }

        if (termready)
        {
            result[cItem++] = sgn * (wholePart + fractionalPart);
        }
    }
    return result;
}
Goodies