
I have a file that contains a long JSON string, approximately 700k characters.

I'm trying to deserialize it.

But it contains characters like \r and \n that should be replaced with a comma (,).

I tried to do it with Regex, but it hangs without raising an error.

private static readonly Regex Pattern = new Regex("(\r\n|\r|\n)", RegexOptions.Compiled | RegexOptions.IgnoreCase);

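// Strings are immutable, so the result of Replace has to be assigned back.
dataString = Pattern.Replace(dataString, ",");

I also tried converting the string into a StringBuilder and using a simple .Replace: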

private readonly IDictionary<string, string> replacements = new Dictionary<string, string> { { "\r\n", "," }, { "\r", "," }, { "\n", "," } };

foreach (var replacement in this.replacements)
{
     dataStringBuilder.Replace(replacement.Key, replacement.Value);
}

The second approach worked better, but only until the file grew larger. Now both approaches hang.

Are there any other recommended faster solutions?

demo

1 Answer


You could use a naïve approach of manually copying the string, converting line breaks yourself. This enables you to iterate the underlying character array only once, and avoids costly reallocations of string/StringBuilder objects:

char[] converted = new char[input.Length];
int pos = 0;
bool lastWasCr = false;
foreach(char c in input)
{
    if(c == '\r')
    {
        converted[pos++] = ',';
        lastWasCr = true;
    }
    else
    {
        if(c == '\n')
        {
            if(!lastWasCr)
                converted[pos++] = ',';
        }
        else
            converted[pos++] = c;
        lastWasCr = false;
    }
}
string output = new string(converted, 0, pos);

This loop visits every character once, detecting and replacing line breaks as it goes. Note that we have to remember whether the previous character was a carriage return (\r), so that a Windows line break (\r\n) produces one comma instead of two.
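To make the behavior on mixed line endings easier to check, here is a minimal sketch that wraps the same loop in a method. The class name LineBreakConverter, the method name ReplaceLineBreaks, and the optional replacement parameter are my own illustrative choices, not part of the original code.

```csharp
using System;

public static class LineBreakConverter
{
    // Hypothetical wrapper around the loop above; names are illustrative only.
    public static string ReplaceLineBreaks(string input, char replacement = ',')
    {
        // The output is never longer than the input, so one allocation suffices.
        char[] converted = new char[input.Length];
        int pos = 0;
        bool lastWasCr = false;

        foreach (char c in input)
        {
            if (c == '\r')
            {
                converted[pos++] = replacement;
                lastWasCr = true;
            }
            else if (c == '\n')
            {
                // A '\n' directly after '\r' belongs to the same line break: skip it.
                if (!lastWasCr)
                    converted[pos++] = replacement;
                lastWasCr = false;
            }
            else
            {
                converted[pos++] = c;
                lastWasCr = false;
            }
        }

        return new string(converted, 0, pos);
    }

    public static void Main()
    {
        // Windows (\r\n), old Mac (\r) and Unix (\n) line breaks in one string.
        Console.WriteLine(ReplaceLineBreaks("a\r\nb\rc\nd")); // prints "a,b,c,d"
    }
}
```

Each of the three line break styles collapses to a single comma, while two separate line breaks in a row (for example \n\r) still yield two commas.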


I compared your two approaches with the code above, using a random 650 KB text file and running 1000 iterations of each implementation.

Results:

  • Regex.Replace: 62.3233 s (excluding initialization such as compiling the regex)
  • StringBuilder.Replace: 7.0622 s (fixed version as indicated in a comment to your question)
  • Char-wise loop with if statement: 2.3862 s
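If you want to repeat a comparison like this on your own data, a rough timing harness could look like the sketch below. It is not the code I used for the numbers above: the class LineBreakBenchmark, the helpers Measure and BuildInput, the synthetic input, and the iteration count are all assumptions made for illustration, and the StringBuilder variant simply chains the three replacements with "\r\n" first.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class LineBreakBenchmark
{
    private static readonly Regex Pattern = new Regex("\r\n|\r|\n", RegexOptions.Compiled);

    // Hypothetical helper: times `iterations` calls of `convert` on `input`.
    public static TimeSpan Measure(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up call, so JIT compilation is not measured
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Synthetic input (an assumption): short JSON-like lines with mixed line endings.
    public static string BuildInput(int lines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lines; i++)
        {
            sb.Append("{\"id\":").Append(i).Append('}');
            sb.Append(i % 2 == 0 ? "\r\n" : "\n");
        }
        return sb.ToString();
    }

    public static string WithRegex(string input) => Pattern.Replace(input, ",");

    // "\r\n" must be replaced first; otherwise it would turn into two commas.
    public static string WithStringBuilder(string input) =>
        new StringBuilder(input)
            .Replace("\r\n", ",")
            .Replace("\r", ",")
            .Replace("\n", ",")
            .ToString();

    public static void Main()
    {
        string input = BuildInput(50000);
        Console.WriteLine($"Regex.Replace:         {Measure(WithRegex, input, 100)}");
        Console.WriteLine($"StringBuilder.Replace: {Measure(WithStringBuilder, input, 100)}");
    }
}
```

Absolute timings depend on the machine, the runtime and the input, so treat the output as a relative comparison only. For more reliable measurements, a dedicated benchmarking library is a better fit than a Stopwatch loop.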
janw