I have an array of bytes (say byte[] data), which contains text with custom line delimiters, for example: "\r\n" (CRLF "\x0D\x0A"), "\r", "\n", "\x0D\x0A\x0D" or even "@".

At the moment I'm going to use the following solution:

  1. Normalize line breaks to CRLF (for an example of how to do this, see "What is a quick way to force CRLF in C# / .NET?")
  2. Use StringReader to read text line by line

    
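    // Decode the bytes first: data.ToString() on a byte[] only returns "System.Byte[]".
    // ASCII is assumed here; use the encoding the data was actually written in.
    using (StringReader sr = new StringReader(Encoding.ASCII.GetString(data)))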
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            // Process the line 
        }
    }
    

I'm using C#, .NET 3.5. Is there any better solution?

Thanks.

Alpha Sisyphus
  • You should normalize your line endings to `\n`, which takes only 2 replacements instead of 3. Note that the answer in your link first normalizes to `\n` and only then changes `\n` to `\r\n`. – Sam Harwell Jan 02 '10 at 17:36
  • Yup, it turns out that I need to normalize line endings to any one of the following: "\r", "\n", "\r\n". From http://msdn.microsoft.com/en-us/library/system.io.stringreader.readline.aspx: "A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n"). The resulting string does not contain the terminating carriage return and/or line feed. The returned value is a null reference (Nothing in Visual Basic) if the end of the underlying string has been reached." – Alpha Sisyphus Jan 02 '10 at 17:46
  • For fastest performance, you could split it by hand ... – Hamish Grubijan Jan 02 '10 at 18:39
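
A rough sketch of what the "split it by hand" idea from the last comment might look like, scanning the decoded text once and checking the delimiters longest first. The `LineSplitter` class and its `SplitLines` method are hypothetical names made up for illustration, and the delimiter list is taken from the examples in the question; this has not been benchmarked against the `Replace`/`Split` approach.

```csharp
using System;
using System.Collections.Generic;

public static class LineSplitter
{
    // Assumed delimiter set from the question, ordered longest first so that
    // "\r\n\r" is matched before "\r\n", and "\r\n" before "\r".
    private static readonly string[] Delimiters = { "\r\n\r", "\r\n", "\r", "\n", "@" };

    public static IEnumerable<string> SplitLines(string text)
    {
        int start = 0; // index where the current line begins
        int i = 0;     // current scan position
        while (i < text.Length)
        {
            int matched = MatchLength(text, i);
            if (matched > 0)
            {
                yield return text.Substring(start, i - start);
                i += matched;
                start = i;
            }
            else
            {
                i++;
            }
        }

        // Emit the trailing line only if the text does not end with a delimiter.
        if (start < text.Length)
            yield return text.Substring(start);
    }

    // Returns the length of the delimiter found at index, or 0 if there is none.
    private static int MatchLength(string text, int index)
    {
        foreach (string delimiter in Delimiters)
        {
            if (string.CompareOrdinal(text, index, delimiter, 0, delimiter.Length) == 0)
                return delimiter.Length;
        }
        return 0;
    }
}
```

Because the method is an iterator, lines are produced lazily, so `foreach (string line in LineSplitter.SplitLines(text))` can process each line without building an intermediate array. Two consecutive delimiters produce an empty line between them.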

2 Answers

Here's one option to limit calls to string.Replace to just the multi-character delimiters.

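    // Single-character delimiters are handled by Split.
    private static readonly char[] DelimiterChars = { '\r', '\n', '@' };
    // Multi-character delimiters, longest first so "\r\n\r" is replaced before "\r\n".
    private static readonly string[] DelimiterStrings = { "\r\n\r", "\r\n" };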

Then later...

    string text = Encoding.ASCII.GetString(data);
    foreach (string delim in DelimiterStrings)
        text = text.Replace(delim, "\n");

    foreach (string line in text.Split(DelimiterChars))
    {
        // processing here
    }

Sam Harwell

Use a regular expression instead (for example, via Regex.Split), which will give you much more flexibility in how the delimiters are matched.
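
The answer gives no code, so here is a minimal sketch of one way the regex suggestion could look, assuming the delimiter set from the question. `RegexLineSplitter` and `SplitLines` are hypothetical names used only for illustration. Alternatives in a .NET regex are tried left to right, so the longer delimiters are listed first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexLineSplitter
{
    // Assumed delimiter set from the question. "\r\n\r" must come before
    // "\r\n", and both before the single-character class.
    private static readonly Regex Delimiter =
        new Regex(@"\r\n\r|\r\n|[\r\n@]", RegexOptions.Compiled);

    public static string[] SplitLines(string text)
    {
        // The pattern has no capturing groups, so the delimiters themselves
        // are not included in the result.
        return Delimiter.Split(text);
    }
}
```

As with string.Split, text that ends with a delimiter yields an empty string as the last element, so the caller may want to skip empty entries.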

Martin