
I am reading a text file using StreamReader:

using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
     string line = sr.ReadLine();
}

I want to force the line delimiter to be \n only, not \r. How can I do that?

Asked by User13839404 (edited by George Johnston)
  • According to the ReadLine documentation, "A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n")", so it should be breaking at '\n'. If you want to do some sort of custom line parsing, I think you will have to read each character yourself and break where you want the "new line" to be. – pstrjds Jul 11 '11 at 19:22

9 Answers

Answer (score 33)

I would implement something like George's answer, but as an extension method that avoids loading the whole file at once (not tested, but something like this):

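using System.Collections.Generic;
using System.IO;
using System.Text;

static class ExtensionsForTextReader
{
     // Yields the text between delimiters; the delimiter itself is not included.
     public static IEnumerable<string> ReadLines (this TextReader reader, char delimiter)
     {
            StringBuilder chars = new StringBuilder ();
            int next;
            // Read() returns -1 at the end of input. (TextReader.Peek() may return -1
            // for readers that can't peek, so it isn't used as the loop test.)
            while ((next = reader.Read ()) != -1)
            {
                char c = (char)next;

                if (c == delimiter) {
                    yield return chars.ToString ();
                    chars.Clear ();
                    continue;
                }

                chars.Append (c);
            }

            // Don't drop the last line when the input doesn't end with the delimiter.
            if (chars.Length > 0)
                yield return chars.ToString ();
     }
}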

Which could then be used like:

using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
     foreach (var line in sr.ReadLines ('\n'))
           Console.WriteLine (line);
}
Answer by Pete (edited by mklement0)
  • Is there a specific '\r\n' solution possible with this code? – skmasq Mar 04 '14 at 22:26
  • @skmasq have you tried Environment.NewLine? https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx – Janne Harju Dec 25 '17 at 08:33
  • @JanneHarju: this would not work, because the `delimiter` argument is a character, and `"\r\n"` is a string composed of 2 characters. The function's algorithm must be adapted to search for a sequence of chars instead of a single char. – sɐunıɔןɐqɐp Aug 29 '20 at 09:22

Answer (score 21)

string text = sr.ReadToEnd();
string[] lines = text.Split('\n');
foreach(string s in lines)
{
   // Consume
}
Answer by George Johnston
  • This is simple, but if the file contains 1 million lines this could end badly :) – pstrjds Jul 11 '11 at 19:25
  • Yes, and if it hypothetically contained 10, 100, 1,000, or 10,000 it wouldn't be. Every answer has a hypothetical downside. ;) – George Johnston Jul 11 '11 at 19:26
  • Right, I was adding the comment because in general if you are using a stream, then you are processing bytes a little at a time so that you don't have to load the whole file in memory (okay, maybe the "general" here is "general" for me). I tend to deal with large files and so loading the whole thing in memory can be a problem. – pstrjds Jul 11 '11 at 19:29
  • For big files, you should go for Martin's answer – Adriano Carneiro Jul 11 '11 at 19:30
  • @pstrjds I understand that, but it really depends on his requirements. If this solution doesn't work because of memory limitations, he can easily add a bit of code to stream in chunks of data, and split as needed, e.g. `ReadBlock()`. I'll leave it at that. He doesn't need to accept this answer, but it may be useful to others who may not face the same limitations. :) – George Johnston Jul 11 '11 at 19:34
  • @Adrain: Unfortunately, Martin's answer is not a solution for me. I had already read it before posting my question here. – User13839404 Jul 11 '11 at 19:34
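
George's comment suggests streaming the file in chunks with `ReadBlock()` and splitting as needed. Here is a minimal sketch of that idea, assuming a single-character delimiter. The names `ChunkedLineReader`, `SplitInChunks`, and `blockSize` are hypothetical and not part of .NET. The method reads fixed-size blocks with `TextReader.ReadBlock` and carries any unterminated tail over to the next block, so memory holds one block plus the current line instead of the whole file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ChunkedLineReader
{
    // Hypothetical helper (not a .NET API): reads `reader` in blocks of `blockSize`
    // characters and yields the text between `delimiter` characters. Only one block
    // plus the current, unfinished line is kept in memory.
    public static IEnumerable<string> SplitInChunks(this TextReader reader, char delimiter, int blockSize = 4096)
    {
        char[] block = new char[blockSize];
        StringBuilder pending = new StringBuilder();
        int count;
        while ((count = reader.ReadBlock(block, 0, block.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < count; i++)
            {
                if (block[i] != delimiter) continue;
                // A line is complete: text carried over from earlier blocks plus this slice.
                pending.Append(block, start, i - start);
                yield return pending.ToString();
                pending.Clear();
                start = i + 1;
            }
            // Carry the unterminated tail over to the next block.
            pending.Append(block, start, count - start);
        }

        // Final line with no trailing delimiter.
        if (pending.Length > 0)
            yield return pending.ToString();
    }
}
```

For example, splitting `"a\r\nb\nc"` on `'\n'` yields `"a\r"`, `"b"` and `"c"`. The `'\r'` stays with its line, which is the behaviour the question asks for.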

Answer (score 7)

I loved the answer @Pete gave. I would just like to submit a slight modification. This will allow you to pass a string delimiter instead of just a single character:

using System;
using System.IO;
using System.Collections.Generic;
internal static class StreamReaderExtensions
{
    public static IEnumerable<string> ReadUntil(this StreamReader reader, string delimiter)
    {
        List<char> buffer = new List<char>();
        CircularBuffer<char> delim_buffer = new CircularBuffer<char>(delimiter.Length);
        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            delim_buffer.Enqueue(c);
            if (delim_buffer.ToString() == delimiter || reader.EndOfStream)
            {
                if (buffer.Count > 0)
                {
                    if (!reader.EndOfStream)
                    {
                        yield return new String(buffer.ToArray()).Replace(delimiter.Substring(0, delimiter.Length - 1), string.Empty);
                    }
                    else
                    {
                        buffer.Add(c);
                        yield return new String(buffer.ToArray());
                    }
                    buffer.Clear();
                }
                continue;
            }
            buffer.Add(c);
        }
    }

    private class CircularBuffer<T> : Queue<T>
    {
        private int _capacity;

        public CircularBuffer(int capacity)
            : base(capacity)
        {
            _capacity = capacity;
        }

        new public void Enqueue(T item)
        {
            if (base.Count == _capacity)
            {
                base.Dequeue();
            }
            base.Enqueue(item);
        }

        public override string ToString()
        {
            List<String> items = new List<string>();
            foreach (var x in this)
            {
                items.Add(x.ToString());
            };
            return String.Join("", items);
        }
    }
}
Answer by sovemp
  • Nice solution... one potential issue: doesn't this include all but the last delimiter char? I.e. if I have a 4-character delimiter, the returned string would still contain the first 3 characters of that delimiter. – JohnLBevan Dec 01 '16 at 11:02
  • @JohnLBevan yeah you are correct. I am trying to think of a good solution for this. I imagine you would probably want to throw away everything in the delimiter. – sovemp Dec 01 '16 at 23:19
  • @JohnLBevan updated so it should work now. I also noticed it threw away the very last entry that it should return, which should now also be fixed. – sovemp Dec 01 '16 at 23:32
  • Your yielded Replace is problematic because it can remove characters you still want. In my case I wanted to split on "\r\n" only and preserve the internal "\r", which the Replace/Substring ended up scraping away. I ended up yielding this way: var s = new String(buffer.ToArray()); yield return s.Substring(0, s.Length - delimiter.Length + 1); – VeV Mar 14 '17 at 10:57
  • Good Job Still ;) – VeV Mar 14 '17 at 10:58
  • The idea is good; however, parsing a big text file is supposed to be efficient (otherwise you're working with small text files). – Gregor y Mar 23 '21 at 17:08

Answer (score 5)

This is an improvement of sovemp's answer. I would have liked to comment, but my reputation doesn't allow it. This improvement addresses 2 issues:

  1. With the sequence "text\rtest\r\n" and the delimiter "\r\n", the first "\r" would also be deleted, which is not intended.
  2. When the last characters in the stream equal the delimiter, the function would wrongly return a string that includes the delimiter.

    using System;
    using System.IO;
    using System.Collections.Generic;
    internal static class StreamReaderExtensions
    {
        public static IEnumerable<string> ReadUntil(this StreamReader reader, string delimiter)
        {
            List<char> buffer = new List<char>();
            CircularBuffer<char> delim_buffer = new CircularBuffer<char>(delimiter.Length);
            while (reader.Peek() >= 0)
            {
                char c = (char)reader.Read();
                buffer.Add(c);
                delim_buffer.Enqueue(c);
                bool atDelimiter = delim_buffer.ToString() == delimiter;
                if (atDelimiter || reader.EndOfStream)
                {
                    // Cut off exactly delimiter.Length characters, and only when the
                    // delimiter is what ended this record (e.g. a lone "\r" inside
                    // "text\rtest\r\n" is kept). Empty records are yielded as "".
                    int length = atDelimiter ? buffer.Count - delimiter.Length : buffer.Count;
                    yield return new String(buffer.ToArray(), 0, length);
                    buffer.Clear();
                    // Forget the matched characters so they can't be reused by the next match.
                    delim_buffer.Clear();
                }
            }
        }
    
        private class CircularBuffer<T> : Queue<T>
        {
            private int _capacity;
    
            public CircularBuffer(int capacity)
                : base(capacity)
            {
                _capacity = capacity;
            }
    
            new public void Enqueue(T item)
            {
                if (base.Count == _capacity)
                {
                    base.Dequeue();
                }
                base.Enqueue(item);
            }
    
            public override string ToString()
            {
                List<String> items = new List<string>();
                foreach (var x in this)
                {
                    items.Add(x.ToString());
                };
                return String.Join("", items);
            }
        }
    }
    
Answer by jp1980

Answer (score 5)

I needed a solution that reads until "\r\n", and does not stop at "\n". jp1980's solution worked, but was extremely slow on a large file. So, I converted Mike Sackton's solution to read until a specified string is found.

using System.IO;
using System.Text;

public static class StreamReaderExtensions
{
    public static string ReadLine(this StreamReader sr, string lineDelimiter)
    {
        StringBuilder line = new StringBuilder();
        var matchIndex = 0;

        // Peek() returns -1 at the end of the stream ('\0' peeks as 0, so test >= 0).
        while (sr.Peek() >= 0)
        {
            var nextChar = (char)sr.Read();
            line.Append(nextChar);

            if (nextChar == lineDelimiter[matchIndex])
            {
                if (matchIndex == lineDelimiter.Length - 1)
                {
                    // Whole delimiter matched: return the line without it.
                    return line.ToString(0, line.Length - lineDelimiter.Length);
                }
                matchIndex++;
            }
            else
            {
                matchIndex = 0;
                //did we mistake one of the characters as the delimiter? If so let's restart our search with this character...
                if (nextChar == lineDelimiter[matchIndex])
                {
                    if (matchIndex == lineDelimiter.Length - 1)
                    {
                        return line.ToString(0, line.Length - lineDelimiter.Length);
                    }
                    matchIndex++;
                }
            }
        }

        // End of stream: return any trailing text, or null if nothing is left.
        return line.Length == 0
            ? null
            : line.ToString();
    }
}

And it is called like this...

using (StreamReader reader = new StreamReader(file))
{
    string line;
    while((line = reader.ReadLine("\r\n")) != null)
    {
        Console.WriteLine(line);
    }
}
Answer by William S. (edited by Denis)
  • Perfect. Works with custom line delimiters like Environment.NewLine + "go" + Environment.NewLine; – D.G. Oct 23 '18 at 11:25
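
William S.'s `ReadLine` handles a mismatch by re-checking only the current character. That is enough for `"\r\n"`, but it can miss delimiters whose prefix repeats. With the delimiter `"aab"`, for example, the input `"aaab"` is never matched. One alternative, swapped in here and not taken from any of the answers, is the Knuth-Morris-Pratt prefix table. On a mismatch it falls back to the longest partial match that is still possible. `KmpLineReader` and `ReadLineKmp` are hypothetical names. This sketch assumes a non-empty delimiter and, for simplicity, rebuilds the table on every call.

```csharp
using System;
using System.IO;
using System.Text;

static class KmpLineReader
{
    // Hypothetical helper (not a .NET API): reads up to the next occurrence of
    // `delimiter` and returns the text before it, or null at end of input.
    // Uses a Knuth-Morris-Pratt prefix table so overlapping partial matches are not lost.
    public static string ReadLineKmp(this TextReader reader, string delimiter)
    {
        if (string.IsNullOrEmpty(delimiter))
            throw new ArgumentException("delimiter must be non-empty", nameof(delimiter));

        // fail[i] = length of the longest proper prefix of delimiter[0..i]
        // that is also a suffix of delimiter[0..i].
        int[] fail = new int[delimiter.Length];
        for (int i = 1, k = 0; i < delimiter.Length; i++)
        {
            while (k > 0 && delimiter[i] != delimiter[k]) k = fail[k - 1];
            if (delimiter[i] == delimiter[k]) k++;
            fail[i] = k;
        }

        var line = new StringBuilder();
        int matched = 0; // delimiter characters matched at the end of `line`
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            line.Append(c);
            // On a mismatch, fall back to the longest shorter partial match.
            while (matched > 0 && c != delimiter[matched]) matched = fail[matched - 1];
            if (c == delimiter[matched]) matched++;
            if (matched == delimiter.Length)
                return line.ToString(0, line.Length - delimiter.Length);
        }

        // End of input: return trailing text, or null if nothing was left.
        return line.Length == 0 ? null : line.ToString();
    }
}
```

For example, reading `"xaaaby"` with the delimiter `"aab"` returns `"xa"`, then `"y"`, then `null`.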

Answer (score 5)

According to the documentation:

http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx

A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").

By default, the StreamReader.ReadLine method recognizes a line ending as \n, \r, or \r\n.

Answer by Martin
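
The quoted rule is easy to confirm. This small check feeds `StreamReader.ReadLine` a string that contains all three kinds of line ending. `ReadLineDemo` and `ReadAllLines` are names chosen only for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ReadLineDemo
{
    // Collects every line that StreamReader.ReadLine returns for `text`.
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(text))))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

`ReadAllLines("one\rtwo\nthree\r\nfour")` returns four lines: `"one"`, `"two"`, `"three"` and `"four"`. This is why `ReadLine` by itself cannot be made to break on `\n` only.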

Answer (score 4)

You either have to parse the stream character by character yourself and handle the split, or use the default ReadLine behavior, which splits on \r, \n, or \r\n.

If you want to parse the stream character by character, I'd use something like the following extension method:

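using System.IO;
using System.Text;

public static class StreamReaderExtensions
{
    // Returns the text up to (not including) the next splitCharacter,
    // or null once the stream has been fully read.
    public static string ReadToChar(this StreamReader sr, char splitCharacter)
    {
        StringBuilder line = new StringBuilder();
        // Peek() returns -1 at the end of the stream; 0 is a valid character ('\0').
        while (sr.Peek() >= 0)
        {
            char nextChar = (char)sr.Read();
            if (nextChar == splitCharacter) return line.ToString();
            line.Append(nextChar);
        }

        return line.Length == 0 ? null : line.ToString();
    }
}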
Answer by Mike Sackton

Answer (score 1)

Even though you said "Using StreamReader", since you also said "I my case, file can have tons of records...", I would recommend trying SSIS (SQL Server Integration Services). It is well suited to what you're trying to do: you can process very large files and specify the line and column delimiters easily.

Answer by Tipx
  • Do you mean [Sql Server Integration Services](http://msdn.microsoft.com/en-us/library/ms141026.aspx)? That seems a bit overkill for this when you could do a simple brute force loop over each char and build lines that way? – pstrjds Jul 11 '11 at 19:38
  • @pstrjds : Yes, I did mean Sql Server Integration Services :-D Sure it might be overkill, but what triggered my suggestion is really the "tons of records" part. Sometimes, I have to "parse" csv files that have around 18M lines and a ton of columns (about 450megs) and I like using SSIS for this. Of course, my usage is related to a SQL server too, but I like the tool (even though I don't like some of its interfaces/behaviors.) – Tipx Jul 11 '11 at 21:34
  • @Tipx is there a full source code sample, following good patterns and practices, for reading CSV files with ***SSIS***? – Kiquenet Dec 04 '17 at 08:42

Answer (score 1)

This code snippet reads a file character by character and prints each line when it encounters "\n".

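using (StreamReader sr = new StreamReader(path)) 
{
     string line = string.Empty;
     while (sr.Peek() >= 0) 
     {
          char c = (char)sr.Read();
          if (c == '\n')
          {
              //end of line encountered
              Console.WriteLine(line);
              //create new line
              line = string.Empty;
          }
          else
          {
               //append the character just read (calling Read() again here would drop c and consume the next character)
               line += c;
          }
     }
     //print the last line if the file doesn't end with "\n"
     if (line.Length > 0)
          Console.WriteLine(line);
}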

Because this code reads character by character, it never loads the whole file into memory. Memory use is bounded by the length of the longest line, not the size of the file.

Answer by Dan Waterbly