11

An example (perhaps not realistic, but it makes my point):

public void StreamInfo(StreamReader p)
{
    string info = string.Format(
        "The supplied StreamReader read : {0}\n at line {1}",
        p.ReadLine(),
        p.GetLinePosition()-1);               

}

GetLinePosition here is an imaginary extension method of StreamReader. Is this possible?

Of course I could keep count myself but that's not the question.

Peter

7 Answers

28

I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice, I just grab the position before each ReadLine() and if the line is of interest, then I keep the start position for setting later to get back to the line like so:

var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();

streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();

Assert.AreEqual(line1, line2);

and the important part:

public static class StreamReaderExtensions
{
    readonly static FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
    readonly static FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
    readonly static FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);

    public static long GetPosition(this StreamReader reader)
    {
        // shift position back from BaseStream.Position by the number of bytes read
        // into internal buffer.
        int byteLen = (int)byteLenField.GetValue(reader);
        var position = reader.BaseStream.Position - byteLen;

        // if we have consumed chars from the buffer we need to calculate how many
        // bytes they represent in the current encoding and add that to the position.
        int charPos = (int)charPosField.GetValue(reader);
        if (charPos > 0)
        {
            var charBuffer = (char[])charBufferField.GetValue(reader);
            var encoding = reader.CurrentEncoding;
            var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
            position += bytesConsumed;
        }

        return position;
    }

    public static void SetPosition(this StreamReader reader, long position)
    {
        reader.DiscardBufferedData();
        reader.BaseStream.Seek(position, SeekOrigin.Begin);
    }
}

This works quite well for me and, depending on your tolerance for using reflection, I think it is a fairly simple solution.

Caveats:

  1. While I have done some simple testing using various System.Text.Encoding options, pretty much all of the data I consume with this is simple text files (ASCII).
  2. I only ever use the StreamReader.ReadLine() method and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
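
Because the approach above reads private fields by name, it depends on the runtime's StreamReader implementation. The names used above match the .NET Framework source; in the .NET Core source the same fields carry an underscore prefix. The helper below is only a sketch of one way to cope with both: `StreamReaderFieldLookup` and `Find` are hypothetical names, and any runtime that uses neither spelling will still fail.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class StreamReaderFieldLookup
{
    const BindingFlags Flags =
        BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly;

    // Hypothetical helper: try the plain name first ("charPos"),
    // then the underscore-prefixed name ("_charPos").
    public static FieldInfo Find(string name)
    {
        FieldInfo field = typeof(StreamReader).GetField(name, Flags)
                       ?? typeof(StreamReader).GetField("_" + name, Flags);
        if (field == null)
            throw new MissingFieldException(typeof(StreamReader).FullName, name);
        return field;
    }
}
```

With this in place, the three static fields could be initialised as `StreamReaderFieldLookup.Find("charPos")` and so on, which also turns a silent null into an immediate, descriptive exception.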
Muhammad Usman Bashir
Eamon
  • Works with `System.Text.Encoding.UTF8` – CrazyIvan1974 Jun 08 '17 at 09:44
  • 3
    You should add an underscore before those field names. [Net Core source code](https://source.dot.net/#System.Private.CoreLib/StreamReader.cs,b5fe1efcec14de32). – Ender Look Dec 24 '19 at 01:38
  • Tried this but whenever I do a readline with a streamreader I have to reset the position like so: SetPosition(reader, GetPosition(reader));, is there a better way? – John Ernest Feb 25 '20 at 01:14
  • Please note there may be some bytes already read from byte buffer but not converted to char buffer yet. e.g. first 2 bytes of a 4-byte UTF8 char is at the end of byte buffer. Those bytes are cached inside Decoder object in StreamReader's 'encoding' member var. – baldpate Jul 30 '21 at 07:47
  • https://stackoverflow.com/a/17457085/4250616 has a solution that handles those split characters. – baldpate Jul 30 '21 at 07:50
11

No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actually going to read that data, so it wouldn't be able to determine the line number.

The only way to do this is to keep track of it yourself.
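
Keeping track yourself can be as small as a counter next to the read loop. The following is a minimal sketch, assuming only ReadLine() is used on the reader; `NumberedLines` and `Read` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NumberedLines
{
    // Yields each line together with its 1-based line number.
    public static IEnumerable<KeyValuePair<int, string>> Read(TextReader reader)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            yield return new KeyValuePair<int, string>(lineNumber, line);
        }
    }
}
```

A caller would then write `foreach (var entry in NumberedLines.Read(reader))` and use `entry.Key` as the line number and `entry.Value` as the text.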

Adam Robinson
8

It is extremely easy to provide a line-counting wrapper for any TextReader:

public class PositioningReader : TextReader {
    private TextReader _inner;
    public PositioningReader(TextReader inner) {
        _inner = inner;
    }
    public override void Close() {
        _inner.Close();
    }
    public override int Peek() {
        return _inner.Peek();
    }
    public override int Read() {
        var c = _inner.Read();
        if (c >= 0)
            AdvancePosition((Char)c);
        return c;
    }

    private int _linePos = 0;
    public int LinePos { get { return _linePos; } }

    private int _charPos = 0;
    public int CharPos { get { return _charPos; } }

    private int _matched = 0;
    private void AdvancePosition(Char c) {
        if (Environment.NewLine[_matched] == c) {
            _matched++;
            if (_matched == Environment.NewLine.Length) {
                _linePos++;
                _charPos = 0;
                _matched = 0;
            }
        }
        else {
            _matched = 0;
            _charPos++;
        }
    }
}

Drawbacks (for the sake of brevity):

  1. Does not check constructor argument for null
  2. Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
  3. Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. The TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by
    • overriding those methods to route calls to _inner instead of base, and
    • passing the characters read to AdvancePosition. See the sample ReadBlock implementation:

public override int ReadBlock(char[] buffer, int index, int count) {
    var readCount = _inner.ReadBlock(buffer, index, count);    
    for (int i = 0; i < readCount; i++)
        AdvancePosition(buffer[index + i]);
    return readCount;
}
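
For drawback 2, one possible variation is to treat '\r', '\n' and "\r\n" all as line terminators, which is how ReadLine() defines a line. The sketch below is not part of the class above; `LineTracker` and its members are hypothetical names, and the counters are zero-based like LinePos and CharPos.

```csharp
public sealed class LineTracker
{
    // True when the previous character was a carriage return.
    private bool _lastWasCr;

    public int Line { get; private set; }
    public int Column { get; private set; }

    public void Advance(char c)
    {
        if (c == '\n')
        {
            // The LF of a CR LF pair was already counted when the CR arrived.
            if (!_lastWasCr)
                Line++;
            Column = 0;
        }
        else if (c == '\r')
        {
            Line++;
            Column = 0;
        }
        else
        {
            Column++;
        }
        _lastWasCr = (c == '\r');
    }
}
```

A wrapper reader could call `Advance` wherever AdvancePosition is called today; the trade-off is that the position no longer depends on Environment.NewLine.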
Sinclair
5

No.

Consider that it's possible to seek to any position using the underlying stream object (which could be at any point in any line). Now consider what that would do to any count kept by the StreamReader.

Should the StreamReader go and figure out which line it's now on? Should it just keep a number of lines read, regardless of position within the file?

There are more questions than just these that would make this a nightmare to implement, imho.

Binary Worrier
  • On the other hand, should we gain rep for repeating what's already been stated? (NOT saying this poster did, but in general it'd certainly be possible!) – The Dag Jan 23 '12 at 12:54
  • @The Dag: Not so much repeating, as said at the same time . . . JINX! (BTW, di ya wanna buy a dag?) – Binary Worrier Jan 24 '12 at 07:40
3

Here is someone who implemented a StreamReader whose ReadLine() method tracks the file position:

http://www.daniweb.com/forums/thread35078.html

I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):

 // Reads a line. A line is defined as a sequence of characters followed by
 // a carriage return ('\r'), a line feed ('\n'), or a carriage return
 // immediately followed by a line feed. The resulting string does not
 // contain the terminating carriage return and/or line feed. The returned
 // value is null if the end of the input stream has been reached.
 //
 /// <include file='doc\myStreamReader.uex' path='docs/doc[@for="myStreamReader.ReadLine"]/*' />
 public override String ReadLine()
 {
          _lineLength = 0;
          //if (stream == null)
          //       __Error.ReaderClosed();
          if (charPos == charLen)
          {
                   if (ReadBuffer() == 0) return null;
          }
          StringBuilder sb = null;
          do
          {
                   int i = charPos;
                   do
                   {
                           char ch = charBuffer[i];
                           int EolChars = 0;
                           if (ch == '\r' || ch == '\n')
                           {
                                    EolChars = 1;
                                    String s;
                                    if (sb != null)
                                    {
                                             sb.Append(charBuffer, charPos, i - charPos);
                                             s = sb.ToString();
                                    }
                                    else
                                    {
                                             s = new String(charBuffer, charPos, i - charPos);
                                    }
                                    charPos = i + 1;
                                    if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
                                    {
                                             if (charBuffer[charPos] == '\n')
                                             {
                                                      charPos++;
                                                      EolChars = 2;
                                             }
                                    }
                                    _lineLength = s.Length + EolChars;
                                    _bytesRead = _bytesRead + _lineLength;
                                    return s;
                           }
                           i++;
                   } while (i < charLen);
                   i = charLen - charPos;
                   if (sb == null) sb = new StringBuilder(i + 80);
                   sb.Append(charBuffer, charPos, i);
          } while (ReadBuffer() > 0);
          string ss = sb.ToString();
          _lineLength = ss.Length;
          _bytesRead = _bytesRead + _lineLength;
          return ss;
 }

I think there is a minor bug in the code: the length of the string is used to calculate the file position instead of the actual number of bytes read, so it lacks support for UTF-8 and UTF-16 encoded files.
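
To illustrate the difference between characters and bytes, the byte size of a line can be computed from the reader's encoding rather than from the string length. This is only a sketch: `LineByteCounter` and `CountBytes` are hypothetical names, the terminator has to be passed in because ReadLine() strips it, and a byte order mark at the start of the file is not accounted for.

```csharp
using System.Text;

public static class LineByteCounter
{
    // Hypothetical helper: bytes occupied by a line plus its terminator
    // when encoded with the given encoding.
    public static long CountBytes(string line, string terminator, Encoding encoding)
    {
        return encoding.GetByteCount(line) + encoding.GetByteCount(terminator);
    }
}
```

For example, "héllo" followed by "\r\n" is 7 characters, but it occupies 8 bytes in UTF-8 and 14 bytes in UTF-16.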

Rolf Kristensen
2

I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader:

class CountingReader : StreamReader {
    private int _lineNumber = 0;
    public int LineNumber { get { return _lineNumber; } }

    public CountingReader(Stream stream) : base(stream) { }

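    public override string ReadLine() {
        string line = base.ReadLine();
        // Count only lines that were actually read, not the null returned at end of stream.
        if (line != null)
            _lineNumber++;
        return line;
    }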
}

and then you make it the normal way, say from a FileInfo object named file

CountingReader reader = new CountingReader(file.OpenRead());

and you just read the reader.LineNumber property.
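
If exposing the whole StreamReader surface is a concern, a wrapper that only offers ReadLine() is one alternative. This is a sketch under that assumption, not a drop-in replacement: `LineCountingReader` is a hypothetical name, and code that expects a StreamReader cannot accept it.

```csharp
using System;
using System.IO;

public sealed class LineCountingReader : IDisposable
{
    private readonly TextReader _inner;

    // Number of lines returned so far by ReadLine().
    public int LineNumber { get; private set; }

    public LineCountingReader(TextReader inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    public string ReadLine()
    {
        string line = _inner.ReadLine();
        // Do not count the null that signals end of input.
        if (line != null)
            LineNumber++;
        return line;
    }

    public void Dispose()
    {
        _inner.Dispose();
    }
}
```

Since no other read or seek method is reachable through the wrapper, LineNumber cannot drift out of sync with the lines actually returned.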

Andy Hubbard
  • 1
    Good answer, but you should clarify that this will only work if `ReadLine` is the _only_ method you are calling. – John Saunders Feb 18 '14 at 23:03
  • This is a solution that just begs to cause bugs down the line. Someone will later use the Seek method or whatever and it will no longer work (or you pass it to some method that uses a method other than ReadLine). If the class only works correctly with ReadLine, it shouldn't inherit from StreamReader. I am pretty sure this violates the Liskov substitution principle. – jahav Feb 01 '19 at 13:32
  • @jahav Your entire argument is wrong. Seek() works correctly. Everything functions exactly the same except if you only use ReadLine() you can see the LineNumber. If the code is only expecting a StreamReader, then it doesn't know about LineNumber and everything functions normally. If you know you have a CountingReader, not just a StreamReader, then you know the limitations (only use ReadLine) so you use it accordingly. This is meant for a specific use and you should understand and use your tools appropriately. – Andy Hubbard Mar 16 '19 at 18:59
  • @Andy Hubbard Seek will work correctly, but LineNumber property won't work correctly after Seek or any other method of StreamReader moving in the stream other than ReadLine. That is the problem, therefore you shouldn't inherit from StreamReader. – jahav Mar 17 '19 at 19:20
  • @jahav That's the point. If you know you have a CountingReader you know you have access to LineNumber and you also know you can't use Seek() or anything but ReadLine(). If you only know you have a StreamReader then you don't know about LineNumber so you don't use it. I see and understand your implied point about using a private StreamReader instead of inheriting but sometimes you need the polymorphism granted by subclassing. – Andy Hubbard Mar 23 '19 at 19:01
  • The scenario here is "I have a flat file of some kind and I want to read each line and know which line I am on but the methods I am calling expect a StreamReader and I know these methods are only using ReadLine() because I have done my research and understand the tools I am using." If you care that much about making sure future developers don't screw up or that you didn't bother to make sure you're only using ReadLine() then just override the other methods and throw a NotImplementedException so your code crashes during testing and you can fix the problem before release. – Andy Hubbard Mar 23 '19 at 19:01
  • If you want to, go ahead and spend three weeks implementing a bunch of code that will never be used to achieve some kind of academic purity instead of actually accomplishing your task in three hours. Just don't be surprised when you're fired. – Andy Hubbard Mar 23 '19 at 20:27
1

The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.

I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).

I use the same logic to track position in the text regardless of whether you read a char at a time, one buffer at a time, or one line at a time. While I'm sure this can be made to perform rather better by abandoning this, it made it much easier to implement... and, I hope, to follow the code.

I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.

The NewLine symbol defaults to Environment.NewLine but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".

Please note that I haven't included the Guard class; it is a simple utility class and it should be obvious from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that s is not null and throws an ArgumentNullException (with argument name "s", hence the second parameter) if it is.
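
For readers who want something to compile against, a stand-in for the omitted Guard class might look like the sketch below. Only the call shapes Guard.NotNull(value, name) and Guard.Range(value, min, max, message) come from the code; the choice of ArgumentOutOfRangeException for Range and the inclusive bounds are assumptions.

```csharp
using System;

public static class Guard
{
    // Throws ArgumentNullException naming the parameter when value is null.
    public static void NotNull(object value, string paramName)
    {
        if (value == null)
            throw new ArgumentNullException(paramName);
    }

    // Assumed behaviour: min and max are inclusive, and a value outside
    // the range raises ArgumentOutOfRangeException with the given message.
    public static void Range(int value, int min, int max, string message)
    {
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException("value", value, message);
    }
}
```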

Enough babble, here's the code:


public class StreamReaderEx : StreamReader
{
    // NewLine characters (magic value -1: "not used").
    int newLine1, newLine2;

    // The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
    bool insideNewLine;

    // StringBuilder used for ReadLine implementation.
    StringBuilder lineBuilder = new StringBuilder();


    public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
    {
        init(newLine);
    }


    public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
    {
        init(newLine);
    }


    public string NewLine
    {
        get { return "" + (char)newLine1 + (char)newLine2; }
        private set
        {
            Guard.NotNull(value, "value");
            Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");

            newLine1 = value[0];
            newLine2 = (value.Length == 2 ? value[1] : -1);
        }
    }


    public int LineNumber { get; private set; }
    public int LinePosition { get; private set; }


    public override int Read()
    {
        int next = base.Read();
        trackTextPosition(next);
        return next;
    }


    public override int Read(char[] buffer, int index, int count)
    {
        int n = base.Read(buffer, index, count);
        for (int i = 0; i 
The Dag
  • Oh great, my code was just cut-off in the middle. I'll take the opportunity to see if anyone's interested; if so, let me know and I'll post the remainder. – The Dag Jan 23 '12 at 13:05