22

Can you use StreamReader to read a normal text file, save the current position partway through, close the StreamReader, and later open a new StreamReader and resume reading from that position?

If not, what else can I use to accomplish the same thing without locking the file?

I tried this but it doesn't work:

var fs = File.Open(@"C:\testfile.txt", FileMode.Open, FileAccess.Read);
var sr = new StreamReader(fs);

Debug.WriteLine(sr.ReadLine()); // Prints: firstline

var pos = fs.Position;

while (!sr.EndOfStream) 
{
    Debug.WriteLine(sr.ReadLine());
}

fs.Seek(pos, SeekOrigin.Begin);

Debug.WriteLine(sr.ReadLine());
// Prints nothing; I expect it to print SecondLine.

Here is the other code I tried:

long position = -1;
StreamReaderSE sr = new StreamReaderSE(@"c:\testfile.txt");

Debug.WriteLine(sr.ReadLine());
position = sr.BytesRead;

Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());

Debug.WriteLine("Wait");

sr.BaseStream.Seek(position, SeekOrigin.Begin);
Debug.WriteLine(sr.ReadLine());
spaleet
Stacker

6 Answers

37

I realize this is really belated, but I just stumbled onto this incredible flaw in StreamReader myself; the fact that you can't reliably seek when using StreamReader. Personally, my specific need is to have the ability to read characters, but then "back up" if a certain condition is met; it's a side effect of one of the file formats I'm parsing.

Using ReadLine() isn't an option because it's only useful in really trivial parsing jobs. I have to support configurable record/line delimiter sequences and support escape delimiter sequences. Also, I don't want to implement my own buffer so I can support "backing up" and escape sequences; that should be the StreamReader's job.

This method calculates the actual position in the underlying stream of bytes on demand. It works for UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, and any single-byte encoding (e.g. code pages 1252, 437, 28591, etc.), regardless of the presence of a preamble/BOM. This version will not work for UTF-7, Shift-JIS, or other variable-byte encodings.

When I need to seek to an arbitrary position in the underlying stream, I directly set BaseStream.Position and then call DiscardBufferedData() to get StreamReader back in sync for the next Read()/Peek() call.

And a friendly reminder: don't arbitrarily set BaseStream.Position. If you bisect a character, you'll invalidate the next Read() and, for UTF-16/-32, you'll also invalidate the result of this method.

public static long GetActualPosition(StreamReader reader)
{
    System.Reflection.BindingFlags flags = System.Reflection.BindingFlags.DeclaredOnly | System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.GetField;

    // The current buffer of decoded characters
    char[] charBuffer = (char[])reader.GetType().InvokeMember("charBuffer", flags, null, reader, null);

    // The index of the next char to be read from charBuffer
    int charPos = (int)reader.GetType().InvokeMember("charPos", flags, null, reader, null);

    // The number of decoded chars presently used in charBuffer
    int charLen = (int)reader.GetType().InvokeMember("charLen", flags, null, reader, null);

    // The current buffer of read bytes (byteBuffer.Length = 1024; this is critical).
    byte[] byteBuffer = (byte[])reader.GetType().InvokeMember("byteBuffer", flags, null, reader, null);

    // The number of bytes read while advancing reader.BaseStream.Position to (re)fill charBuffer
    int byteLen = (int)reader.GetType().InvokeMember("byteLen", flags, null, reader, null);

    // The number of bytes the remaining chars use in the original encoding.
    int numBytesLeft = reader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);

    // For variable-byte encodings, deal with partial chars at the end of the buffer
    int numFragments = 0;
    if (byteLen > 0 && !reader.CurrentEncoding.IsSingleByte)
    {
        if (reader.CurrentEncoding.CodePage == 65001) // UTF-8
        {
            byte byteCountMask = 0;
            while ((byteBuffer[byteLen - numFragments - 1] >> 6) == 2) // if the byte is "10xx xxxx", it's a continuation-byte
                byteCountMask |= (byte)(1 << ++numFragments); // count bytes & build the "complete char" mask
            if ((byteBuffer[byteLen - numFragments - 1] >> 6) == 3) // if the byte is "11xx xxxx", it starts a multi-byte char.
                byteCountMask |= (byte)(1 << ++numFragments); // count bytes & build the "complete char" mask
            // see if we found as many bytes as the leading-byte says to expect
            if (numFragments > 1 && ((byteBuffer[byteLen - numFragments] >> 7 - numFragments) == byteCountMask))
                numFragments = 0; // no partial-char in the byte-buffer to account for
        }
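        else if (reader.CurrentEncoding.CodePage == 1200) // UTF-16LE
        {
            // High surrogates are 0xD800-0xDBFF; in little-endian the high byte comes last
            if (byteLen >= 2 && (byteBuffer[byteLen - 1] & 0xFC) == 0xD8) // high-surrogate
                numFragments = 2; // account for the partial character
        }
        else if (reader.CurrentEncoding.CodePage == 1201) // UTF-16BE
        {
            // In big-endian the high byte of the last code unit comes first
            if (byteLen >= 2 && (byteBuffer[byteLen - 2] & 0xFC) == 0xD8) // high-surrogate
                numFragments = 2; // account for the partial character
        }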
    }
    return reader.BaseStream.Position - numBytesLeft - numFragments;
}

Of course, this uses Reflection to get at private variables, so there is risk involved. However, this method works with .NET 2.0, 3.0, 3.5, 4.0, 4.0.3, 4.5, 4.5.1, 4.5.2, 4.6, and 4.6.1. Beyond that risk, the only other critical assumption is that the underlying byte-buffer is a byte[1024]; if Microsoft changes it the wrong way, the method breaks for UTF-16/-32.

This has been tested against a UTF-8 file filled with Ažテ𣘺 (10 bytes: 0x41 C5 BE E3 83 86 F0 A3 98 BA) and a UTF-16 file filled with A𐐷 (6 bytes: 0x41 00 01 D8 37 DC). The point is to force characters to be fragmented along the byte[1024] boundaries in all the different ways they could be.

UPDATE (2013-07-03): I fixed the method, which originally used the broken code from that other answer. This version has been tested against data containing characters that require surrogate pairs. The data was put into 3 files, each with a different encoding: one UTF-8, one UTF-16LE, and one UTF-16BE.

UPDATE (2016-02): The only correct way to handle bisected characters is to directly interpret the underlying bytes. UTF-8 is properly handled, and UTF-16/-32 work (given the length of byteBuffer).

Granger
  • I love you for this. I have been running into all kinds of weird issues trying to reverse the position of a stream reader all day and this fixed it in one! – Dean North Jun 26 '14 at 14:50
  • Unfortunately, I hit some cases where this implementation does not work (UTF-8 with mix of English and Japanese characters). I had to revert to keeping track of my own position using `CurrentEncoding.GetByteCount()` on each `ReadLine()`. – Matt Houser Feb 10 '16 at 07:38
  • @MattHouser Can you elaborate? I tested with such a mixture of characters (including surrogates). – Granger Feb 11 '16 at 20:34
  • I forgot: Thanks to Matt Houser for helping me track down the issues; his sample data and time were extremely helpful! – Granger Mar 25 '16 at 19:18
  • Hi @Granger, I tried your code and it fails at the second instruction with error `System.MissingFieldException has been thrown. Cannot find variable charBuffer`. I'm using a standard StreamReader instance like `var s = new StreamReader("blah.txt");` and this error comes up. Do you know what's wrong? – Yeehaw Apr 21 '17 at 15:56
  • @Yeehaw - It sounds like you're not targeting any of the .Net framework versions I tested against. You'll likely need to use ILSpy to see what the internal variable names changed to. – Granger Apr 21 '17 at 20:20
  • I was able to get @Lasse Espeholt 's solution to work in the end, but I'd still be curious as to why your code failed for me. When I have some spare time, I'll try to investigate a little further. Thanks anyway for now @Granger! – Yeehaw Apr 24 '17 at 19:11
  • Would it have been easier to wrap StreamReader and have ReadLine persist each line's start position? Then you could SeekLine(5); – N-ate Apr 05 '19 at 18:23
  • @N-ate: If you're doing trivial parsing, of course. As long as your parsing logic never needs to look beyond what ReadLine() thinks is a "line", you shouldn't end up with something even more complicated. – Granger Apr 06 '19 at 15:13
  • **Warning** for others working off this implementation. Make sure the `throwOnInvalidBytes` option is enabled on your Encoding instance. Otherwise, the decoder will insert fallback characters into the character buffer when it encounters an invalid byte sequence. These fallback characters don't necessarily take up the same number of bytes as the invalid sequence that it is replacing. The result is that `CurrentEncoding.GetByteCount` on the unused portion of the character buffer **will not** represent the offset between `BaseStream.Position` and the location of the last character you read. – mbaker3 Jun 14 '22 at 02:29
  • Is that due to setting `Position` to something arbitrary, or due to a sequence of source bytes that are invalid for the encoding in use? Either way, good advice. – Granger Jun 15 '22 at 03:36
  • The case I encountered was due to a sequence of source bytes that don't represent any UTF-8 character. I'm not sure whether setting `Position` on the low surrogate byte will cause a similar (or different issue). – mbaker3 Jun 17 '22 at 16:40
  • This should be accepted answer, as the best way to find current stream position (seeking is easy when you have the correct byte position). It even works for netcore/net5-net7, with few little changes. `ThrowOnInvalidBytes` is now replaced with passing `new DecoderExceptionFallback()` to `Encoding.GetEncoding(...)`. And you have to update reflected names, they now have '_' in front. – huancz Aug 02 '23 at 10:08
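
Following up on the last comment about renamed fields: a minimal sketch of a lookup helper that tries the .NET Framework field name first and then the underscore-prefixed name. `StreamReaderFields` and `GetPrivateField` are hypothetical names introduced here for illustration. The helper still relies on private implementation details, so it can break in any runtime version.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class StreamReaderFields
{
    // Hypothetical helper (not part of the answer above): reads a private
    // StreamReader field, trying "name" first and then "_name".
    public static T GetPrivateField<T>(StreamReader reader, string name)
    {
        const BindingFlags flags = BindingFlags.NonPublic | BindingFlags.Instance;

        // typeof(StreamReader) rather than reader.GetType(), so that
        // subclasses of StreamReader are handled as well.
        FieldInfo field = typeof(StreamReader).GetField(name, flags)
                       ?? typeof(StreamReader).GetField("_" + name, flags);

        if (field == null)
            throw new MissingFieldException(typeof(StreamReader).FullName, name);

        return (T)field.GetValue(reader);
    }
}
```

With this in place, a line such as `int charPos = StreamReaderFields.GetPrivateField<int>(reader, "charPos");` could stand in for the corresponding `InvokeMember` call, assuming the field exists under one of the two names.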
17

Yes you can, see this:

var sr = new StreamReader("test.txt");
sr.BaseStream.Seek(2, SeekOrigin.Begin); // Check sr.BaseStream.CanSeek first

Update: Be aware that you can't necessarily use sr.BaseStream.Position for anything useful, because StreamReader buffers its input, so the position does not reflect what you have actually read. You will probably have trouble finding the true position, because you can't just count characters (different encodings have different character lengths). I think the best way is to work with the FileStream itself.

Update: Use the TGREER.myStreamReader from here: http://www.daniweb.com/software-development/csharp/threads/35078 This class adds BytesRead etc. (it works with ReadLine() but apparently not with the other read methods), and then you can do this:
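
The buffering effect is easy to observe. The following is a minimal sketch (the class and method names are made up for illustration) that reads one line from a 10-byte in-memory stream and then reports where the underlying stream ended up.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Reads a single line, then returns the position of the underlying stream.
    public static long PositionAfterFirstLine(byte[] content)
    {
        using (var sr = new StreamReader(new MemoryStream(content)))
        {
            sr.ReadLine();
            return sr.BaseStream.Position;
        }
    }

    public static void Main()
    {
        byte[] content = Encoding.ASCII.GetBytes("1234\n56789"); // 10 bytes
        // The first line ends at byte 5, but the reader has already pulled
        // the whole stream into its buffer, so this prints 10.
        Console.WriteLine(PositionAfterFirstLine(content));
    }
}
```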

File.WriteAllText("test.txt", "1234\n56789");

long position = -1;

using (var sr = new myStreamReader("test.txt"))
{
    Console.WriteLine(sr.ReadLine());

    position = sr.BytesRead;
}

Console.WriteLine("Wait");

using (var sr = new myStreamReader("test.txt"))
{
    sr.BaseStream.Seek(position, SeekOrigin.Begin);
    Console.WriteLine(sr.ReadToEnd());
}
Lasse Espeholt
  • Seems OK, but does it lock the file? – Stacker Mar 23 '11 at 11:13
  • 1
    You can choose :) see the accepted answer here: http://stackoverflow.com/questions/1606349/does-a-streamreader-lock-a-text-file-whilst-it-is-in-use-can-i-prevent-this – Lasse Espeholt Mar 23 '11 at 11:16
  • That wouldn't help me save the position; please check the update in my question. – Stacker Mar 23 '11 at 11:38
  • @Stacker May I see your code? It works perfectly here and outputs "1234 wait 56789" – Lasse Espeholt Mar 23 '11 at 12:45
  • I will edit my question with the code, but one more thing: I can't believe the class doesn't have EndOfStream anymore! – Stacker Mar 23 '11 at 13:00
  • Anyway, he stated that ReadLine would return null at the end of the stream, which can be used instead of EndOfStream. – Stacker Mar 23 '11 at 13:17
  • @lasseespeholt: However, this wouldn't solve my problem, because the extended StreamReader locks the file for reading. I tried to edit the file while it was being read, and Notepad couldn't save the changes. – Stacker Mar 23 '11 at 13:23
  • I changed it so it doesn't lock the file anymore, but it still doesn't behave as expected. – Stacker Mar 23 '11 at 14:06
  • @Stacker Instead of giving the `StreamReader` a path, you should pass it a `FileStream`, like `new FileStream (openFileDialog1.FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)`, where you explicitly set the `FileShare` mode. – Lasse Espeholt Mar 23 '11 at 14:08
  • Okay, try to create the `StreamReader` twice as in my example :) it should work ;) You can basically just take my code and pass in the non-locking FileStream. – Lasse Espeholt Mar 23 '11 at 14:17
  • I edited the actual class to always allow read and write sharing. – Stacker Mar 23 '11 at 15:19
  • There's an update of solution at http://www.daniweb.com/software-development/csharp/threads/35078/streamreader-and-position#post1715941 – marisks Aug 08 '12 at 06:09
  • 2
    @marisks : the solution presented on daniweb doesn't handle multi-bytes UTF8 characters, as it's counting the number of *characters* read to update the position. – Zonko Dec 26 '12 at 16:12
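
Putting the comments above together, here is a minimal sketch of resuming from a known byte offset without locking the file. `SharedReadDemo` and `ReadFromOffset` are illustrative names, and the temp-file path is an assumption. The sketch takes the byte offset as given; computing it correctly is the hard part discussed in the other answers. It also assumes the file has no BOM, which would shift every offset.

```csharp
using System;
using System.IO;

public static class SharedReadDemo
{
    // Opens the file so that other processes can still read and write it,
    // seeks to a byte offset, and returns everything from there to the end.
    public static string ReadFromOffset(string path, long byteOffset)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var sr = new StreamReader(fs))
        {
            fs.Seek(byteOffset, SeekOrigin.Begin);
            // Not strictly needed on a fresh reader, but required whenever
            // the reader has already buffered data before the seek.
            sr.DiscardBufferedData();
            return sr.ReadToEnd();
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "seek-demo.txt");
        File.WriteAllText(path, "1234\n56789");
        // "1234\n" occupies 5 bytes, so offset 5 is the start of the second line.
        Console.WriteLine(ReadFromOffset(path, 5)); // prints 56789
    }
}
```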
2

If you just want to search for a start position within a text stream, I added this extension to StreamReader so that I could determine where the edit of the stream should occur. Granted, the logic increments by characters, but for my purposes it works great for getting the position within a text/ASCII-based file based upon a string pattern. You can then use that location as a start point for reading, to write a new file that excludes the data prior to the start point.

The returned position within the stream can be provided to Seek to start from that position within text-based stream reads. It works. I've tested it. However, there may be issues when matching to non-ASCII Unicode chars during the matching algorithm. This was based upon American English and the associated character page.

Basics: it scans through a text stream, character-by-character, looking for the sequential string pattern (that matches the string parameter) forward only through the stream. Once the pattern doesn't match the string parameter (i.e. going forward, char by char), then it will start over (from the current position) trying to get a match, char-by-char. It will eventually quit if the match can't be found in the stream. If the match is found, then it returns the current "character" position within the stream, not the StreamReader.BaseStream.Position, as that position is ahead, based on the buffering that the StreamReader does.

As indicated in the comments, this method WILL affect the position of the StreamReader, and it will be set back to the beginning (0) at the end of the method. StreamReader.BaseStream.Seek should be used to run to the position returned by this extension.

Note: the position returned by this extension will also work with BinaryReader.Seek as a start position when working with text files. I actually used this logic for that purpose to rewrite a PostScript file back to disk, after discarding the PJL header information to make the file a "proper" PostScript readable file that could be consumed by GhostScript. :)

The string to search for within the PostScript (after the PJL header) is: "%!PS-", which is followed by "Adobe" and the version.

public static class StreamReaderExtension
{
    /// <summary>
    /// Searches from the beginning of the stream for the indicated
    /// <paramref name="pattern"/>. Once found, returns the position within the stream
    /// that the pattern begins at.
    /// </summary>
    /// <param name="pattern">The <c>string</c> pattern to search for in the stream.</param>
    /// <returns>If <paramref name="pattern"/> is found in the stream, then the start position
    /// within the stream of the pattern; otherwise, -1.</returns>
    /// <remarks>Please note: this method will change the current stream position of this instance of
    /// <see cref="System.IO.StreamReader"/>. When it completes, the position of the reader will
    /// be set to 0.</remarks>
    public static long FindSeekPosition(this StreamReader reader, string pattern)
    {
        if (!string.IsNullOrEmpty(pattern) && reader.BaseStream.CanSeek)
        {
            try
            {
                reader.BaseStream.Position = 0;
                reader.DiscardBufferedData();
                StringBuilder buff = new StringBuilder();
                long start = 0;
                long charCount = 0;
                List<char> matches = new List<char>(pattern.ToCharArray());
                bool startFound = false;

                while (!reader.EndOfStream)
                {
                    char chr = (char)reader.Read();

                    if (chr == matches[0] && !startFound)
                    {
                        startFound = true;
                        start = charCount;
                    }

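                    // Check every character once a candidate match has started; skipping
                    // characters that do not occur in the pattern would allow false matches.
                    if (startFound)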
                    {
                        buff.Append(chr);

                        if (buff.Length == pattern.Length
                            && buff.ToString() == pattern)
                        {
                            return start;
                        }

                        bool reset = false;

                        if (buff.Length > pattern.Length)
                        {
                            reset = true;
                        }
                        else
                        {
                            string subStr = pattern.Substring(0, buff.Length);

                            if (buff.ToString() != subStr)
                            {
                                reset = true;
                            }
                        }

                        if (reset)
                        {
                            buff.Length = 0;
                            startFound = false;
                            start = 0;
                        }
                    }

                    charCount++;
                }
            }
            finally
            {
                reader.BaseStream.Position = 0;
                reader.DiscardBufferedData();
            }
        }

        return -1;
    }
}
0

FileStream.Position (or equivalently, StreamReader.BaseStream.Position) will usually be ahead -- possibly way ahead -- of the TextReader position because of the underlying buffering taking place.

If you can determine how newlines are handled in your text files, you can add up the number of bytes read based on line lengths and end-of-line characters.

File.WriteAllText("test.txt", "1234" + System.Environment.NewLine + "56789");

long position = -1;
long bytesRead = 0;
int newLineBytes = System.Environment.NewLine.Length;

using (var sr = new StreamReader("test.txt"))
{
    string line = sr.ReadLine();
    bytesRead += line.Length + newLineBytes;

    Console.WriteLine(line);

    position = bytesRead;
}

Console.WriteLine("Wait");

using (var sr = new StreamReader("test.txt"))
{
    sr.BaseStream.Seek(position, SeekOrigin.Begin);
    Console.WriteLine(sr.ReadToEnd());
}

For more complex text file encodings you might need to get fancier than this, but it worked for me.

yoyo
  • Is there a particular reason to initialize position with -1? – SKull Sep 01 '17 at 15:10
  • 1
    The String.Length method returns number of characters not number of bytes. Accordingly any multi-byte characters are not accounted for, so this code is highly specific at best, dangerous at worst. See the [MSDN documentation for this method](https://learn.microsoft.com/en-us/dotnet/api/system.string.length?view=netframework-4.8) – Moog Sep 27 '19 at 18:05
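
To address the point about characters versus bytes, here is a minimal sketch that counts bytes with the encoding instead of using `String.Length`. It assumes the file is UTF-8 without a BOM and that the caller knows which newline sequence the file uses. `LineOffsets` and `OffsetAfterFirstLine` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineOffsets
{
    // Returns the byte offset just past the first line.
    // Assumptions: UTF-8 without a BOM, and the first line is terminated
    // by exactly the newline string passed in.
    public static long OffsetAfterFirstLine(string path, string newLine)
    {
        var encoding = new UTF8Encoding(false);
        using (var sr = new StreamReader(path, encoding))
        {
            string line = sr.ReadLine();
            if (line == null)
                return 0; // empty file

            // GetByteCount measures encoded bytes, so multi-byte characters
            // such as "ž" (2 bytes) or "テ" (3 bytes) are counted correctly.
            return encoding.GetByteCount(line) + encoding.GetByteCount(newLine);
        }
    }
}
```

For a file containing "žテ\n56789", `OffsetAfterFirstLine(path, "\n")` returns 6 (2 + 3 + 1), whereas summing `String.Length` values would give 3.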
0

From MSDN:

StreamReader is designed for character input in a particular encoding, whereas the Stream class is designed for byte input and output. Use StreamReader for reading lines of information from a standard text file.

In most of the examples involving StreamReader, you will see text read line by line using ReadLine(). The Seek method comes from the Stream class, which is basically used to read or handle data in bytes.

Abdel Raoof Olakara
  • 3
    Marked down because OP is talking about seeking while using a StreamReader. This answer doesn't address seeking and regurgitates the MSDN definition which isn't useful. – enorl76 May 30 '18 at 19:24
0

The suggestions above did not work for me. My use case was simply to back up one read position (I'm reading one char at a time with the default encoding). My simple solution was inspired by the commentary above; your mileage may vary.

I saved the BaseStream.Position before reading, then determined whether I needed to back up. If so, I set the position and invoked DiscardBufferedData().
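
A minimal sketch of that approach, with made-up names (`BackUpDemo`, `Rewind`). One caveat to keep in mind: the saved BaseStream.Position matches the reader's logical position only while the reader's buffer is empty, for example right after construction or right after DiscardBufferedData(). The sketch saves the position at exactly such a moment.

```csharp
using System;
using System.IO;
using System.Text;

public static class BackUpDemo
{
    // Moves the underlying stream back to a saved byte offset and drops
    // whatever the reader has buffered, so the next read starts there.
    public static void Rewind(StreamReader reader, long savedPosition)
    {
        reader.BaseStream.Position = savedPosition;
        reader.DiscardBufferedData();
    }

    public static void Main()
    {
        byte[] bytes = Encoding.ASCII.GetBytes("first\nsecond\n");
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            long saved = reader.BaseStream.Position; // 0: nothing buffered yet
            Console.WriteLine(reader.ReadLine());    // prints first
            Rewind(reader, saved);
            Console.WriteLine(reader.ReadLine());    // prints first again
        }
    }
}
```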