28

Is there a way to get the current position in the stream of the node under examination by the XmlReader?

I'd like to use the XmlReader to parse a document and save the position of certain elements so that I can seek to them later.

Addendum:

I'm getting Xaml generated by a WPF control. The Xaml should not change frequently. There are placeholders in the Xaml where I need to replace items, sometimes looping. I thought it might be easier to do in code rather than a transform (I might be wrong about this). My idea was to parse it into a simple data structure describing what needs to be replaced and where it is, then use a StringBuilder to produce the final output by copying chunks from the xaml string.

dmo
  • Did you have any success with this? I want to do something similar (calculate stream offsets of certain elements, then later seek to that offset and parse from there), and the best I can see so far is to parse the file twice to calculate the offsets. – Rob Jul 17 '13 at 13:38
  • @Rob Nope. I ended up using XmlReader/XmlWriter to process the document each time. They are fast enough for my purpose that I can give up this optimization. – dmo Jul 22 '13 at 16:43

6 Answers

13

As Jon Skeet says, XmlTextReader implements IXmlLineInfo, but XmlTextReader has been discouraged in favor of XmlReader.Create() since .NET 2.0, and the question is about XmlReader only. I found this solution:

XmlReader xr = XmlReader.Create( // MSDN recommends Create() over the XmlTextReader constructor
    new StringReader("<some><xml><string><data/></string></xml></some>"),
    someSettings // an XmlReaderSettings instance; XmlTextReader cannot take one
);
IXmlLineInfo xli = (IXmlLineInfo)xr;

while (xr.Read())
{
    // ... some read actions ...

    // current position in StringReader can be accessed through
    int line = xli.LineNumber;
    int pos  = xli.LinePosition;
}

P.S. Tested for .NET Compact Framework 3.5, but should work for others too.
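
The hard cast above assumes the reader really implements IXmlLineInfo. A minimal sketch of a more defensive variant follows; the class and method names (`LineInfoSketch`, `ElementPositions`) are made up for illustration, and it uses C# 7 tuples, so it will not compile as-is on the Compact Framework.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class LineInfoSketch
{
    // Returns (name, line, column) for every element in the document, or an
    // empty list when the reader cannot report line information.
    public static List<(string Name, int Line, int Column)> ElementPositions(string xml)
    {
        var result = new List<(string Name, int Line, int Column)>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Not every XmlReader implements IXmlLineInfo, so avoid a hard cast.
            var info = reader as IXmlLineInfo;
            if (info == null || !info.HasLineInfo())
                return result;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    result.Add((reader.Name, info.LineNumber, info.LinePosition));
            }
        }
        return result;
    }
}
```

Both values are 1-based and counted in characters, not bytes, so they identify a place in the text rather than an offset in the underlying stream.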

RollerKostr
  • perfect solution; the others seem very convoluted. maybe back in 2010, this wasn't available?? – Richard II Jan 14 '16 at 21:51
  • Needed to know the byte position with the overall file, this worked well for me because the GB file is one line long for my problem. – Peter Dec 13 '18 at 12:38
10

Just to head off one suggestion before it's made: you could keep a reference to the underlying stream you pass into XmlReader, and make a note of its position - but that will give you the wrong results, as the reader will almost certainly be buffering its input (i.e. it'll read the first 1024 characters or whatever - so your first node might "appear" to be at character 1024).
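
To see the buffering effect for yourself, here is a small sketch. The names `BufferingSketch` and `StreamPositionAtFirstElement` are invented for this example; it simply reports where the underlying stream stands once the reader has reached the root element.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class BufferingSketch
{
    // Returns the underlying stream's Position right after the reader has
    // been positioned on the first element of the document.
    public static long StreamPositionAtFirstElement(string xml)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        using (XmlReader reader = XmlReader.Create(stream))
        {
            reader.MoveToContent(); // moves to the root element
            // This is how far the reader has buffered, not where the node starts.
            return stream.Position;
        }
    }
}
```

For a short document such as `<root><child/></root>` the returned value is usually the length of the whole input, even though the root element starts at byte 0; the exact number depends on the reader's internal buffer size.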

If you use XmlTextReader instead of just XmlReader, then that implements IXmlLineInfo, which means you can ask for the LineNumber and LinePosition at any time - is that good enough for you? (You should probably check HasLineInfo() first, admittedly.)

EDIT: I've just noticed that you want to be able to seek to that position later... in that case line information may not be terribly helpful. It's great for finding something in a text editor, but not so great for moving a file pointer. Could you give some more information about what you're trying to do? There may be a better way of approaching the problem.

Jon Skeet
  • For my prototype I was writing the xml to a writer, flushing that, and getting the length from that stream. It worked okay, but I want to move to something cleaner & less memory intensive. Thanks for the pointer. – dmo Jan 29 '10 at 07:49
  • It looks like XmlTextReader implements IXmlLineInfo. – dmo Jan 29 '10 at 07:51
  • @JonSkeet: I now have the same problem. I have an `XmlReader` object built on a `Stream` object. I want to save the reader's current location, iterate over some elements, and then restore its position. I tried saving stream.Position and setting it back later, after working through the XmlReader object, but it doesn't work. Do you have any ideas? – JavaSa Dec 19 '12 at 20:28
  • @JavaSa: It sounds like you should ask a new question with a short but complete example. – Jon Skeet Dec 19 '12 at 21:56
  • @JonSkeet: I had an idea to solve it by cloning the stream and working on two different streams. I'd be glad if you would share your thoughts. The question: http://stackoverflow.com/questions/13962015/how-can-i-clone-memorystream-object – JavaSa Dec 19 '12 at 22:22
8

I have worked on a solution for this, and while it may not work in every scenario and uses reflection against private members of .NET Framework classes, I am able to calculate the correct position of the XmlReader with the extension method shown below.

Your XmlReader must be created from a StreamReader using an underlying FileStream (I haven't tried other Streams, and they may work as well so long as they report their position).

I've posted details here: http://g-m-a-c.blogspot.com/2013/11/determine-exact-position-of-xmlreader.html

using System.IO;
using System.Reflection;
using System.Xml;

public static class XmlReaderExtensions
{
    private const long DefaultStreamReaderBufferSize = 1024;

    public static long GetPosition(this XmlReader xr, StreamReader underlyingStreamReader)
    {
        // Get the position of the FileStream
        long fileStreamPos = underlyingStreamReader.BaseStream.Position;

        // Get current XmlReader state
        long xmlReaderBufferLength = GetXmlReaderBufferLength(xr);
        long xmlReaderBufferPos = GetXmlReaderBufferPosition(xr);

        // Get current StreamReader state
        long streamReaderBufferLength = GetStreamReaderBufferLength(underlyingStreamReader);
        int streamReaderBufferPos = GetStreamReaderBufferPos(underlyingStreamReader);
        long preambleSize = GetStreamReaderPreambleSize(underlyingStreamReader);

        // Calculate the actual file position
        long pos = fileStreamPos 
            - (streamReaderBufferLength == DefaultStreamReaderBufferSize ? DefaultStreamReaderBufferSize : 0) 
            - xmlReaderBufferLength 
            + xmlReaderBufferPos + streamReaderBufferPos - preambleSize;

        return pos;
    }

    #region Supporting methods

    private static PropertyInfo _xmlReaderBufferSizeProperty;

    private static long GetXmlReaderBufferLength(XmlReader xr)
    {
        if (_xmlReaderBufferSizeProperty == null)
        {
            _xmlReaderBufferSizeProperty = xr.GetType()
                                             .GetProperty("DtdParserProxy_ParsingBufferLength",
                                                          BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _xmlReaderBufferSizeProperty.GetValue(xr);
    }

    private static PropertyInfo _xmlReaderBufferPositionProperty;

    private static int GetXmlReaderBufferPosition(XmlReader xr)
    {
        if (_xmlReaderBufferPositionProperty == null)
        {
            _xmlReaderBufferPositionProperty = xr.GetType()
                                                 .GetProperty("DtdParserProxy_CurrentPosition",
                                                              BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _xmlReaderBufferPositionProperty.GetValue(xr);
    }

    private static PropertyInfo _streamReaderPreambleProperty;

    private static long GetStreamReaderPreambleSize(StreamReader sr)
    {
        if (_streamReaderPreambleProperty == null)
        {
            _streamReaderPreambleProperty = sr.GetType()
                                              .GetProperty("Preamble_Prop",
                                                           BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return ((byte[]) _streamReaderPreambleProperty.GetValue(sr)).Length;
    }

    private static PropertyInfo _streamReaderByteLenProperty;

    private static long GetStreamReaderBufferLength(StreamReader sr)
    {
        if (_streamReaderByteLenProperty == null)
        {
            _streamReaderByteLenProperty = sr.GetType()
                                             .GetProperty("ByteLen_Prop",
                                                          BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _streamReaderByteLenProperty.GetValue(sr);
    }

    private static PropertyInfo _streamReaderBufferPositionProperty;

    private static int GetStreamReaderBufferPos(StreamReader sr)
    {
        if (_streamReaderBufferPositionProperty == null)
        {
            _streamReaderBufferPositionProperty = sr.GetType()
                                                    .GetProperty("CharPos_Prop",
                                                                 BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _streamReaderBufferPositionProperty.GetValue(sr);
    }

    #endregion
}
G-Mac
  • Instead of using reflection for the preamble size, you could do it in a more portable and future-proof way: sr.CurrentEncoding.GetPreamble().Length. Also, if said method is going to return a long, might I recommend you return Array.LongLength or just stick with returning an int? – kornman00 Mar 11 '14 at 01:13
  • In fact, it appears that -all- of these properties are 32-bit ints in .NET – kornman00 Mar 11 '14 at 01:21
  • Regarding the bit where you check: (streamReaderBufferLength == DefaultStreamReaderBufferSize ? DefaultStreamReaderBufferSize : 0) - presumably this is because the streamreader misreports (as zero) the StreamReader buffer position for the last block of data? I found that checking sr.EndOfStream and, if true, setting the position to the length worked (essentially the same logic) - unusually though, actually retrieving the sr.EndOfStream property "fixed" the position reported through reflection. Ho hum. Awesome solution btw. – GHC Feb 05 '18 at 12:31
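
A minimal sketch of the reflection-free preamble lookup suggested in the comments above. `PreambleSketch` and `PreambleSize` are illustrative names, not part of the original answer.

```csharp
using System.IO;
using System.Text;

public static class PreambleSketch
{
    // Size in bytes of the byte-order mark that the reader's current
    // encoding would emit. Uses only public API.
    public static int PreambleSize(StreamReader reader)
    {
        return reader.CurrentEncoding.GetPreamble().Length;
    }
}
```

One caveat: this reports the encoding's nominal preamble, so it can overstate the size for a file that has no BOM but is read with an encoding that defines one. CurrentEncoding may also change after the first read if the reader detects a different encoding from the byte-order mark.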
3

I have the same problem, and apparently there is no simple solution.

So I decided to use two read-only FileStreams: one for the XmlReader, the other to get the position of each line:

private void ReadXmlWithLineOffset()
{
    string malformedXml = "<test>\n<test2>\r   <test3><test4>\r\n<test5>Thi is\r\ra\ntest</test5></test4></test3></test2>";
    string fileName = "test.xml";
    File.WriteAllText(fileName, malformedXml);

    XmlTextReader xr = new XmlTextReader(new FileStream(fileName, FileMode.Open, FileAccess.Read));
    FileStream fs2 = new FileStream(fileName, FileMode.Open, FileAccess.Read);

    try
    {
        int currentLine = 1;
        while(xr.Read())
        {
            if (!string.IsNullOrEmpty(xr.Name))
            {
                for (;currentLine < xr.LineNumber; currentLine++)
                    ReadLine(fs2);
                Console.WriteLine("{0} : LineNum={1}, FileOffset={2}", xr.Name, xr.LineNumber, fs2.Position);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine("Exception : " + ex.Message);
    }
    finally
    {
        xr.Close();
        fs2.Dispose();
    }
}

private void ReadLine(FileStream fs)
{
    int b;
    while ((b = fs.ReadByte()) >= 0)
    {
        if (b == 10) // \n
            return;
        if (b == 13) // \r
        {
            if (fs.ReadByte() != 10) // if not \r\n, go back one byte
                fs.Seek(-1, SeekOrigin.Current);
            return;
        }
    }            
}

This is not the best way of doing it, because it uses two readers. To avoid that, we could write a new file reader shared between the XmlReader and the line counter. Even so, this only gives you the offset of the line you're interested in. To get the exact offset of the tag we would also need LinePosition, which can be tricky because of the encoding.
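
When the XML is already held in a string (as in the question's addendum), the encoding problem goes away, because both the reader and the string count in characters. A sketch of mapping a (LineNumber, LinePosition) pair to a string index follows. `OffsetSketch.ToCharIndex` is a hypothetical helper, and it assumes the reader treats "\r\n", a lone "\r" and a lone "\n" each as one line break, the same assumption ReadLine makes above.

```csharp
using System;

public static class OffsetSketch
{
    // Converts a 1-based (line, column) pair, as reported by IXmlLineInfo,
    // into a 0-based character index within 'text'.
    public static int ToCharIndex(string text, int line, int column)
    {
        int index = 0;
        for (int current = 1; current < line; current++)
        {
            // Advance to the end of the current line.
            while (index < text.Length && text[index] != '\r' && text[index] != '\n')
                index++;
            if (index >= text.Length)
                throw new ArgumentOutOfRangeException(nameof(line));
            // Consume the line break; "\r\n" counts as a single break.
            if (text[index] == '\r' && index + 1 < text.Length && text[index + 1] == '\n')
                index++;
            index++;
        }
        return index + column - 1;
    }
}
```

The result is an index into the string, not a byte offset in a file. Also note that for element nodes the reported column appears to point at the element name rather than at the opening `<`, so check what the reader reports for your node types before slicing the string.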

Etienne Coumont
2

Thanks, Geoff, for the answer. It worked perfectly on Windows 7. But with the .NET 4 version of mscorlib.dll on Windows Server 2003, I had to change the following two functions to make it work.

private long GetStreamReaderBufferLength(StreamReader sr)
{
    FieldInfo _streamReaderByteLenField = sr.GetType()
                                            .GetField("charLen",
                                                      BindingFlags.Instance | BindingFlags.NonPublic);

    var fValue = (int)_streamReaderByteLenField.GetValue(sr);

    return fValue;
}

private int GetStreamReaderBufferPos(StreamReader sr)
{
    FieldInfo _streamReaderBufferPositionField = sr.GetType()
                                                   .GetField("charPos",
                                                             BindingFlags.Instance | BindingFlags.NonPublic);
    int fvalue = (int)_streamReaderBufferPositionField.GetValue(sr);

    return fvalue;
}

Also, the GetPosition method should call Peek() on underlyingStreamReader first, to advance the pointer.

private long GetPosition(XmlReader xr, StreamReader underlyingStreamReader)
{
    long pos = -1;
    while (pos < 0)
    {
        // Get the position of the FileStream
        underlyingStreamReader.Peek();
        long fileStreamPos = underlyingStreamReader.BaseStream.Position;

        // long fileStreamPos = GetStreamReaderBasePosition(underlyingStreamReader);
        // Get current XmlReader state
        long xmlReaderBufferLength = GetXmlReaderBufferLength(xr);
        long xmlReaderBufferPos = GetXmlReaderBufferPosition(xr);

        // Get current StreamReader state
        long streamReaderBufferLength = GetStreamReaderBufferLength(underlyingStreamReader);
        long streamReaderBufferPos = GetStreamReaderBufferPos(underlyingStreamReader);
        long preambleSize = GetStreamReaderPreambleSize(underlyingStreamReader);

        // Calculate the actual file position
        pos = fileStreamPos
            - (streamReaderBufferLength == DefaultStreamReaderBufferSize ? DefaultStreamReaderBufferSize : 0)
            - xmlReaderBufferLength
            + xmlReaderBufferPos + streamReaderBufferPos; // - preambleSize;
    }
    return pos;
}
Vikas
  • Good info. And there you have the perils of accessing private framework members on full display... :) – G-Mac Apr 12 '14 at 16:09
1
try
            {
                Auto[] Autobestand = new Auto[5];

                int Autoindex = 0; 

                OpenFileDialog ofd = new OpenFileDialog()
                {
                    Filter = "(*.xml)|*.xml",
                    Title = "Open file"
                };

                if (ofd.ShowDialog() == DialogResult.OK)
                {
                    XmlReader xr = new XmlTextReader(ofd.FileName);

                    while (xr.Read())
                    {
                        if (xr.NodeType == XmlNodeType.Element)
                        {
                            if (xr.Name == "Auto")
                            {
                                Autobestand[Autoindex] = new Auto(xr.ReadSubtree());

                                Autoindex++;
                            }
                        }
                    }
                    tb_Besitzer_1.Text = Autobestand[0].GetBestitzer();
                    tb_Besitzer_2.Text = Autobestand[1].GetBestitzer();
                    tb_Besitzer_3.Text = Autobestand[2].GetBestitzer();
                    tb_Besitzer_4.Text = Autobestand[3].GetBestitzer();

                    tb_Beschleunigung_1.Text = Autobestand[0].GetBeschleunigung();
                    tb_Beschleunigung_2.Text = Autobestand[1].GetBeschleunigung();
                    tb_Beschleunigung_3.Text = Autobestand[2].GetBeschleunigung();
                    tb_Beschleunigung_4.Text = Autobestand[3].GetBeschleunigung();

                    tb_Farbe_1.Text = Autobestand[0].GetFarbe();
                    tb_Farbe_2.Text = Autobestand[1].GetFarbe();
                    tb_Farbe_3.Text = Autobestand[2].GetFarbe();
                    tb_Farbe_4.Text = Autobestand[3].GetFarbe();

                    tb_Leistung_1.Text = Autobestand[0].GetLeistung();
                    tb_Leistung_2.Text = Autobestand[1].GetLeistung();
                    tb_Leistung_3.Text = Autobestand[2].GetLeistung();
                    tb_Leistung_4.Text = Autobestand[3].GetLeistung();

                    tb_Modell_1.Text = Autobestand[0].GetModell();
                    tb_Modell_2.Text = Autobestand[1].GetModell();
                    tb_Modell_3.Text = Autobestand[2].GetModell();
                    tb_Modell_4.Text = Autobestand[3].GetModell();

                }
            }
            catch (Exception ex)
            {
                MessageBox.Show("Error: " + ex.Message);
            }