
Like in this question, I am trying to record the exact position when parsing XML.

I already use the SAX Locator passed to setDocumentLocator() to record the line and column number, but that doesn't give the offset from the beginning of the file. Is there a way to find the number of bytes read so far by the SAX parser, or the offset of each line, without re-reading the whole file?
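For reference, this is roughly what recording positions through the Locator looks like. It is a minimal sketch with a hypothetical class name `LocatorPositions`. It shows that `org.xml.sax.Locator` only offers line and column numbers. There is no method for a byte or character offset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorPositions {

    /** Returns {line, column} as reported by the Locator at each startElement event. */
    static List<int[]> startTagPositions(String xml) throws Exception {
        List<int[]> positions = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands us the Locator once, before parsing starts.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Line/column approximately mark where the current event ends.
                // Locator has no getOffset(), so the file offset is not available here.
                positions.add(new int[] { locator.getLineNumber(), locator.getColumnNumber() });
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return positions;
    }

    public static void main(String[] args) throws Exception {
        for (int[] p : startTagPositions("<root>\n  <a/>\n</root>")) {
            System.out.println("line " + p[0] + ", column " + p[1]);
        }
    }
}
```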

clockworkgeek

2 Answers


Hypothetically, you can wrap the input in the CountingInputStream from Apache Commons IO.
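A minimal sketch of the idea follows. To stay self-contained it uses a small hand-written `FilterInputStream` subclass (hypothetical name `CountingStream`) in place of Commons IO's `CountingInputStream`, which offers `getByteCount()` for the same purpose. One caveat: SAX parsers typically read ahead into an internal buffer, so the count reflects how many bytes the parser has pulled from the stream, not the exact position of the current event. For small documents the whole file may already be consumed by the first callback.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ByteCountingSax {

    /** Stand-in for commons-io's CountingInputStream: counts bytes passing through. */
    static class CountingStream extends FilterInputStream {
        private long count;

        CountingStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b != -1) count++;
            return b;
        }

        // FilterInputStream.read(byte[]) delegates here, so both bulk reads are counted.
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }

        @Override public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }

        // Disable mark/reset so a rewind can never make the count overshoot.
        @Override public boolean markSupported() { return false; }

        long getByteCount() { return count; }
    }

    /** Records how many bytes the parser had read when each startElement fired. */
    static List<Long> countsAtStartElements(byte[] xml) throws Exception {
        CountingStream in = new CountingStream(new ByteArrayInputStream(xml));
        List<Long> counts = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts.add(in.getByteCount()); // bytes consumed so far, buffer-granular
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><a/><b/></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countsAtStartElements(xml));
    }
}
```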

Dmitry Negoda

I found another question and answer which suggests using an XMLStreamReader instead of a SAXParser, because it offers getLocation().getCharacterOffset(). That is exactly what I need.
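A minimal StAX sketch of that approach (hypothetical class name `StaxOffsets`) is below. The `javax.xml.stream.Location` Javadoc says `getCharacterOffset()` is a byte offset for byte streams and a character offset for character input, or -1 if unavailable, but implementations differ in practice. Here a `StringReader` is used, so the value should be read as a character offset. The exact point within each tag that the offset refers to is implementation-dependent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxOffsets {

    /** Collects getCharacterOffset() right after each START_ELEMENT event. */
    static List<Integer> offsetsAtStartElements(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<Integer> offsets = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Character input, so this is a character (not byte) offset, or -1.
                    offsets.add(reader.getLocation().getCharacterOffset());
                }
            }
        } finally {
            reader.close();
        }
        return offsets;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(offsetsAtStartElements("<root><a/><b/></root>"));
    }
}
```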

clockworkgeek
  • This is not correct. This way you get the CHARACTER offset, not the BYTE offset. If your XML file contains even one multi-byte character, you are in big trouble. – Karol Król Nov 21 '14 at 15:09
  • Please take a look at this question: http://stackoverflow.com/questions/43366566 – jschnasse May 16 '17 at 15:14
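To illustrate the character-versus-byte problem raised in the comments, here is a small sketch (hypothetical names `OffsetConversion.byteOffset`) that converts a character offset into a byte offset by re-encoding the decoded prefix. It assumes the offset counts Java `char`s (UTF-16 code units), does not split a surrogate pair, and ignores any BOM. Note that it needs the decoded text up to that point, which is the re-reading the question hoped to avoid.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OffsetConversion {

    /** Number of bytes the first charOffset chars of text occupy when encoded with cs. */
    static int byteOffset(String text, int charOffset, Charset cs) {
        return text.substring(0, charOffset).getBytes(cs).length;
    }

    public static void main(String[] args) {
        String xml = "<a>\u00e9</a>"; // U+00E9 is 1 char but 2 bytes in UTF-8
        System.out.println(byteOffset(xml, 4, StandardCharsets.UTF_8)); // prints 5
        System.out.println(byteOffset(xml, 4, StandardCharsets.ISO_8859_1)); // prints 4
    }
}
```

With a single-byte encoding the two offsets agree; with UTF-8 they diverge as soon as one non-ASCII character precedes the position.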