I am writing a program to parse the IETF Internet-Drafts and extract fields such as title, date, protocol, and the countries of the authors. I realize this has been done before (arkko.com), but it's a little self-imposed programming exercise.
The problem I'm having is this:
Using some logic, some basic parsing, and
position = doc.tell()
I have precisely identified the point in each document where I need to start examining lines to find and extract the authors' countries. I can get back to that precise point with:
doc.seek(position)
What I can't work out is what to do next. Having gotten to that position, I've tried every combination of file and string methods that I know to parse an arbitrary number of the lines that follow, but I cannot make it work.
Sorry I don't have any full code snippets, but I've tried far too many variations and I think I might be barking up entirely the wrong tree at this point.
Edit: Actually I came up with a fairly simple solution:
I went through the file once, counted lines, and noted the line number of where I needed to begin parsing.
Then I went through the file again counting lines, and when the line numbers were greater than the first line number, I began parsing.
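Roughly, the two passes look like the sketch below. The function names and the "Authors' Addresses" marker are placeholders for illustration only; my real logic for finding the starting line is more involved than a single substring test.

```python
import io

def find_start_line(doc, marker):
    """First pass: return the 0-based number of the first line containing marker."""
    for number, line in enumerate(doc):
        if marker in line:  # placeholder for the real detection logic
            return number
    return None

def lines_after(doc, start_line):
    """Second pass: rewind, count lines again, and yield those past start_line."""
    doc.seek(0)
    for number, line in enumerate(doc):
        if number > start_line:
            yield line

# Example with an in-memory stand-in for an open draft file.
doc = io.StringIO("Some Title\nAuthors' Addresses\nJane Doe\nFinland\n")
start = find_start_line(doc, "Authors' Addresses")
for line in lines_after(doc, start):
    print(line.rstrip("\n"))
```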
Probably not the most elegant solution in that I think I should have been able to use doc.seek() to avoid a second count, but it works. And now I know an area of string and file manipulation I need to explore a bit more.
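For comparison, here is a minimal sketch of what I think the doc.seek() version would look like, assuming doc is a file opened in text mode. Again, the function names and the marker string are made up for the example. The first function uses readline() rather than a for loop because, as far as I can tell, Python 3 text files refuse tell() while they are being iterated. After seek(), the file object simply carries on reading from that position, so an ordinary loop over it yields the lines that follow.

```python
import io

def find_offset(doc, marker):
    """Return the position just after the first line containing marker, or None."""
    while True:
        line = doc.readline()
        if not line:        # empty string means end of file
            return None
        if marker in line:  # placeholder for the real detection logic
            return doc.tell()

def read_from(doc, position):
    """Jump to a saved position and yield each following line without its newline."""
    doc.seek(position)
    for line in doc:
        yield line.rstrip("\n")

# Example with an in-memory stand-in for an open draft file.
doc = io.StringIO("Some Title\nAuthors' Addresses\nJane Doe\nFinland\n")
position = find_offset(doc, "Authors' Addresses")
for line in read_from(doc, position):
    print(line)
```

In text mode the value returned by tell() should only be handed back to seek() on the same file, not treated as a character count.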