
I am writing a program to parse the IETF Internet-drafts and pull out such things as title, date, protocol, and the countries of the authors. I realize this has been done before (arkko.com), but it's a little self-imposed programming exercise.

The problem I'm having is this:

Using some logic, some basic parsing, and

position = doc.tell()

I have identified the exact point in each document where I need to start examining lines to find and extract the authors' countries. And I can get back to that exact point with:

doc.seek(position)

The problem I'm having is: then what? Having gotten to that position, I've tried every combination of file and string methods I know to start parsing an arbitrary number of the lines that follow, but I cannot make it work.

Sorry I don't have any full code snippets, but I've tried far too many, and I think I might be barking up entirely the wrong tree at this point.

Edit: Actually I came up with a fairly simple solution:

I went through the file once, counted lines, and noted the line number of where I needed to begin parsing.

Then I went through the file again counting lines, and when the line numbers were greater than the first line number, I began parsing.

Probably not the most elegant solution, since I think I should have been able to use doc.seek() to avoid the second count, but it works. And now I know an area of string and file manipulation I need to explore a bit more.
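
For reference, the two-pass approach described above might look roughly like this. It is only a sketch: the function names, the "Authors' Addresses" marker, and the sample text are all made up for illustration, and an in-memory io.StringIO stands in for the real draft file.

```python
import io


def find_start_line(doc, marker):
    """First pass: return the 0-based number of the first line containing marker."""
    for number, line in enumerate(doc):
        if marker in line:
            return number
    return None


def lines_after(doc, start_line):
    """Second pass: rewind, count lines again, and keep those past start_line."""
    doc.seek(0)
    return [line.rstrip("\n")
            for number, line in enumerate(doc)
            if number > start_line]


# Made-up sample text standing in for an Internet-Draft (an assumption).
sample = io.StringIO("Title\nAbstract\nAuthors' Addresses\nJane Doe\nFinland\n")

start = find_start_line(sample, "Authors' Addresses")
tail = lines_after(sample, start)
```

Here tail holds only the lines after the marker line, which is where the country parsing would begin.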

rwjones
  • maybe this is of help: http://stackoverflow.com/questions/620367/python-how-to-jump-to-a-particular-line-in-a-huge-text-file – pypat Jun 03 '13 at 14:46
  • Please describe "every combination of file and string methods" you know and what doesn't work with them. –  Jun 03 '13 at 14:48

1 Answer


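After doc.seek(position), you just need to call doc.read(some_buffer_length) and you'll get a string back, starting at that position. If you'd rather work line by line, doc.readline() or a plain loop over doc also picks up from wherever the file position currently is.

How you deal with that string is a completely separate issue, but it doesn't matter whether it comes from the beginning of the file or from somewhere in the middle.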
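
A minimal sketch of the seek-then-read idea, assuming Python 3. The helper names, the "Authors' Addresses" marker, and the sample text are hypothetical, not anything from the question. One detail worth knowing: in Python 3, calling tell() on a text file while looping over it with "for line in doc" raises OSError, so the first helper uses readline() instead.

```python
import os
import tempfile


def find_position(path, marker):
    """Return the file position just past the first line containing marker."""
    with open(path, encoding="utf-8") as doc:
        while True:
            line = doc.readline()  # readline() keeps tell() usable
            if not line:
                return None        # reached end of file without a match
            if marker in line:
                return doc.tell()


def lines_from(path, position):
    """Seek to a saved position and return every line from there onward."""
    with open(path, encoding="utf-8") as doc:
        doc.seek(position)
        return [line.rstrip("\n") for line in doc]


# Made-up sample text written to a temporary file (an assumption).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Title\nAbstract\nAuthors' Addresses\nJane Doe\nFinland\n")

position = find_position(tmp.name, "Authors' Addresses")
tail = lines_from(tmp.name, position)
os.remove(tmp.name)
```

With the position saved, there is no need for a second pass that counts lines: the loop after seek() begins at the first line following the marker.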

viraptor