
I am currently appending like this:

```python
with open(filename, "a") as fh:
    fh.write("some line")
```

This assumes that the file already ends with exactly one newline, i.e. the last line is terminated and no blank lines follow it.

If the file doesn't end with a newline, my code appends to the last line, i.e. I get:

last line of textsome line

and not:

last line of text
some line

If the file ends with several "empty" lines, my line is added after them, which isn't good either.

i.e.

...
...
last line
\n
\n
\n
...
some line

I want my line to be added on a new line directly after the last meaningful line (a line containing non-empty text).

The text files are quite large (10-20 GB each), so I can't just read the whole file into memory and then "chomp" the trailing newlines.

I know that the files won't have more than 1-10 empty lines at the end (usually only a single empty line at the end, which is why the code usually works and only rarely fails).

Is there a variation of `open(file, 'a')` that seeks to the end of the file's actual content and starts on a new line?
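As far as I know there is no such mode: in append mode every write goes to the end of the file regardless of `seek`, so the usual workaround is to open with `r+b`, scan the tail backwards, truncate the trailing blank lines, and write. The sketch below assumes LF line endings and treats only `\n` and `\r` bytes as "blank" (whitespace-only lines count as content); `append_after_content` and its `chunk` parameter are hypothetical names, not an existing API.

```python
import os

# Bytes treated as "blank". Assumption: whitespace-only lines count as
# content; add b" \t" here if they should be stripped as well.
BLANK = b"\r\n"


def append_after_content(path, line, chunk=4096):
    """Hypothetical helper: append `line` directly after the last
    non-blank byte of `path`, dropping trailing blank lines first.
    Only the tail of the file is read, chunk by chunk."""
    data = line.encode("utf-8") if isinstance(line, str) else line
    # 'a' mode forces every write to EOF, so open read/write instead.
    with open(path, "r+b") as fh:
        pos = fh.seek(0, os.SEEK_END)  # seek() returns the new offset
        while pos > 0:
            start = max(0, pos - chunk)
            fh.seek(start)
            block = fh.read(pos - start)
            stripped = block.rstrip(BLANK)
            if stripped:
                # Found content: pos is just past its last byte.
                pos = start + len(stripped)
                break
            pos = start  # whole chunk was blank; keep scanning back
        # Drop the trailing blank lines, then append on a fresh line.
        fh.seek(pos)
        fh.truncate()
        if pos > 0:
            fh.write(b"\n")
        fh.write(data + b"\n")
```

Usage would be `append_after_content("big.txt", "some line")`. Because the function always writes a trailing newline, later appends find the file in the expected state, and since at most 1-10 blank lines are expected, the loop normally reads a single chunk regardless of the file's 10-20 GB size.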

Avba
  • Merely a thought: I believe the file object returned by open has a seek method that can move to the end of the file; if not, look at buffered streams. Either way, seek to the end of the file and scan backwards until you hit text. I'm not sure about the time complexity of seek, but I've used it with buffered streams and it's been quite fast. – Kyle Jul 01 '19 at 14:08
  • You can consult https://stackoverflow.com/questions/18857352/python-remove-very-last-character-in-file for the idea of deleting the unwanted trailing characters from a file – Evgeny Jul 01 '19 at 14:12
  • Open the file with *r+b*, *seek* to the end minus some buffer size, read that many bytes into the buffer, and scan it backwards until you find a regular char (possibly followed by an *EOLN*). Then write your line (preceded by an *EOLN* if needed) followed by an *EOLN*, to make future appends easier. Truncate the file if necessary. – CristiFati Jul 01 '19 at 14:18
  • @CristiFati: The cost of `mmap`ing is typically trivial; as long as you're on a 64-bit OS, all it really costs is virtual address space. If you know, without a doubt, that the end of the file is close to the truncation point you want (within a couple of pages) it's easy to adapt [the bottom example of this answer](https://stackoverflow.com/a/33811809/364696) to only map the two pages (one full page, plus the trailing partial page), which avoids mapping the whole file when you're sure you only need a few hundred bytes at the end, removing the (small) incremental costs of mapping the whole file. – ShadowRanger Jul 01 '19 at 14:26
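The tail-only `mmap` idea from the last comment could look roughly like the sketch below. It is not the code from the linked answer; `content_end_mmap`, `append_after_content_mmap` and the two-page default `window` are hypothetical choices. It assumes LF line endings and relies on the standard-library rule that an `mmap` offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os


def content_end_mmap(path, window=2 * mmap.PAGESIZE):
    """Hypothetical helper: return the offset just past the last byte that
    is not a newline or carriage return, mapping only the file's tail."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mmapped
    # The mapping offset must be a multiple of ALLOCATIONGRANULARITY.
    gran = mmap.ALLOCATIONGRANULARITY
    offset = max(0, size - window) // gran * gran
    with open(path, "rb") as fh, mmap.mmap(
        fh.fileno(), size - offset, offset=offset, access=mmap.ACCESS_READ
    ) as tail:
        i = len(tail)
        while i > 0 and tail[i - 1] in (0x0A, 0x0D):  # '\n', '\r'
            i -= 1
    if i == 0 and offset > 0:
        # The blank run is longer than the window: retry with a bigger one.
        return content_end_mmap(path, window * 2)
    return offset + i


def append_after_content_mmap(path, line):
    """Hypothetical helper: truncate trailing blank lines, then append."""
    end = content_end_mmap(path)  # the mapping is already closed here
    with open(path, "r+b") as fh:
        fh.truncate(end)
        fh.seek(end)
        if end > 0:
            fh.write(b"\n")
        fh.write(line.encode("utf-8") + b"\n")
```

The mapping is closed before `truncate` is called, which matters on Windows, where a file with an open mapping generally cannot be truncated. Compared with a plain `seek`/`read` of the tail, this mainly changes how the last few kilobytes are accessed, not the overall approach.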
