I've been trying numerous answers here on Stack Overflow to remove the trailing blank lines from the 2.txt file (input):

2.txt file:

-11
B1
5
B1
-2
B1
7
B1
-11
B1
9
B1
-1
B1
-3
B1
19
B1
-22
B1
2
B1
1
B1
18
B1
-14
B1
0
B1
11
B1
-8
B1
-15

and the only one that worked using print(line) was this https://stackoverflow.com/a/6745865/10824251. But when I use f.write(line) instead of print(line), my final 2.txt file (output) is as shown below:

2.txt file final:

-11B15B1-2B17B1-11B19B1-1B1-3B119B1-22B12B11B118B1-14B10B111B1-8B1-15
18
B1
-14
B1
0
B1
11
B1
-8
B1
-15

However, when I use print(line) instead of f.write(line), my bash terminal shows the output with the trailing blank lines removed (see the print(line) result below), but with the same deformation as in the final 2.txt file; i.e. the blank-line removal itself works correctly. I have tried to understand what is happening but have not made any progress.

print(line) result in the bash terminal:

-11B15B1-2B17B1-11B19B1-1B1-3B119B1-22B12B11B118B1-14B10B111B1-8B1-15
18
B1
-14
B1
0
B1
11
B1
-8
B1
-15

UPDATE:

My script, which eliminates the last lines of the 2.txt file but deforms the first lines in the bash terminal:

for line in open('2.txt'):
  line = line.rstrip()
  if line != '':
    print (line)

My script, which deforms the first lines of the 2.txt file and also does not delete the last lines as desired in the output file 3.txt:

with open("2.txt",'r+') as f:
  for line in open('3.txt'):
    line = line.rstrip()
    if line != '':
        f.write(line)
7beggars_nnnnm
  • @CharlesDuffy ThankX for your help but I just cited the print produced on the bash terminal to cast doubt on how the code works. Anyway what I want is to produce the changes inside the file but using the python script – 7beggars_nnnnm Jan 08 '20 at 21:54
  • ...for that matter, instead of the hexdump output, you could give us the output of `print(repr(open('2.txt', 'rb').read()))`. – Charles Duffy Jan 08 '20 at 21:55
  • Gotcha. So, `f.write(line)` doesn't write any newlines unless `line` contains them, so it generating one big line when you call it over and over is normal/expected. – Charles Duffy Jan 08 '20 at 21:55
  • @CharlesDuffy I added an UPDATE to better understand the dynamics of the problem. – 7beggars_nnnnm Jan 08 '20 at 22:06
  • If you don't see my answer (which takes into account your update), consider reloading the page. – Charles Duffy Jan 08 '20 at 22:06
  • ...one thing I could believe is that you don't actually have newlines in your file, but instead other ANSI cursor-control codes; `strip()` won't remove them (because they're not empty characters within the definition thereof), and it explains why only some but not all of your original file was corrupted. – Charles Duffy Jan 08 '20 at 22:40
  • ...anyhow, the content anomalies will show up in `hexdump -C`, which I was encouraging you to use regardless. :) – Charles Duffy Jan 08 '20 at 22:41
  • @CharlesDuffy You're right, here https://imgur.com/TyRryL4 my `2.txt` file before using your solution, see the extra line at the end of the file text. Using your solution and looking at `3.txt` file, look here https://imgur.com/omLDb4M, really the extra line goes away. ThankXD! – 7beggars_nnnnm Jan 08 '20 at 22:49

1 Answer

Fixing the existing approach

rstrip() removes the trailing newline along with any other trailing whitespace, so when you write the result, nothing terminates the line and the next write continues on that same line.

One way to fix it that's clear about what needs to change (all code unmodified but for addition of the last line):

with open("2.txt",'r+') as f:
  for line in open('3.txt'):
    line = line.rstrip()
    if line != '':
        f.write(line)
        f.write('\n')  # one extra line; '\n' rather than os.linesep, since text mode already translates newlines

Alternately, you could change f.write(line) to print(line, file=f).
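
To make the print(line, file=f) variant concrete, here is a minimal sketch. It writes into an in-memory io.StringIO buffer instead of a real file, and the sample lines and the helper name strip_blank_lines are made up for illustration; they are not from the question. Note that, like the original loop, it drops every blank line, not only the trailing ones.

```python
import io

# Made-up sample input: three content lines followed by two blank lines.
lines = ["-11\n", "B1\n", "5\n", "\n", "\n"]


def strip_blank_lines(lines, out):
    """Copy non-blank lines to `out`, one per line (hypothetical helper)."""
    for line in lines:
        line = line.rstrip()           # drops the newline and other trailing whitespace
        if line != '':
            print(line, file=out)      # print() puts the newline back


buf = io.StringIO()                    # stands in for the output file
strip_blank_lines(lines, buf)
result = buf.getvalue()
print(repr(result))                    # '-11\nB1\n5\n'
```

With a real file you would pass the object returned by open('3.txt', 'w') as out instead of the buffer.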


Optimizing to run quickly on huge files

If you need to trim a small number of blank lines from the end of an arbitrarily-large file, it makes sense to skip to the end of that file and work backwards; that way, you don't care how large the whole file is, but only how much content needs to be removed.

That is, something like:

import os, sys
block_size = 4096 # 4kb blocks; decent chance this is your page size & disk sector size.
filename = sys.argv[1] # or replace this with a hardcoded name if you prefer

with open(filename, 'r+b') as f:   # seeking backwards only supported on files opened binary
    while True:
        f.seek(0, 2)                            # start at the end of the file
        offset = f.tell()                       # figure out where that is
        f.seek(max(0, offset - block_size), 0)  # move up to block_size bytes back
        offset = f.tell()                       # figure out where we are
        trailing_content = f.read()             # read from here to the end
        new_content = trailing_content.rstrip() # remove all whitespace
        if new_content == trailing_content:     # nothing to remove?
            break                               # then we're done.
        if new_content:                         # and if post-strip there's content (it's bytes, so don't compare to '')...
            f.seek(offset + len(new_content))   # jump to its end...
            f.write(os.linesep.encode('utf-8')) # ...write a newline...
            f.truncate()                        # and then delete the rest of the file.
            break
        else:
            f.seek(offset, 0)                   # go to where our block started
            f.truncate()                        # and delete *everything* after it
            # run through the loop again, to see if there's still more trailing whitespace.
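
If you would rather call this from other code than from the command line, the same backwards-scanning idea can be wrapped in a function. The following is only a sketch under a few assumptions: the function name trim_trailing_blank_lines is made up, it always leaves exactly one b'\n' at the end of a non-empty file (the script above leaves a file untouched when it has nothing to strip), and the demo uses a tiny block_size so that the scan has to cross several blocks.

```python
import os
import tempfile


def trim_trailing_blank_lines(path, block_size=4096):
    """Strip trailing whitespace from the file, then end it with one newline.

    Hypothetical helper; scans backwards block by block so the cost depends
    on how much whitespace is removed, not on the size of the file.
    """
    with open(path, 'r+b') as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()                      # candidate end of real content
        while end > 0:
            start = max(0, end - block_size)
            f.seek(start)
            block = f.read(end - start)
            stripped = block.rstrip()       # bytes.rstrip() removes ASCII whitespace
            if stripped:                    # found real content in this block
                end = start + len(stripped)
                break
            end = start                     # whole block was whitespace; keep going back
        f.seek(end)
        f.truncate()                        # cut everything after the content
        if end > 0:
            f.write(b'\n')                  # leave a single terminating newline


# Demo on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.txt')
    with open(demo_path, 'wb') as f:
        f.write(b"-8\nB1\n-15\n\n\n   \n\n")
    trim_trailing_blank_lines(demo_path, block_size=4)
    with open(demo_path, 'rb') as f:
        result = f.read()

print(repr(result))                         # b'-8\nB1\n-15\n'
```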
Charles Duffy
  • Thanks but in **Fixing the existing approach** provided `3.txt` without deforming the first few lines but still has empty lines at the end. Meanwhile when using **Optimizing to run quickly on huge files** I get the following error `filename = sys.argv [1] IndexError: list index out of range...` – 7beggars_nnnnm Jan 08 '20 at 22:30
  • re: "fixing the existing approach", those lines are clearly not truly empty. And re: "optimizing", as should be obvious, it's expecting you to provide a filename. – Charles Duffy Jan 08 '20 at 22:36
  • anyhow, whatever kind of not-truly-empty content you've got in your file, nobody is going to be able to test an answer unless you give us instructions to create a file that the bug reproduces with. (If you copy-and-paste from your real file to the question, test that it still reproduces when then copy-and-pasting from the website to a new file and running your code against the new file instead of the original). – Charles Duffy Jan 08 '20 at 22:38