
I am tailing a file. I want to remove lines from the file while I am tailing it. I'd like to avoid overwriting the file (truncating) and I would like to avoid replacing the file with a new file, because this most likely messes up / corrupts the tail command results.

Currently I have tried two different ways of doing this:

  1. Read the whole contents of the file, remove the unwanted lines of data, then write back to the file with less data than before. This results in some stderr spewed from the tail command => "file was truncated"...tail -F is still working, but it does log this stderr.

  2. Use sed -i '/pattern/d' my-file.txt to delete lines from the file that I no longer want. This results in some stderr spewed from the tail command => "file was replaced" (note different than above)...tail -F is still working, but it does log this stderr.

I am wondering if there is a way to delete lines from a file without truncating the file or replacing the file, as this seems to make life a little bit harder for tail than otherwise.

Should I just ignore this stderr? If I just ignore the stderr, I think the tail results will just be inaccurate. I need the tail results to be as accurate as possible because they are feeding into a new program, not being read by a human.

Alexander Mills
  • Have you considered redirecting `stderr`, that is `tail -F path 2>/dev/null` ? – janos Dec 25 '16 at 10:36
  • I assume it is not possible to edit a file in place. – Cyrus Dec 25 '16 at 10:37
  • Well sed -i is sed --in-place, which seems like it's betraying the fact that the file is actually replaced with a new file, pretty lame IMO – Alexander Mills Dec 25 '16 at 19:05
  • Do you really need to edit the file or is it sufficient to not see the "bad lines" while tailing? – Andreas Wederbrand Dec 25 '16 at 21:39
  • @andreas, multiple processes will be tailing this single file, so this file needs to be a single and final source of truth. It would be better to just remove lines than to perhaps modify lines and mark them as "deleted" somehow. If you don't remove lines then the file might get too large. – Alexander Mills Dec 25 '16 at 22:46

1 Answer


One workaround I'm seeing would be:

  • open the file in read/write
  • identify the line to be removed
  • instead of removing it, overwrite the preceding linefeed plus the contents of the line with space characters.

before replacement:

aaaaaaa\n
bbbbbbb\n
ccccccc\n

after replacement:

aaaaaaa        \n
ccccccc\n

Visually, the log has the line removed.

If you don't mind the extra spaces, or can perform an off-line cleanup using sed 's/ *$//', you're good: opening the file in read/write mode and overwriting bytes does not move the unchanged data or change the file's inode.
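If the cleanup is done by the consuming program rather than with sed, a minimal sketch could look like this. The helper name strip_padding is hypothetical, and it assumes the padding character is a space (or whatever fill character was used for blanking):

```python
def strip_padding(lines, fill=" "):
    """Yield each line with trailing fill characters removed.

    Hypothetical helper: roughly the per-line equivalent of the sed
    cleanup, applied to lines as they are read. The linefeed, if any,
    is preserved.
    """
    for line in lines:
        if line.endswith("\n"):
            yield line[:-1].rstrip(fill) + "\n"
        else:
            yield line.rstrip(fill)


if __name__ == "__main__":
    padded = ["aaaaaaa        \n", "ccccccc\n"]
    for cleaned in strip_padding(padded):
        print(repr(cleaned))
```

This only removes padding from the data being read; the file itself keeps its blanks.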

As a bonus, this is very fast, because even if the file is huge, you're just changing a few bytes, not rewriting the whole file.
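To illustrate how little is written, here is a small sketch on the three-line example from above. The function name blank_span and the temporary demo file are assumptions made for the illustration; the offsets are hard-coded for this exact content:

```python
import os
import tempfile


def blank_span(path, start, end, fill=b" "):
    """Overwrite bytes [start, end) of path with fill, in place."""
    # "r+b" opens for update without truncating the file.
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(fill * (end - start))


# Build the demo file: three lines of 7 characters plus a linefeed each.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"aaaaaaa\nbbbbbbb\nccccccc\n")

# The linefeed before "bbbbbbb" is at offset 7 and the line's own
# linefeed is at offset 15, so bytes 7..14 (8 bytes) get blanked.
blank_span(demo_path, 7, 15)

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result)
```

Only 8 bytes are written here, regardless of how large the file is.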

I had a tough time writing this Python implementation, but it works:

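import os
import re
import sys

logfile = sys.argv[1]
regex = sys.argv[2]
replacement_char = " "  # default: space
if len(sys.argv) > 3:
    replacement_char = sys.argv[3][0]  # first char of 3rd arg

pattern = re.compile(regex)
eol_len = len(os.linesep)

# "r+" opens for reading and writing without truncating the file
with open(logfile, "r+") as f:
    while True:
        old_offset = f.tell()  # offset where the current line starts
        line = f.readline()
        if not line:
            break  # end of file
        if pattern.search(line):
            # match: blank the line
            new_offset = f.tell()  # offset just past this line's linefeed
            if old_offset >= eol_len:
                # not the first line: also overwrite the previous linefeed
                old_offset -= eol_len
            f.seek(old_offset)
            # leave this line's own linefeed in place
            f.write(replacement_char * (new_offset - old_offset - eol_len))
            # skip the linefeed that was kept so it is not re-read as an empty line
            f.seek(new_offset)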

How to use:

blank.py logfile regex <optional replacement char>

How it works:

  • opens the file in read/write mode
  • loops over the lines
  • stores the current file offset
  • reads a line
  • if the line matches the regex, gets the current offset, rewinds to the previous offset (minus the preceding linefeed) and writes the appropriate number of blanks/replacement chars. Because the preceding linefeed is overwritten as well, the blanks end up at the end of a valid line, so visually it's the same as if the line had been removed.
  • since the file is opened in read/write mode and only overwritten, its size and inode don't change, so an external program following it (on Linux) sees neither a truncation nor a replacement: no more warnings from tail
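The size and inode claim can be checked with os.stat. The sketch below is only an illustration: the file name demo.log and the helper stat_id are made up, and the "replace" step imitates the write-a-temp-file-then-rename approach that sed -i uses rather than calling sed itself:

```python
import os
import shutil
import tempfile


def stat_id(path):
    """Return (inode, size) for path."""
    st = os.stat(path)
    return st.st_ino, st.st_size


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "demo.log")  # hypothetical log file
with open(log_path, "wb") as f:
    f.write(b"keep\ndrop\nkeep\n")

before = stat_id(log_path)

# In-place blanking: overwrite the linefeed at offset 4 plus "drop"
# (offsets 5..8) with 5 spaces. Nothing is truncated or renamed.
with open(log_path, "r+b") as f:
    f.seek(4)
    f.write(b" " * 5)
after_in_place = stat_id(log_path)

# Replacement: write a new file and rename it over the old one.
tmp_path = log_path + ".tmp"
with open(tmp_path, "wb") as f:
    f.write(b"keep\nkeep\n")
os.replace(tmp_path, log_path)
after_replace = stat_id(log_path)

shutil.rmtree(workdir)

print("in place: same inode and size?", before == after_in_place)
print("replaced: same inode?", before[0] == after_replace[0])
```

The in-place write leaves both values untouched, while the rename gives the path a new inode and a smaller size, which is what tail -F reports on.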

Since it overwrites the previous linefeed, it just adds blanks/replacement chars to the previous line.

The only problem as you already noted is that if the first line matches, then it puts replacement chars in it. It's the only time it is visible. As a workaround, you could start your logfiles with a special, non-matchable header.
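One way to set up such a header, as a rough sketch: the header text and the function name create_log are assumptions, and it is up to you to pick a header that none of your regexes can match.

```python
import os
import tempfile

HEADER = b"#log-header\n"  # hypothetical marker; pick one your regexes never match


def create_log(path, header=HEADER):
    """Write the header line if the log file is new or empty."""
    with open(path, "ab") as f:
        # Move to the end explicitly so tell() reports the current size.
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            f.write(header)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_log = os.path.join(demo_dir, "demo.log")
    create_log(demo_log)
    create_log(demo_log)  # second call leaves the existing header alone
    with open(demo_log, "rb") as f:
        print(f.read())
    os.remove(demo_log)
    os.rmdir(demo_dir)
```

With the header in place, the first data line always has a preceding linefeed to absorb the blanks.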

Jean-François Fabre
  • I like this, I hope it works! Could you please quote the sed command verbatim that you suggest will actually edit the file in place? – Alexander Mills Dec 25 '16 at 19:10
  • The nice thing about this is that the file has the same amount of characters and lines, just now some are whitespace. This will make life a lot easier for the tail command! – Alexander Mills Dec 25 '16 at 19:11
  • The only problem I see with this is that I would also like to use the head command to read the first few lines from the file. If I have a lot of whitespace, then the first few lines might be empty. – Alexander Mills Dec 25 '16 at 20:23
  • @JF, I still don't know what command to use to do: "instead of removing it, replace the previous linefeed+the contents of the line by space characters." I tried this answer http://stackoverflow.com/questions/11245144/replace-whole-line-containing-a-string-using-sed and that also just replaces the file with a new file, so that doesn't work for me so well. – Alexander Mills Dec 25 '16 at 20:38
  • thanks, my guess is however, that when you write back to the file, it will truncate the whole thing and overwrite with the new results. – Alexander Mills Dec 25 '16 at 20:44
  • thanks! you may wish to use this as inspiration http://unix.stackexchange.com/questions/11067/is-there-a-way-to-modify-a-file-in-place – Alexander Mills Dec 25 '16 at 20:48
  • nice link. Comforted me in the idea that bash wasn't good enough for that task. Edited with my python solution. Difficult to tune but now works fine. – Jean-François Fabre Dec 25 '16 at 21:21
  • thanks, looking at the code now - what I am looking for is a way to replace text in the file that matches a pattern with some new characters. Your program just takes two arguments, the file and the pattern to match. What does your program do when a text sequence matches the pattern if there is no 3rd argument representing what to replace the matching text with? confused – Alexander Mills Dec 25 '16 at 21:24
  • Ok sorry, I think you've satisfied the original question I think, but I think a more general solution would be to pass in the data to replace the matching characters with? – Alexander Mills Dec 25 '16 at 21:27
  • Thanks! Let me test it out, if it works reasonably well then I accept the answer – Alexander Mills Dec 25 '16 at 22:43
  • tested this out, seems to work, if you could add any more details as to why/how it works, that would be great. By work I mean doesn't seem to truncate/replace the file. Pretty crazy that this is not standard stuff (not part of a standard lib somewhere). – Alexander Mills Dec 26 '16 at 04:47
  • ok I have added more info. I don't know what to add next, it seems pretty detailed. The answer to "why isn't it standard" is unknown, but you get the same problem when you want to handle binary files using bash commands. bash commands are limited to text manipulation and make extensive use of the pipe mechanism. They don't tend to change stuff "in-place", and when they do, they use temp files to avoid losing the data (and read/write mode on a text file usually makes no sense) – Jean-François Fabre Dec 26 '16 at 09:45
  • thanks @JF - so I re-wrote your algorithm in Node.js and I noticed something interesting. Your code seems to work, but one thing is that what should happen is that there should be blank lines that "pile up" at the top of the file. Meaning, if you add lines, but remove them at a higher rate, then all the lines will get deleted. When you add a line, it will get appended. So when the appended line gets deleted it should stay there, which means the blank lines should *accumulate at the top of the file*. With my nodejs code, this happens, but with your python code it doesn't. Can you check it out? – Alexander Mills Dec 26 '16 at 11:09
  • see: https://stackoverflow.com/questions/41329374/translate-this-python-code-to-node-js/41329828#41329828 – Alexander Mills Dec 26 '16 at 11:09
  • you can also see here: https://github.com/ORESoftware/cmd-queue/blob/master/lib/blank.js, thanks – Alexander Mills Dec 26 '16 at 11:19