Delete the first N bytes/lines from the beginning of a file in bash, inline, while a process is actively writing to it

Question

I need to remove (truncate) the first N bytes from a log file, e.g. nohup.out, while data is continuously being written to it.

I can use the truncate command like this:

truncate -c -s -10K my_file

This removes the most recent data from the end of the file, so it is not useful in this case.

I need the file to be truncated from the beginning (where the older data is), preserving the newer data.

I checked online. Most of the examples use redirection or write to a temp file using dd, head, etc. I need to do this inline on the same file.

The closest match is sed, but so far I have only found examples that remove N characters from EVERY LINE. For example, the command below deletes the first 10 characters from each line in my_file.

sed -i 's/^\(.\)\{10\}//g' my_file

I am looking for a way to delete the first N bytes, starting at the 1st line and ending at the Kth line, where the Nth byte falls, thus preserving the latest data at the bottom.

I can probably cook up some logic to achieve this, but I was wondering whether there is an "off the shelf" option available.

Any pointers? Thanks.

Sed makes a copy and renames it after the edit, anyway. Files can't be truncated at the beginning, that's how files work. — choroba, Aug 29 '18 at 20:11
@choroba thanks for the comment. However, can we tweak `sed` to achieve the desired result? I am not much conversant with `sed`. — Anil_M, Aug 29 '18 at 20:13
If the file is open for writing by some other process, you need to close it to trim this way. Think of what the other process will see; it knows the length of the file, so to append to it, it seeks to the next position and writes. How could it know that the file is shrinking under it? You can probably use `ed` or `sed` or `awk` or perhaps even `dd` to trim from the beginning of the file, but you'll definitely need the other process out of the way while you do your work, if you don't want the other process to corrupt the file after you change it. — ghoti, Aug 29 '18 at 20:17
Since your reader and writer never overlap in execution, you can _probably_ accomplish it with sed and careful job control. However, what it sounds like to me you want is a "ring buffer in a file". For that, I'd use [`emlog`](https://github.com/nicupavel/emlog): _a Linux kernel module that makes it easy to access the most recent (and only the most recent) output from a process. It works just like "tail -f" on a log file, except that the storage required never grows._ — bishop, Aug 29 '18 at 20:24
[Delete the first n bytes of files](https://unix.stackexchange.com/q/13907/56041), [How to move first N bytes from text file to another text file](https://stackoverflow.com/q/25177284/608639), [What is a unix command for deleting the first N characters of a line?](https://stackoverflow.com/q/971879/608639), [Delete the first five characters on any line of a text file in Linux with sed](https://stackoverflow.com/q/3795512/608639), etc. — jww, Aug 30 '18 at 02:16
I am not moving N bytes. Problem is this needs to be inline while the file is being written. I've already referred to above link and it does not solve my problem. — Anil_M, Aug 30 '18 at 02:23
The above comments spell out in excruciating detail how your requirement to do it in-line is just cosmetics. Doing it as two commands (delete, move results over the old file) does the same thing as any ostensibly in-line operation, just not as a single command. I will proceed to mark this as a duplicate of one of the proposed answers. — tripleee, Aug 30 '18 at 04:40
@tripleee: None of the answers in the links given above solve my problem of deleting N bytes/lines from the top of a file that is actively written to by a process. I am still working on resolving it at my end. They answer the question for a static file, which is not the case here. — Anil_M, Aug 30 '18 at 14:36
There is no sane reliable way to rewrite a file which is still open for writing by another process. — tripleee, Aug 30 '18 at 14:42

ghoti · Accepted Answer · 2018-08-30T15:35:49.423

The following will print lines until the line that contains the Nth byte:

awk -v n="$n" 'c>=n{exit} {c+=length()+1} 1'

where the shell variable $n contains the number of bytes that are important to you. The +1 is there so that newlines will be included. If you don't have single-character newlines, adjust to suit, or perhaps use length(ORS) instead.
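As a minimal sketch of what that command does, here it is run against a small throwaway file. The sample data, the temp file and the value of n are assumptions made up for illustration; each line is 5 bytes (4 characters plus the newline), so byte 7 falls inside the second line.

```shell
# Hypothetical sample data: three lines of 5 bytes each (4 chars + newline).
sample=$(mktemp)
printf 'aaaa\nbbbb\ncccc\n' > "$sample"

n=7   # assumed byte count; byte 7 lies inside the second line

# Print lines up to and including the one that contains the Nth byte.
head_part=$(awk -v n="$n" 'c>=n{exit} {c+=length()+1} 1' "$sample")
printf '%s\n' "$head_part"

rm -f "$sample"
```

With this input the first two lines are printed, because the running count only reaches n after the second line has been counted.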

Note that this DOES NOT handle the impossible part of your request, to change the file while another process has it open for writing.

To achieve the inverse of this -- that is, to print every line starting after the Nth byte, we need something slightly different:

awk -v n="$n" 'c>=n{p=1} {c+=length()+1} p'

This sets a flag, p, once enough characters have been seen, and from then on prints every line because the flag evaluates as true.
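A short sketch of the inverse command on made-up data may help here. The sample file and n=7 are assumptions for illustration only. Note that whole lines are dropped, so the number of bytes removed can be larger than n (10 bytes in this case).

```shell
# Hypothetical sample data: three lines of 5 bytes each (4 chars + newline).
sample=$(mktemp)
printf 'aaaa\nbbbb\ncccc\n' > "$sample"

n=7   # assumed byte count; byte 7 lies inside the second line

# Skip whole lines until at least n bytes have been counted, print the rest.
tail_part=$(awk -v n="$n" 'c>=n{p=1} {c+=length()+1} p' "$sample")
printf '%s\n' "$tail_part"

rm -f "$sample"
```

Only the third line survives: the first two lines together cover bytes 1 to 10, which includes byte 7.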

The much-lower-performance equivalent bash-only version of this might look like:

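c=0; p=0
while IFS= read -r; do              # -r keeps backslashes literal
  ((c>=n)) && p=1                   # threshold reached: start printing
  ((c+=${#REPLY}+1))                # line length plus one for the newline
  ((p)) && printf '%s\n' "$REPLY"   # printf is safe for lines such as "-n"
done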

You could use this as a pipe, or use input redirection to read a file. It also assumes that $n contains an integer.
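To illustrate both ways of feeding the loop, here is a sketch that wraps the same logic in a function. The name skip_bytes, the sample data and the temp file are all made up for this example, and the body uses POSIX constructs instead of ((...)) so that it should also run under plain sh. It assumes the argument is an integer.

```shell
# skip_bytes N: copy stdin to stdout, dropping whole lines until at least
# N bytes have been counted. The function name is hypothetical.
# The body runs in a subshell, so n, c, p and line do not leak out.
skip_bytes() (
  n=$1 c=0 p=0
  while IFS= read -r line; do
    [ "$c" -ge "$n" ] && p=1          # threshold reached: start printing
    c=$((c + ${#line} + 1))           # line length plus the newline
    [ "$p" -eq 1 ] && printf '%s\n' "$line"
  done
  return 0
)

# Used in a pipeline:
out_pipe=$(printf 'aaaa\nbbbb\ncccc\n' | skip_bytes 7)

# Used with input redirection from a (hypothetical) file:
tmp=$(mktemp)
printf 'aaaa\nbbbb\ncccc\n' > "$tmp"
out_file=$(skip_bytes 7 < "$tmp")
rm -f "$tmp"

printf '%s\n' "$out_pipe" "$out_file"
```

Both calls print only the last line of the sample. Like the loop above, this counts characters as the shell sees them, so multibyte text may not match a strict byte count.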

Nice....I really need to get comfortable with `awk` again. Perl is great, but awk tends to be more succinct where regexes are not involved. It also loads much faster, which tends to matter when processing thousands of files one at a time. — zzxyz, Aug 29 '18 at 20:36

zzxyz · Answer 2 · 2018-08-29T21:08:19.483

perl -i -pe 'BEGIN{$x=100} {if ($x > 0) {$x -= length$_; s/^.*\r?\n?//;}}' file

Where x is the number of characters you want to trim from the beginning of the file. If this is not the same, I believe a library might be necessary.

It works by counting down while it replaces entire lines with nothing. Once the counter reaches zero, the remaining lines are passed through unchanged. This rewrites the entire file, and there may be utilities that do this in a more clever fashion.

To make this configurable, use -s followed by -- and -x=100 (which sets $x from the command line):

perl -i -spe 'if ($x > 0) {$x -= length$_; s/^.*\r?\n?//;}' -- -x=100 file

edited Aug 29 '18 at 21:08

answered Aug 29 '18 at 20:26

@zzxys , Thanks for the answer but I can't use `PERL` or `Python`. Need to strictly use `bash` as indicated in the tags – Anil_M Aug 29 '18 at 20:29
@Anil_M - Are you aware of any bash shells that don't have perl? I've never run across any (including rescue disks, minimal MINGW installs on Windows, etc.). Obviously 100% possible in theory, but I'm curious if you have a practical example. – zzxyz Aug 29 '18 at 20:33
@Anil_M Neither sed nor `truncate` are part of Bash. – Benjamin W. Aug 29 '18 at 20:36
@zzxyz you are correct, I am new to `linux` world. I am using `Debian` and it has `perl`, Yes the solution works as intended. Let me run few tests. thanks – Anil_M Aug 29 '18 at 20:38
@zzxyz , how would I pass an external bash variable that holds 'number of characters value' to PERL ? – Anil_M Aug 29 '18 at 20:48
Ah got it. `export n=100 ; perl -i -pe 'BEGIN{$x=$ENV{n}} {if ($x > 0) {$x -= length$_; s/^.*\r?\n?//;}}' my_file` . Yours is an acceptable answer. – Anil_M Aug 29 '18 at 20:55
@Anil_M - I've updated my answer, although I may like your version better since it can be used more generally (in other applications that can read environment variables, which is most of them) – zzxyz Aug 29 '18 at 20:57
I have to take back my acceptance, as this solution is not working for me. The nohup.out gets truncated once, but after that the process no longer writes to the file, possibly because the inode has changed. The script continues in the background but I can't view its progress anymore. Researching `log rotate` for this problem. – Anil_M Aug 30 '18 at 03:15
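The inode change mentioned in that last comment can be observed with a small experiment. This is only a sketch: it assumes GNU sed (whose -i writes a new file and renames it over the old name, much as perl -i replaces the file), and the helper name inode and the temp files are made up. For contrast it also rewrites the file through a redirection, which truncates and refills the existing inode.

```shell
f=$(mktemp)
printf 'aaaa\nbbbb\ncccc\n' > "$f"

# Hypothetical helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

before=$(inode "$f")

# Assumes GNU sed: -i creates a new file and renames it into place.
sed -i '1d' "$f"
after_sed=$(inode "$f")

# Rewriting through redirection keeps the same inode.
tmp=$(mktemp)
awk 'NR>1' "$f" > "$tmp"
cat "$tmp" > "$f"
after_cat=$(inode "$f")

final=$(cat "$f")
printf 'before=%s after_sed=%s after_cat=%s\n' "$before" "$after_sed" "$after_cat"

rm -f "$f" "$tmp"
```

A process that still holds the old file open keeps writing to the old inode after the sed step, which would explain why nohup.out appears to stop growing. The redirection variant keeps the inode, but it is not a reliable fix on its own: anything the writer appends between the read and the rewrite is lost, and a writer that did not open the file in append mode will continue at its old offset.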
