
I am doing some text processing on a file using awk, for instance removing trailing whitespace.

awk '{gsub(/ +$/, "")} {print $0}' filename

This works fine. But when I redirect the output to the original file, it becomes empty.

temp$ awk '{gsub(/ +$/, "")} {print $0}' abc > abc
temp$ cat abc
temp$

So I tried another way: using cat and a pipe rather than passing the file as an input argument to awk.

temp$ cat abc | awk '{gsub(/ +$/, "")} {print $0}' abc > abc
temp$ cat abc
temp$ 

Still doesn't work. Is there a way to achieve the same goal without involving an intermediate file?

midori
tricycle
  • http://stackoverflow.com/questions/16529716/awk-save-modifications-inplace – Marc B Dec 17 '14 at 18:16
  • I'd like to get at the root of the question first: What about having an intermediate file is a problem for you? Is it just the hassle of creating/copying/deleting it? Or something else? – thkala Dec 17 '14 at 18:42
  • No certain reason, just want to keep it clean. – tricycle Dec 19 '14 at 17:26

4 Answers


You can use `sed -i` and sed will handle it for you.

example:

sed -i 's/[ \t]*$//g' file
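A quick check of the one-liner (this assumes GNU sed, where `\t` inside a bracket expression matches a tab; the file path is arbitrary):

```shell
# Quick check of the sed one-liner (assumes GNU sed, where \t inside a
# bracket expression matches a tab).
printf 'aaaaa    \nbbb\t\n' > /tmp/f
sed -i 's/[ \t]*$//' /tmp/f
cat -A /tmp/f   # the $ line-end markers now follow the text directly
```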
midori
    Thanks. And I saw this kind of answer by using sed before. But the point is, I am not just doing this kind of operation. What if I do some more complicated process which sed cannot do and I still don't want to use an intermediate file? – tricycle Dec 17 '14 at 18:22
  • you can write a small function which will do it for you and use it whenever it's needed. Otherwise it's impossible without an intermediate file. For more complicated stuff you can use perl -i as well. – midori Dec 17 '14 at 18:25
  • Note that `sed -i ...` is still using a temporary file - it just hides the details from you. It doesn't really edit in-place - that's difficult, if not impossible, to do in the general case. – twalberg Dec 17 '14 at 20:23

The problem with `> abc` is that the shell processes the redirection first and truncates the file abc to 0 bytes before it runs your actual command. In other words, your awk command runs on an empty, 0-byte file.
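The truncation can be seen on its own, independent of what the command does (a minimal sketch; the file path is arbitrary):

```shell
# The shell truncates the redirection target before launching the command,
# so awk opens an already-empty file.
printf 'hello    \n' > /tmp/demo
awk '1' /tmp/demo > /tmp/demo   # even a plain pass-through empties the file
wc -c < /tmp/demo               # 0 bytes left
```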

Here is a trick you can use not just for this command but for any other command as well.

f='abc'
awk '{sub(/ +$/, "")} 1' "$f" | awk -v f="$f" -v RS=$'\7' 'END{printf "%s", $0 > f}'

$'\7' (the ASCII BEL control character) is just an improbable record separator that should never appear in a text file, so the whole file is read as a single record. The trick is to read the whole file into one record and write the output only in the END block, after the input has been fully consumed. This works with big files as well.


Earlier Solution: You can make use of tee:

awk '{gsub(/ +$/, "")} {print $0}' abc | tee abc

If you want to discard output on stdout use:

awk '{gsub(/ +$/, "")} {print $0}' abc | tee abc > /dev/null
anubhava
  • I doubt this would work for very large files. You can use sponge perhaps. – Lynch Dec 17 '14 at 18:28
  • the reason it seems to work is that your whole test file can be buffered. If you have a large file it cannot be buffered before writing by tee. Test using these commands: https://gist.github.com/anonymous/2b7241d6940d21690c3e – Lynch Dec 17 '14 at 21:12
  • sorry, slight mistake in the verification. check this out: https://gist.github.com/anonymous/2c37cfdd42507caa4b44 – Lynch Dec 17 '14 at 21:41
  • Sorry -1. your solution is broken and should not be used in production. you might corrupt your data. – Lynch Dec 19 '14 at 20:13
  • If I understand correctly the whole file is loaded into memory? This might still be a problem with large files. Writing the data as it is processed seems like the best option to me. Trying to avoid the temporary file with a more complex and less efficient solution sounds like a bad idea. "you can't make an omelette without breaking eggs". – Lynch Dec 21 '14 at 16:58
  • In modern systems it shouldn't be a problem at all to process big files. How big can the input file be that is for OP to check. In any case this solution is for the question that says **Process on the same file without temp (intermediate) file**. – anubhava Dec 21 '14 at 17:04

There are several possible solutions, but please make sure you test with a large file. On my machine a file smaller than ~100 KB will work with `cat abc | tee abc > /dev/null`, but the problem occurs when the pipe buffer fills up and is flushed to the next process: when tee receives the first chunk of data it truncates and writes to the file, and then the cat process cannot read any more from that file. This results in corrupting your data.
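The failure mode can be reproduced with a file well over the pipe buffer size (roughly a sketch; file path and size are arbitrary, and the exact number of surviving bytes varies with buffering):

```shell
# Build an ~8 MB file, then pipe it back onto itself through tee.
yes testing | head -n 1000000 > /tmp/big   # 8,000,000 bytes
cat /tmp/big | tee /tmp/big > /dev/null
wc -c < /tmp/big   # far smaller than 8,000,000 -- most of the data is lost
```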

With gawk 4.1+ you have an in-place option (`-i inplace`), like sed does. See this post: awk save modifications in place

If you can't use gawk 4.1 you can still convert to a sed in-place expression as others are suggesting.

Otherwise, to keep it a one-liner, you can use sponge (part of moreutils) to redirect to the same file:

$ yes testing | head -n10000000 > /tmp/test
$ du /tmp/test
77M     /tmp/test
$ cat /tmp/test | sponge /tmp/test
$ du /tmp/test
77M     /tmp/test

If installing moreutils to use sponge is not possible for you I suggest a simple temp file then moving the file:

$ tmp=$(mktemp)
$ echo $tmp
/tmp/tmp.Tl0v8HmdaA
$  awk '{gsub(/ +$/, "")} {print $0}' abc > $tmp
$ mv $tmp abc
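The temp-file-and-move pattern above can be wrapped in a small reusable function, as midori suggested in a comment (the name `inplace` is made up here; this is a sketch, not a hardened tool):

```shell
# Hypothetical helper: run any filter "in place" on a file via a temp file.
# usage: inplace FILE COMMAND [ARGS...]
inplace() {
  local file=$1; shift
  local tmp
  tmp=$(mktemp) || return 1
  if "$@" < "$file" > "$tmp"; then
    mv "$tmp" "$file"          # atomic replace on the same filesystem
  else
    rm -f "$tmp"               # keep the original intact on failure
    return 1
  fi
}

printf 'aaaaa    \n' > /tmp/abc
inplace /tmp/abc awk '{gsub(/ +$/, "")} 1'
```

Note the filter reads from stdin here, so awk is given no filename argument and cannot clobber its own input.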
Lynch

Use sponge from the moreutils package.

Probably the most general purpose tool in moreutils so far is sponge(1), 
which lets you do things like this:

% sed "s/root/toor/" /etc/passwd | grep -v joey | sponge /etc/passwd

e.g.:

/tmp$ cat -E abc 
aaaaa    $
/tmp$ awk '{gsub(/ +$/, "")} {print $0}' abc | sponge abc 
/tmp$ cat -E abc 
aaaaa$
Baba