5

I'm trying to remove duplicate lines from a file and update the file in place. For some reason, redirecting the output back to the same file doesn't work, and I have to write to a new file and then replace the original. Is this the only way?

awk '!seen[$0]++' .gitignore > .gitignore

awk '!seen[$0]++' .gitignore > .gitignore_new && mv .gitignore_new .gitignore
ThomasReggi
  • It's the only **smart** way. Deleting in place is possible, but it requires the file to be opened without truncation. Then when it is written, it has to be truncated to the new size. It's a hassle even if we don't consider cases when the operation is interrupted, leaving a half-baked file. – Kaz Jun 11 '16 at 20:54

3 Answers

13

Redirecting the output to the same file you are reading from, like:

awk '!seen[$0]++' .gitignore > .gitignore

will end with an empty file. This is because the shell opens and truncates the file named after the > operator before the command gets executed. By the time awk starts, there is nothing left to read, so you lose all your data.
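The truncation can be reproduced safely on a throwaway file. The following is a minimal sketch; the directory and the file name demo.txt are made up for the demonstration:

```shell
# Work on a throwaway file so no real data is touched.
trunc_dir=$(mktemp -d)
printf 'a\nb\na\n' > "$trunc_dir/demo.txt"   # three lines, one duplicate

# The shell truncates demo.txt before awk starts, so awk reads an empty file.
awk '!seen[$0]++' "$trunc_dir/demo.txt" > "$trunc_dir/demo.txt"

# Reports a size of 0 bytes: the original content is gone.
wc -c < "$trunc_dir/demo.txt"
```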

With newer versions of GNU awk (4.1.0 or later) you can use the -i inplace option to edit the file in place:

awk -i inplace '!seen[$0]++' .gitignore

If you don't have a recent version of GNU awk, you'll need to create a temporary file:

awk '!seen[$0]++' .gitignore > .gitignore.tmp
mv .gitignore.tmp .gitignore
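A slightly more defensive variant of the temporary-file approach could look like the sketch below. The function name dedupe_in_place and the demo file are hypothetical; the idea is only to replace the original when awk succeeded, and to create the temporary file next to the target so that mv is a rename within the same filesystem:

```shell
# dedupe_in_place FILE: hypothetical helper, not part of any standard tool.
dedupe_in_place() {
    dedupe_file=$1
    # Create the temporary file next to the target so mv is a simple rename.
    dedupe_tmp=$(mktemp "$dedupe_file.XXXXXX") || return 1
    # Replace the original only if awk succeeded; otherwise discard the temp file.
    if awk '!seen[$0]++' "$dedupe_file" > "$dedupe_tmp"; then
        mv "$dedupe_tmp" "$dedupe_file"
    else
        rm -f "$dedupe_tmp"
        return 1
    fi
}

# Demo on a throwaway file.
dedupe_dir=$(mktemp -d)
printf 'a\nb\na\nc\nb\n' > "$dedupe_dir/list.txt"
dedupe_in_place "$dedupe_dir/list.txt"
cat "$dedupe_dir/list.txt"
```

Note that mktemp creates the file with restrictive permissions, so after the mv the file may not have the same mode as the original.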

Another alternative is to use the sponge program from moreutils:

awk '!seen[$0]++' .gitignore | sponge .gitignore

sponge soaks up all of its standard input and only opens the output file after that. This keeps the input file intact until awk has finished reading it.
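To illustrate the idea, here is a rough stand-in for that behaviour written as a shell function. The name soak is made up, and this is a sketch of the principle rather than how sponge is actually implemented: it buffers standard input in a temporary file until end of input and only then opens the target for writing.

```shell
# soak FILE: hypothetical stand-in for sponge, for illustration only.
soak() {
    soak_tmp=$(mktemp) || return 1
    # Read standard input to end of file first; the target is not opened yet.
    if cat > "$soak_tmp"; then
        # Only now open (and truncate) the target and copy the buffered data in.
        cat "$soak_tmp" > "$1"
        soak_status=$?
    else
        soak_status=1
    fi
    rm -f "$soak_tmp"
    return "$soak_status"
}

# Demo on a throwaway file.
soak_dir=$(mktemp -d)
printf 'a\nb\na\n' > "$soak_dir/list.txt"
awk '!seen[$0]++' "$soak_dir/list.txt" | soak "$soak_dir/list.txt"
cat "$soak_dir/list.txt"
```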

spazm
hek2mgl
  • 1
    Does not work. `gawk: fatal: can't open source file \`!seen[$0]++' for reading (No such file or directory)`. – Kaz Jun 11 '16 at 21:00
  • That's with `gawk` on the `master` branch, as of the Jun 6, 2016 commit `4f758771937fcbd59b1fd2db017c4995513c3988` by Robbins. – Kaz Jun 11 '16 at 21:01
  • @Kaz As I said, `-i` is a relatively new gawk feature. Looks like your `gawk` doesn't support it. – hek2mgl Jun 11 '16 at 21:01
  • 1
    `-i` is for including files; it requires an argument. – Kaz Jun 11 '16 at 21:03
  • Yes it requires an argument, you are right. Changed that. See: http://stackoverflow.com/a/16531920/171318 .. Actually in place editing in awk is realized using an include. – hek2mgl Jun 11 '16 at 21:04
1

Thomas, I believe the problem is that you are reading from the file and writing to it in the same command. This is why you must write to a temporary file first.

The > does overwrite, so you are using the correct redirect operator:

  • Redirect output from a command to a file on disk. Note: if the file already exists, it will be erased and overwritten without warning, so be careful.

Example: ps -ax > processes.txt
This uses the ps command to get a list of processes running on the system and stores the output in a file named processes.txt.

Chewy
-2

Yes, because otherwise the shell will open a file descriptor on .gitignore and truncate the file before the awk process has even started.

amaksr