139

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name

however, when I do this I end up with a blank file. Any thoughts?

Chris Stryczynski
mike
  • 1
    See this as well: [How to make reading and writing the same file in the same pipeline always “fail”?](https://unix.stackexchange.com/a/409896/201820) on Unix & Linux SO. – codeforester Apr 04 '19 at 19:16
  • 1
    Several answers here are duplicates, and several deleted answers propose adding a pipe, like `grep 'moo' file | cat >file` which of course doesn't help at all. Please review existing answers before adding a new one, and please test any new solution before proposing it. – tripleee Apr 03 '22 at 09:41

14 Answers

134

Use sponge for this kind of task. It's part of moreutils.

Try this command:

 grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
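If sponge isn't available, the core idea is easy to sketch: soak up all of stdin first, and only then open and overwrite the target. A minimal illustration (hypothetical `mysponge` helper, not the real moreutils implementation):

```shell
# Minimal sketch of sponge's behavior: buffer all of stdin in a
# temporary file, and only start writing to the target after the
# input has been fully consumed. Hypothetical helper, not the real
# moreutils implementation (which also handles modes, -a, etc.).
mysponge() {
    local tmp
    tmp=$(mktemp) || return
    cat > "$tmp" &&        # soak up everything first
    cat "$tmp" > "$1"      # then overwrite, keeping $1's inode
    rm -f -- "$tmp"
}
```

Usage is the same shape as with the real tool: `grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | mysponge file_name`.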
Lynch
  • 8
    Thanks for the answer. As a possibly helpful addition, if you're using homebrew on Mac, can use `brew install moreutils`. – Anthony Panozzo Feb 06 '13 at 02:12
  • 6
    Or `sudo apt-get install moreutils` on Debian-based systems. – Jonah Aug 15 '14 at 16:45
  • 4
    Damn! Thanks for introducing me to moreutils =) some nice programs there! – netdigger May 25 '15 at 11:00
  • thank you so much, moreutils for the rescue! sponge like a boss! – aqquadro Oct 20 '16 at 09:30
  • 5
    Word of caution, "sponge" is destructive, so if you have an error in your command, you can wipe out your input file (as I did the first time trying sponge). Make sure your command works, and/or the input file is under version control if you are trying to iterate on making the command work. – user107172 Dec 27 '16 at 18:13
  • `sudo yum install moreutils` on rhel/centos/7, works on Fedora too – Ray Foss Jun 01 '17 at 22:39
  • I am so happy to have learned about `sponge` and will start using it. Unfortunately I need to collaborate with Windows users, who only have Git Bash, so they don't have `moreutils`. That means writing to a temp file for them, and then `mv`. Note to self: take a look at all the tools in `moreutils`. – Amedee Van Gasse Oct 11 '17 at 11:42
  • 1
    There's also a JavaScript implementation of `sponge`, [here](https://github.com/eush77/node-sponge). Handy for `package.json` scripts and such. – Alec Mev Apr 08 '18 at 13:25
112

You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.

#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > "$tmpfile"
cat "$tmpfile" > file_name
rm -f "$tmpfile"

The script uses mktemp to create the temporary file; note that mktemp is not specified by POSIX.
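A slightly more defensive variant of the same pattern adds a trap so the temporary file is cleaned up even if the script is interrupted (the `strip_segs` wrapper name is illustrative, not from the answer above):

```shell
#!/bin/sh
# Same temp-file pattern, wrapped in a function, with a trap so the
# temporary file is removed even on early exit or interruption.
# strip_segs is an illustrative name.
strip_segs() {
    tmpfile=$(mktemp) || return 1
    trap 'rm -f "$tmpfile"' EXIT
    grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' "$1" > "$tmpfile"
    cat "$tmpfile" > "$1"  # cat (not mv) keeps the original inode and permissions
}
```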

Trevor Boyd Smith
c00kiemon5ter
  • 59
    The reason why you can't do that: bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. – glenn jackman Jul 14 '11 at 17:27
  • 1
    @glennjackman: by "processes redirection you mean that in the case of > it opens the file and clears it and in the case of >> it only opens it" ? – Razvan Sep 11 '15 at 14:58
  • 2
    yes, but of note in this situation, the `>` redirection will open the file and truncate it **before** the shell launches `grep`. – glenn jackman Sep 11 '15 at 15:48
  • Instead of this, the [answer using the `sponge` command](https://stackoverflow.com/a/6697219/1879728) should be accepted. – vlz Jan 31 '20 at 16:37
  • It's perfectly possible do it with redirections, you just have to [remove the file](https://stackoverflow.com/a/61857318/1011859) before writing to it. – pistache May 17 '20 at 19:10
23

Use sed instead:

sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
Manny D
18

Try this simple one:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name

Your file will not be blank this time :) and your output is also printed to your terminal. (Be aware this is still a race: tee truncates file_name while grep may still be reading it, so it can lose data once the file outgrows the pipe buffer.)

Paul Roub
sailesh ramanam
8

You can't use a redirection operator (> or >>) to send output to the same file the command reads, because the redirection has higher precedence and the shell creates/truncates the file before the command is even invoked. To avoid that, use appropriate tools such as tee, sponge, sed -i, or any other tool which can write its results to the file itself (e.g. sort file -o file).
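sort's -o option is a good illustration: it reads all of its input before opening the output, so naming the input file as the destination is safe (the nums.txt file name is illustrative):

```shell
# sort reads its entire input before it opens the -o output file,
# so the input file itself may safely be named as the destination.
printf '3\n1\n2\n' > nums.txt
sort -o nums.txt nums.txt
cat nums.txt    # prints 1, 2, 3 on separate lines
```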

Basically, redirecting output back to the original input file doesn't make sense, and you should use appropriate in-place editors for that, for example the Ex editor (part of Vim):

ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name

where:

  • '+cmd'/-c - run any Ex/Vim command
  • g/pattern/d - remove lines matching a pattern using global (help :g)
  • -s - silent mode (man ex)
  • -c wq - execute :write and :quit commands

You may use sed to achieve the same (as already shown in other answers); however, in-place editing (-i) is a non-standard extension whose syntax differs between GNU and BSD sed, and sed is fundamentally a stream editor, not a file editor. See: Does Ex mode have any practical use?

kenorb
7

Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):

echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name

The general case is:

echo "$(cat file_name)" > file_name

Edit, the above solution has some caveats:

  • printf '%s' <string> should be used instead of echo <string>, so that files whose content begins with -n (or contains backslash escapes) don't cause undesired behavior.
  • Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
  • Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
  • Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by appending a null byte with dd if=/dev/zero bs=1 count=1 >> file_name and viewing the file in hex with cat file_name | xxd -p; in the output of echo "$(cat file_name)" | xxd -p the null byte has been stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.

The general solution (albeit slightly slower, more memory-intensive and still stripping unprintable characters) is:

(v=$(cat file_name; printf x); printf '%s' "${v%x}" > file_name)

Test from https://askubuntu.com/a/752451:

printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' "${v%x}" > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt

Should print:

hello
world

Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:

printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt

Prints an empty string.

I haven't tested this on large files (probably over 2 or 4 GB).

I have borrowed this answer from Hart Simha and kos.

Zack Morris
  • 2
    Of course it will not work with large file. This can't possibly be a good solution or work all the time. What is happening is that bash execute first the command and then load the stdout of `cat` and put it as first argument to `echo`. Of course non printable variables will not output properly and corrupt the data. Don't try to redirect a file back to itself, it just can't be good. – Lynch Sep 19 '18 at 04:12
  • Here is a newer/better command that takes the place of `sponge` and is cross-platform if your shell has `perl` installed: https://stackoverflow.com/a/69212059/539149 `cat file_name.txt | grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' | perl -spe'open(STDOUT, ">", $o)' -- -o=file_name.txt` – Zack Morris Oct 07 '21 at 16:52
7

This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:

exec 3<file ; rm file; COMMAND <&3 >file ;  exec 3>&-

Or line by line, to understand it better :

exec 3<file       # open a file descriptor reading 'file'
rm file           # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&-         # close the file descriptor

It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :

exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-

We can also define a shell function to make it easier to use :

# Usage: replace FILE COMMAND
replace() { exec 3<"$1"; rm -- "$1"; "${@:2}" <&3 >"$1" || cat <&3 >"$1"; exec 3>&-; }

Example :

$ echo aaa > test
$ replace test tr a b
$ cat test
bbb

Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.

pistache
6

A one-liner alternative: read the content of the file into a variable first:

VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
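Per the comments, a modern rendering of the same idea uses $( ) and printf instead of backticks and echo (still subject to the trailing-newline and unprintable-character caveats discussed in other answers; the demo file contents here are illustrative):

```shell
# Same one-liner idea with $( ) and printf. Note that the command
# substitution still strips trailing newlines and NUL bytes.
printf 'keep\nseg12.3\n' > file_name
VAR=$(cat file_name)
printf '%s\n' "$VAR" | grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
cat file_name    # prints: keep
```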
w00t
  • Several other similar answers have appeared, some of which have a fuller discussion of how this works, and some caveats. Modern scripts should definitely prefer the modern `$(command substitution)` syntax over backticks, which were deeply obsolescent already in 2013. – tripleee Apr 03 '22 at 09:30
  • Just to repeat the feedback from elsewhere, you should prefer `printf` over `echo` here for robustness; and this will lose any trailing newlines. – tripleee Apr 03 '22 at 09:32
3

The following will accomplish the same thing that sponge does, without requiring moreutils:

    shuf --output=file --random-source=/dev/zero 

The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
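To see the trick in action (GNU shuf assumed; the data.txt file name is illustrative):

```shell
# With --random-source=/dev/zero every random draw shuf makes is
# identical, so no reordering happens; shuf still buffers all of
# stdin before opening the --output file, which is the point.
printf '1\n2\n3\n' > data.txt
grep -v 2 data.txt | shuf --output=data.txt --random-source=/dev/zero
cat data.txt    # the "2" line is gone and the order is unchanged
```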

However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:

# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
#    $1: the file.
#    $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113

siphon()
{
    local tmp file rc=0
    [ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
    file="$1"; shift
    tmp=$(mktemp -- "$file.XXXXXX") || return
    "$@" <"$file" >"$tmp" || rc=$?
    mv -- "$tmp" "$file" || rc=$(( rc | $? ))
    return "$rc"
}
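Usage looks like this (the function definition is repeated so the snippet runs standalone; demo.txt is an illustrative name):

```shell
# siphon definition from the answer above, repeated here so this
# usage example is self-contained.
siphon()
{
    local tmp file rc=0
    [ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
    file="$1"; shift
    tmp=$(mktemp -- "$file.XXXXXX") || return
    "$@" <"$file" >"$tmp" || rc=$?
    mv -- "$tmp" "$file" || rc=$(( rc | $? ))
    return "$rc"
}

printf 'keep\nseg12.3\n' > demo.txt
siphon demo.txt grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}'
cat demo.txt    # prints: keep
```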
Charles Duffy
Mike Nakis
  • `$*` really needs to be `"$@"`. Otherwise, `siphon "two words"` becomes identical `siphon "two" "words"`. Other than that, this answer is great. – Charles Duffy Jan 20 '21 at 19:42
  • ...also, think about telling `mktemp` to create the temporary file in the same directory as where the output file lives; if the two locations are on different filesystems, the `mv` won't be atomic. `local tmp=$(mktemp "$1.XXXXXX")` is one quick/easy way to do that. – Charles Duffy Jan 20 '21 at 19:48
  • (Also, think about making `local tmp file` its own line; that way `tmp=$(mktemp)` will pass through the exit status of `mktemp`, so you can detect a case where it fails and act appropriately; for example, `tmp=$(mktemp) || return` to abort the rest of the function if `mktemp` doesn't succeed; that won't work with `local` preceding on the same line, since `local` itself has its own exit status and overrides `$?`). – Charles Duffy Jan 20 '21 at 19:49
  • I'd also suggest `mv -- "$tmp" "$file"` so filenames that start with dashes aren't incorrectly parsed as options to `mv`. See https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02, guideline 10. – Charles Duffy Jan 20 '21 at 19:51
  • And think about `siphon() {` with no preceding `function`; the `function siphon {` is a POSIX-incompliant kshism (which in bash doesn't behave quite the way it does in ksh, where it modifies how variable declarations behave within the function body). It's better than `function siphon() {`, which isn't compatible with _either_ POSIX sh or legacy ksh, but worse than `siphon() {` with no `function` at all. – Charles Duffy Jan 20 '21 at 19:54
  • @CharlesDuffy thank you very much for your suggestions. Unfortunately, it has been a while since I wrote this, and honestly, I can hardly remember writing it. You know that stuff better than I do. Feel free to edit my answer to apply your suggestions. (If not, I will do it when I find the time, I have no idea when that will be though, because I also have to try it to make sure it works.) – Mike Nakis Jan 20 '21 at 20:32
  • Updated appropriately, thank you for the clearance to do so. (Also tried to update error handling, so we pass through a nonzero exit status from the underlying command). – Charles Duffy Jan 20 '21 at 20:55
  • @CharlesDuffy Thanks again, Charles! – Mike Nakis Jan 20 '21 at 21:22
2

This does the trick pretty nicely in most of the cases I faced:

cat <<< "$(do_stuff_with f)" > f

Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying. (Look for “Here Strings” in man bash if you want to learn more.)

Full example:

#! /usr/bin/env bash

get_new_content() {
    sed 's/Initial/Final/g' "${1:?}"
}

echo 'Initial content.' > f
cat f

cat <<< "$(get_new_content f)" > f

cat f

This does not truncate the file and yields:

Initial content.
Final content.

Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.

A common use case is JSON editing:

echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f

This yields:

{ "a": 12 }
{
  "a": 24
}
Alice M.
1

You can use slurp with POSIX Awk:

!/seg[0-9]+\.[0-9]/ {
  q = q ? q RS $0 : $0
}
END {
  print q > ARGV[1]
}

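It can be invoked directly (the in.txt name is illustrative); the filtered lines are buffered in q and written back over ARGV[1] only in the END block, after reading has finished:

```shell
# Buffer non-matching lines in q; only in END, after the input is
# fully read, overwrite the input file (ARGV[1]) with the result.
printf 'keep\nseg12.3\n' > in.txt
awk '!/seg[0-9]+\.[0-9]/ { q = q ? q RS $0 : $0 }
     END { print q > ARGV[1] }' in.txt
cat in.txt    # prints: keep
```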

Zombo
  • 1
    It should perhaps be pointed out that "slurp" means "read the entire file into memory". If you have a large input file, maybe you want to avoid that. – tripleee Nov 09 '17 at 10:02
1

There's also ed (as an alternative to sed -i):

# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq |  ed -s file_name
nerx
-1

Try this

echo -e "AAA\nBBB\nCCC" > testfile

cat testfile
AAA
BBB
CCC

echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
-2

I usually use the tee program to do this:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name

Note, however, that tee does not buffer the whole input: it truncates file_name as soon as it starts, so this is a race with grep's reads and can lose data on files larger than the pipe buffer.