
I am trying to recursively search the current directory, performing a sed replace on the first line of each .txt file found.

Running either of these two commands on macOS:

find . -name "*.txt" -exec sed -i '' '1 s/([^()]*)//g' {} + 
find . -name '*.txt' -print0 | xargs -0 sed -i '' '1 s/([^()]*)//g'

leads to the same result: only the "first" file found has the sed operation performed on it. This appears to be caused by the 1 address in sed -i '' '1 s/([^()]*)//g'. The odd thing is that, even though this restricts the edit to the first file, it still performs the replacement only on the first line of that file, which is what it should do.

If I change the command to this

find . -name '*.txt' -print0 | xargs -0 sed -i '' '2 s/([^()]*)//g'

it is still only the first file that is changed, but now the replacement happens on the second line. My question, then, is why this only appears to affect the first file returned by

find . -name '*.txt' -print0

Edit for Clarification

To clarify exactly what I mean by "only the first file has the sed operation performed on it", here is the problem recreated step by step.

This is the folder hierarchy (note the space in "folder 1"):

.
├── folder\ 1
│   └── test1.txt
├── folder2
│   └── test2.txt
├── folder3
│   └── test3.txt
└── folder4
    └── test4.txt

Each .txt file contains exactly this, and only this, one line:

This should stay (this should go)

When I run either of the commands above, test2.txt is changed, and the problem is that it is the only file that is changed!

So now, the files contain the following:

test1.txt: This should stay (this should go)

test2.txt: This should stay

test3.txt: This should stay (this should go)

test4.txt: This should stay (this should go)

I believe this is because the first part of the command, for example

find . -name '*.txt' -print0

gives the following (each separated by a \0 null character)

./folder2/test2.txt\0./folder3/test3.txt\0./folder4/test4.txt\0./folder 1/test1.txt\0

By changing the folder and file names around randomly, it is clear that it is always the first file in the above \0 delimited list that is changed.
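The NUL separators are invisible in a terminal, so the list above looks like one long name. A minimal sketch for making them visible, run in a hypothetical scratch directory that mimics two of the folders above (the directory and file contents are assumptions for illustration):

```shell
# Hypothetical scratch directory mimicking part of the layout above.
dir=$(mktemp -d)
mkdir "$dir/folder 1" "$dir/folder2"
printf 'This should stay (this should go)\n' > "$dir/folder 1/test1.txt"
printf 'This should stay (this should go)\n' > "$dir/folder2/test2.txt"
cd "$dir" || exit 1

# Show each NUL-terminated name on its own line.
find . -name '*.txt' -print0 | tr '\0' '\n'

# Or dump the raw bytes: od -c prints each NUL separator as \0.
find . -name '*.txt' -print0 | od -c
```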

So the question remains, what is it about the call to sed that prevents it being called on ALL of the files?

Thanks!

ev350
  • What do you think the `''` does? Did you check your sed's help? What happens if you remove it? Which sed are you using, macOS or GNU? – Yunnosch Jan 14 '20 at 20:45
  • @Yunnosch I thought the `''` was necessary to prevent macOS sed from creating backup copies of the edited files, so by using `''` I forced sed to edit each file as is. – ev350 Jan 14 '20 at 20:51
  • Double check in the help or docs. The `-i` syntax is at least different between macOS and GNU, and there are some nasty traps, especially when combined with `-e` (which you did not use, I know). Double check the whole command line. I am not trying to seem wise; your question puzzles me. I am just mentioning the worst things I have learned concerning sed. – Yunnosch Jan 14 '20 at 20:54
  • Compare this answer https://stackoverflow.com/questions/43171648/sed-gives-sed-cant-read-no-such-file-or-directory/43453459#43453459 (yes one of mine...) describing the possibility to read `''` as "a file the name of which is the empty string". I do not see how this answers your question, but it probably is worth triple checking to not be involved. I do not have access to a MacOs sed, that is why I am wildly proposing things to try and read. – Yunnosch Jan 14 '20 at 20:56
  • If I understood correctly that you use macOS sed, then I recommend editing the question to mention that explicitly. Try to also find a tag to reflect that. – Yunnosch Jan 14 '20 at 20:59
  • What is the meaning of the `+` at the end of the first command? I think if you use `\;` instead, it will work as intended. I'm not sure why the `xargs` versions don't work. – Beta Jan 15 '20 at 01:50
  • @Beta I will have to look into the why, but that actually worked! Please post this as an answer, with a little explanation if you like, and I'll green tick it. Thank you! – ev350 Jan 15 '20 at 02:03
  • Thank you, but I won't post it as an answer until I *can* explain why it works. I have a partial answer to the `xargs` riddle... – Beta Jan 15 '20 at 02:06
  • @Beta: For the `+` in place of (escaped) `;`, see POSIX [`find`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html). It means "one or more files" (similar to `+` in an ERE, extended regular expression) until the argument list is too long. It also means you don't need [`xargs`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html), and you don't get plagued by `xargs` default rule that 'if no file names are read, the command is still executed once'. GNU `xargs` provides `-r` to override that default; POSIX doesn't provide a mechanism to override the default. – Jonathan Leffler Jan 15 '20 at 08:17
  • See GNU [`xargs`](https://www.gnu.org/software/findutils/manual/html_node/find_html/xargs-options.html) for options, including `-r` or `--no-run-if-empty`. – Jonathan Leffler Jan 15 '20 at 08:18
  • The fundamental problem is that the line numbers in `sed` are cumulative across the entire set of files edited. There's only one line `$` — that's the last line of the last file. There's only one line `1` — that's the first line of the first (non-empty) file. That explains the results you're observing. The fix is to run a separate `sed` for each file. You can replace the `+` at the end of the `find` with `\;` (or `';'` or `";"`) so it executes the `sed` command for each file. If the choice is between 'slow but correct' and 'fast but wrong', choose 'slow but correct'. – Jonathan Leffler Jan 15 '20 at 08:25
  • NB: POSIX `xargs` provides `-n number` which executes the command with up to _number_ arguments (subject to length limits) but with the caveat _The last iteration has fewer than number, but not zero, operands remaining._ So, POSIX `xargs` won't execute the command with 0 arguments if you specify `-n 1000`. The macOS `xargs` behaves like that automatically (even without `-n`); the GNU `xargs` adheres to POSIX and executes the command once with no file names if `-n` (or `-r`) is not given. The POSIX rule is silly but backwards compatible with some ancient version of `xargs`. That's unfortunate! – Jonathan Leffler Jan 15 '20 at 08:31
  • @Yunnosch: The GNU `sed` takes an optional suffix to `-i` but that must be attached to the option, so `-i` on its own means "no backup suffix" (and hence no backup), while `-i.bak` means "use `.bak` as the backup suffix". The macOS `sed` requires a suffix (it is _not_ optional), but the empty suffix must be in a separate argument. Hence `-i .bak` means "use `.bak` as the backup suffix" and `-i ''` or `-i ""` means no backup. If a script must be portable between GNU and macOS, then you must use `-i.xyz` with the chosen suffix attached to the `-i` argument. There is no portable in-place option. – Jonathan Leffler Jan 15 '20 at 08:39
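Putting the last few comments together, here is a minimal sketch that runs one sed per file by ending `-exec` with `\;`, and uses the attached `-i.bak` suffix (the form the comment above describes as accepted by both GNU and macOS sed), then deletes the backups. The scratch directory is hypothetical, recreated from the layout in the question:

```shell
# Hypothetical scratch copy of the four test files from the question.
dir=$(mktemp -d)
i=1
for d in 'folder 1' folder2 folder3 folder4; do
  mkdir "$dir/$d"
  printf 'This should stay (this should go)\n' > "$dir/$d/test$i.txt"
  i=$((i + 1))
done
cd "$dir" || exit 1

# \; makes find start a separate sed for every file, so address 1
# refers to the first line of each file rather than of the whole batch.
# The suffix is attached to -i (-i.bak), the form both GNU and macOS sed accept.
find . -name '*.txt' -exec sed -i.bak '1 s/([^()]*)//g' {} \;

# Delete the backup copies that -i.bak left behind.
find . -name '*.txt.bak' -exec rm {} +

# Every file now reads "This should stay " (the space before the
# parentheses is kept, because the regex only removes the parenthesized part).
cat ./*/test*.txt
```

This trades speed (one process per file) for correctness, which is the trade-off recommended in the comments.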

1 Answer


I suppose Beta has answered the question about the first command, so let me answer the second one.

Try adding the -t (trace) option to xargs to see how the command line is expanded:

find . -name '*.txt' -print0 | xargs -0 -t sed -i '' '1 s/([^()]*)//g'

It will output something like:

sed -i '' 1 s/([^()]*)//g ./test1.txt ./test2.txt ./test3.txt ./test4.txt

By default, xargs runs the specified command (sed in this case) once with as many arguments read from standard input as fit on one command line, which here means all of them.
In addition, sed doesn't reset line numbering between multiple input files, so the 1 address above matches only the first line of the first file.
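The cumulative numbering can be checked without editing any real files. This sketch (the file names and contents are made up for illustration) feeds two one-line files to a single sed process without -i, so it should behave the same with GNU and macOS sed:

```shell
# Two hypothetical one-line files in a scratch directory.
dir=$(mktemp -d)
printf 'first (a)\n' > "$dir/one.txt"
printf 'second (b)\n' > "$dir/two.txt"

# One sed process reading both files as a single stream: address 1
# matches only line 1 of one.txt, so two.txt passes through unchanged.
sed '1 s/([^()]*)//g' "$dir/one.txt" "$dir/two.txt"
# prints "first " (with a trailing space) and then "second (b)"
```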

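You can change the behavior of xargs with the -n 1 option, which passes one file name to each sed invocation (-n is POSIX and is accepted by both macOS and GNU xargs):

find . -name '*.txt' -print0 | xargs -0 -n 1 -t sed -i '' '1 s/([^()]*)//g'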

Output:

sed -i '' 1 s/([^()]*)//g ./test1.txt
sed -i '' 1 s/([^()]*)//g ./test2.txt
sed -i '' 1 s/([^()]*)//g ./test3.txt
sed -i '' 1 s/([^()]*)//g ./test4.txt

Then sed will work as expected.

tshiono