
Someone on our server ran sed -i 's/$var >> $var2/$var > $var2/' * to change appends to overwrites in some bash scripts in a common directory. No big deal: it was tested first with grep, which returned the expected result that only his files would be touched.

He ran the script and now 1200 files of the 1400 in the folder have a new modified date, yet as far as we can tell, only his small handful of files were actually changed.

  1. Why would sed 'touch' a file that it's not changing?
  2. Why would it only 'touch' a portion of the files and not all of them?
  3. Did it actually change something (maybe some trailing white space or something totally unexpected because of the $'s in the sed regex)?
JNevill
  • Is it gnu sed? Others might vary. – Flexo Nov 21 '14 at 22:00
  • As a comment on pt 3 (this is not an answer): if you use version control (git/subversion/etc) on your scripts, then questions like this of what changed or didn't change can always be clearly answered. – John1024 Nov 21 '14 at 22:03
  • I love version control, but I have no control over this server. Everything on it is out of date and it would take an act of congress to get git installed. But yes, I completely 100% agree with you. Until then... nightly backups and prayers. – JNevill Nov 21 '14 at 22:09
  • @Flexo GNU sed version 4.1.4 – JNevill Nov 21 '14 at 22:09
  • Is there anything to prevent you from extracting the file tree from backups (or just `rsync` periodically if policy permits) and run Git in the mirrored tree? – tripleee Nov 22 '14 at 11:21

2 Answers


When GNU sed successfully edits a file "in-place," its timestamp is updated. To understand why, let's review how an "in-place" edit is done:

  1. A temporary file is created to hold the output.

  2. sed processes the input file, sending output to the temporary file.

  3. If a backup file extension was specified, the input file is renamed to the backup file.

  4. Whether a backup is created or not, the temporary output file is moved (via rename) onto the input file.

GNU sed does not track whether any changes were made to the file. Whatever is in the temporary output file is moved to the input file via rename.

There is a nice benefit to this procedure: POSIX requires that rename be atomic. Consequently, the input file is never in a mangled state: it is either the original file or the modified file and never part way in-between.

As a result of this procedure, any file that sed successfully processes will have its timestamp changed.
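
The steps above can be mimicked by hand in the shell. This is only a sketch of the procedure, not what sed literally executes: the scratch directory, the temporary file name and the variable names are made up for illustration. It shows the file ending up with a new inode even though its content is identical.

```shell
# Sketch: imitate the write-to-temp-then-rename procedure by hand.
# All names here (workdir, tmp, ...) are illustrative assumptions.
workdir=$(mktemp -d)
printf 'this is\na test.\n' > "$workdir/inputfile"

inode_before=$(ls -i "$workdir/inputfile" | awk '{print $1}')

# Steps 1 and 2: write sed's output to a temporary file in the same directory.
tmp=$(mktemp "$workdir/sedXXXXXX")
sed 's/XXX/YYY/' "$workdir/inputfile" > "$tmp"

# Step 4: rename the temporary file over the input file
# (mv within one directory is a rename).
mv -f "$tmp" "$workdir/inputfile"

inode_after=$(ls -i "$workdir/inputfile" | awk '{print $1}')
echo "inode before: $inode_before, after: $inode_after"
```

Because the temporary file exists alongside the original before the rename, the two inode numbers necessarily differ, and the new file carries fresh timestamps.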

Example

Let's consider this inputfile:

$ cat inputfile
this is
a test.

Now, under the supervision of strace, let's run sed -i on it in a way guaranteed to cause no changes:

$ strace sed -i 's/XXX/YYY/' inputfile

The abridged output looks like:

execve("/bin/sed", ["sed", "-i", "s/XXX/YYY/", "inputfile"], [/* 55 vars */]) = 0
[...snip...]
open("inputfile", O_RDONLY)             = 4
[...snip...]
open("./sediWWqLI", O_RDWR|O_CREAT|O_EXCL, 0600) = 6
[...snip...]
read(4, "this is\na test.\n", 4096)     = 16
write(6, "this is\n", 8)                = 8
write(6, "a test.\n", 8)                = 8
read(4, "", 4096)                       = 0
[...snip...]
close(4)                                = 0
[...snip...]
close(6)                                = 0
[...snip...]
rename("./sediWWqLI", "inputfile")      = 0

As you can see, sed opens the input file, inputfile, on file descriptor 4. It then creates a temporary file, ./sediWWqLI, on file descriptor 6 to hold the output. It reads from the input file and writes the content unchanged to the temporary file. When this is done, it calls rename to replace inputfile with the temporary file, which is why the timestamp changes.
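
If strace is not available, the same effect can be observed with a checksum and an inode number. The following is a minimal sketch that assumes GNU sed (where -i takes no separate argument); the scratch directory and variable names are hypothetical.

```shell
# Sketch (assumes GNU sed): a no-op sed -i leaves the content alone
# but still replaces the file.
workdir=$(mktemp -d)
printf 'this is\na test.\n' > "$workdir/inputfile"

sum_before=$(cksum < "$workdir/inputfile")
inode_before=$(ls -i "$workdir/inputfile" | awk '{print $1}')

# A substitution that cannot match, so the content stays the same.
sed -i 's/XXX/YYY/' "$workdir/inputfile"

sum_after=$(cksum < "$workdir/inputfile")
inode_after=$(ls -i "$workdir/inputfile" | awk '{print $1}')

[ "$sum_before" = "$sum_after" ] && echo "content unchanged"
[ "$inode_before" != "$inode_after" ] && echo "but the file was replaced (new inode)"
```

Comparing checksums against a backup copy is also a reasonable way to check whether a file with a new timestamp was actually modified.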

GNU sed source code

The relevant source code is in the execute.c file of the sed directory of the source. From version 4.2.1:

  ck_fclose (input->fp);
  ck_fclose (output_file.fp);
  if (strcmp(in_place_extension, "*") != 0)
    {
      char *backup_file_name = get_backup_file_name(target_name);
      ck_rename (target_name, backup_file_name, input->out_file_name);
      free (backup_file_name);
    }

  ck_rename (input->out_file_name, target_name, input->out_file_name);
  free (input->out_file_name);

ck_rename is a cover function for the stdio function rename. The source for ck_rename is in sed/utils.c.

As you can see, no flag is kept to determine whether the file actually changed or not. rename is called regardless.

Files whose timestamps were not updated

As for the 200 of the 1400 files whose timestamps did not change, that would mean that sed somehow failed on those files. One possibility would be a permissions issue.
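
One way to narrow this down is to list the directory entries that are not readable regular files. It is only an assumption on my part, not something confirmed in the question, that such entries (for example subdirectories matched by *) account for the untouched files. The helper name list_suspects is hypothetical.

```shell
# Hypothetical helper: print entries in a directory that are not
# readable regular files, i.e. candidates that sed -i may have skipped.
list_suspects() {
    for f in "$1"/*; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Demo in a scratch directory: one script and one subdirectory.
workdir=$(mktemp -d)
printf 'echo hi\n' > "$workdir/script.sh"
mkdir "$workdir/subdir"
list_suspects "$workdir"
```

In the demo only the subdirectory is listed. Note that write permission on a file itself does not matter for the rename step; what matters is read access to the file and write access to the directory containing it.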

sed -i and Symbolic Links

As noted by mklement0, applying sed -i to a symbolic link leads to a surprising result. sed -i does not update the file pointed to by the symbolic link. Instead, sed -i overwrites the symbolic link with a new regular file.

This is a result of the call that sed makes to the STDIO rename. As documented by man 2 rename:

if newpath refers to a symbolic link the link will be overwritten.

mklement0 reports that this is also true of the (BSD) sed on Mac OSX 10.10.
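
The symlink behavior is easy to reproduce. This is a small sketch that assumes GNU sed; the file names target and link are made up for the demo.

```shell
# Sketch (assumes GNU sed): sed -i replaces a symlink with a regular file.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/target"
ln -s target "$workdir/link"

sed -i 's/hello/goodbye/' "$workdir/link"

# The link is now a regular file holding the edited text,
# and the original target is untouched.
[ -L "$workdir/link" ] || echo "link is no longer a symlink"
echo "link:   $(cat "$workdir/link")"
echo "target: $(cat "$workdir/target")"
```

The GNU sed manual documents a --follow-symlinks option that edits the link's destination instead; check whether your version supports it before relying on it.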

John1024
  • I had assumed this was the case for edit in place. Thanks for dropping the source code in the answer! The failure on the 200 is still a bit of a mystery since the permissions seem to be in line with the other files, but perhaps it was something else that caused them to fail. Either way, I think you are right on. Thanks! – JNevill Nov 22 '14 at 14:01
  • Great analysis; without checking the source code, I've found that GNU `sed` (as of at least 4.2.2) correctly preserves the _permissions_ of the original input file. However, sadly, if the original file was a _symlink_, it is replaced with a _regular_ file. What is also _not_ preserved is the _creation_ timestamp of the original file. BSD `sed` (as of at least the version that comes with OSX 10.10) seems to behave the same in all respects. – mklement0 May 03 '15 at 02:08
  • @mklement0 Wow! That is an important observation. While I can understand the author's doing it that way, it certainly is not what I would expect for _"in-place"_ editing. (I have updated the answer with that information.) – John1024 May 03 '15 at 20:54
  • @John1024: Thanks, and fully agreed. The woefully disregarded `ed` is your friend when it comes to _true_ in-place editing, which preserves the existing `inode` - it comes with the caveat that it invariably reads the original file into memory _as a whole_. – mklement0 May 03 '15 at 21:07
  • "Whether a backup is created or not, the temporary output is moved (rename) to the input file." Couldn't a diff be applied in-between to prevent this if the file did not change ? – Jean-Michaël Celerier Nov 01 '15 at 12:28

I use the following workaround: look at each file separately, use grep to check whether the file contains the string, and only then run sed. Not very nice, but it works...

for i in *; do grep -q mytext "$i" && sed -i -e 's/mytext/replacement/g' "$i"; done
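
A variation on the same idea, sketched here under my own assumptions, is to run sed into a temporary file and write the result back only when it differs from the original. The function name edit_if_changed is hypothetical and not part of sed.

```shell
# Hypothetical helper (not part of sed): run a sed script on a file, but
# write the result back only if it differs from the original.
edit_if_changed() {
    script=$1
    file=$2
    tmp=$(mktemp) || return 1
    if sed -e "$script" "$file" > "$tmp" && ! cmp -s "$file" "$tmp"; then
        # Overwrite in place; this keeps the inode and permissions but,
        # unlike rename, is not atomic.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Demo: give the file an old timestamp, then run a no-op and a real edit.
workdir=$(mktemp -d)
printf 'mytext here\n' > "$workdir/f"
touch -t 200001010000 "$workdir/f"
touch -t 200101010000 "$workdir/ref"

edit_if_changed 's/XXX/YYY/' "$workdir/f"        # no match: file left alone
find "$workdir/f" -newer "$workdir/ref"          # prints nothing

edit_if_changed 's/mytext/replacement/g' "$workdir/f"
find "$workdir/f" -newer "$workdir/ref"          # prints the path: file was rewritten
```

Unlike the grep test, this compares the actual output, so it also works for sed scripts whose effect cannot be predicted with a simple pattern match. The trade-off is that the write-back is not atomic.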
centic