
I am trying to replace file contents using sed. The replacement works, but for some reason I get extra whitespace at the end of the resulting output file, which makes the file unreadable in the application that opens it.

My command is as follows:

for file in *.example ; do LANG=C sed -i "" "s|https://foo.bar|http://foo.bar|g" "$file" ; done

Things I have tried without success:

  • Not wrapping the s/[...]/g argument in quotes (causes command to fail)
  • Using delimiters other than | such as / or _ or % (makes no difference)
  • Using single quotes instead of double (makes no difference)
  • Escaping the periods and colons as well (makes no difference)

EDIT: This issue appears to be file-type related, and therefore I am no longer interested in a solution. Thank you to those who've replied.

agrafuese
  • Replace sed -i "" with sed -i; if you need a backup, use sed -i.tmp. – V. Michel Jan 17 '17 at 20:28
  • @V.Michel: The macOS (OS X) implementation of `sed` (BSD Sed) requires `-i ""`, unlike the GNU implementation (`-i.tmp` works with both, because the option-argument is nonempty). For more information, see [this answer](http://stackoverflow.com/a/40777793/45375) of mine. – mklement0 Jan 17 '17 at 20:29
  • If white space at the end makes a file unopenable, it's probably not a plain text file; and if it's not a plain text file, doing a plain-text-style substitution on it may be corrupting whatever format it's in. What is the file format? – Gordon Davisson Jan 17 '17 at 22:46
  • The only way I see this happening is when you use variable in the `s` command and the variable contains `\n` or `\r`. – alvits Jan 18 '17 at 02:22
  • @GordonDavisson You are correct, it is not a plain text file; it is a torrent file. What's odd is, I've used other sed command statements on torrent files before with great success, but those were from different sources. I believe, for whatever reason, the source of these files has them encoded in a way that doesn't play nice with my sed command statement. In fact, I even ran an old sed command statement that I know to be working, but on these files, and it also corrupted them. – agrafuese Jan 18 '17 at 08:45
  • @GordonDavisson (continued from comment above) - For now, I've found an alternate method of doing what I need to do to these files, so I'm setting aside the sed method this time. Thanks for the comments and advice. – agrafuese Jan 18 '17 at 08:48
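
The `-i` point from the comments above can be sketched as follows. This is a minimal sketch for plain text files only, assuming the attached-suffix form `-i.tmp` is accepted by both GNU and BSD sed as the comment states; the scratch directory and the file name `demo.example` are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched (hypothetical demo file)
tmpdir=$(mktemp -d)
printf 'https://foo.bar/announce\n' > "$tmpdir/demo.example"

for file in "$tmpdir"/*.example; do
  # -i.tmp (suffix attached, no space) writes a backup named "$file.tmp";
  # remove the backup only if sed succeeded
  LC_ALL=C sed -i.tmp 's|https://foo.bar|http://foo.bar|g' "$file" && rm -f "$file.tmp"
done

result=$(cat "$tmpdir/demo.example")
echo "$result"
rm -rf "$tmpdir"
```

Dropping the `rm -f` keeps the `.tmp` backups, which may be worth doing until the results have been checked.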

2 Answers


I suggest replacing

\foo.bar

with

foo.bar
Cyrus
  • Also, the typo would have been easier to spot (or rather, avoid altogether) if you had used a different delimiter: `sed 's|https://foo.bar|http://foo.bar|'`. – chepner Jan 17 '17 at 20:05
  • Thank you for spotting my typo. I have edited my post to correct it, and have taken your advice regarding a different delimiter. The original problem does still stand, however. Any advice on the whitespace problem? – agrafuese Jan 17 '17 at 20:10
  • @Cyrus Even though I am no longer interested in a solution to my specific issue, I am marking it as answered to give you thanks for responding and catching my typo :) Cheers. – agrafuese Jan 18 '17 at 16:21

With the benefit of hindsight:

BSD/macOS sed is fundamentally unsuitable for making substitutions in binary files, because it invariably outputs a trailing \n (newline) with every output command.

By contrast, GNU sed doesn't have this problem, because it - commendably - only appends a \n if the input "line" had one too.

Note that the concept of newline-separated lines doesn't really apply to binary input: newlines may or may not be present, and there may be large chunks of data between them. In the worst-case scenario, the entire input is read into memory as a single "line".[1]

You can test this behavior with the following command:

sed -n 'p' <(printf 'x') | cat -et  # input printf 'x' has no trailing \n

Output x$ indicates that a newline (symbolized as $ by cat -et) was appended (BSD Sed), whereas just x indicates that it was not (GNU Sed).
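
The same check can be scripted by counting bytes instead of reading `cat -et` output. This is only a sketch: the variable names `n` and `sed_kind` are made up for illustration, and it uses a pipe rather than process substitution so that it does not depend on bash.

```shell
# Count the bytes sed emits for a 1-byte input that has no trailing newline.
# tr strips the padding that some wc implementations put around the number.
n=$(printf 'x' | sed -n 'p' | wc -c | tr -d ' ')

if [ "$n" -eq 2 ]; then
  sed_kind='BSD-like'   # a newline was appended to the output
else
  sed_kind='GNU-like'   # the input was passed through unchanged
fi
echo "sed emitted $n byte(s): $sed_kind behavior"
```

A count of 2 means a newline was added, which is exactly the extra byte that ends up at the end of a binary file.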

Thus, given that you're on macOS, you could use Homebrew to install GNU Sed with brew install gnu-sed and then use the following command:

LANG=C gsed -i 's|https://foo.bar|http://foo.bar|g' *.example
  • Homebrew installs GNU Sed as gsed, so that it can exist alongside macOS's stock (BSD) sed.

  • LANG=C (slightly more robustly: LC_ALL=C) is needed to pass all bytes of the binary input through as-is, avoiding problems that stem from binary bytes not being recognized as valid characters.
    Note that this approach limits you to ASCII-only characters in the substitution (unless you explicitly add byte values as escape sequences).

  • Note the different, incompatible -i syntax for in-place updating without backup - no (separate) option-argument here; see this answer of mine for background.

  • Note how '...' (single-quoting) is used around the Sed script, which is generally preferable, as it avoids confusion between shell expansions that happen up front and what Sed ends up seeing.
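
To illustrate the quoting point in the last bullet, here is a small sketch that prints the script text exactly as Sed would receive it. The variables `old` and `new` are hypothetical and exist only for this demonstration.

```shell
old='https://foo.bar'
new='http://foo.bar'

# Single quotes: the shell passes the text through untouched
single=$(printf '%s\n' 's|$old|$new|g')
# Double quotes: the shell expands $old and $new before sed ever runs
double=$(printf '%s\n' "s|$old|$new|g")

echo "$single"   # prints: s|$old|$new|g
echo "$double"   # prints: s|https://foo.bar|http://foo.bar|g

# Double quotes are the right choice only when that expansion is intended
fixed=$(printf '%s\n' 'see https://foo.bar/x' | sed "s|$old|$new|g")
echo "$fixed"    # prints: see http://foo.bar/x
```

With a fixed script such as the one in this answer, nothing needs expanding, so single quotes are the safer default.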


[1] Aside from memory use, it is fine to use Sed's default line-parsing behavior here, given that your substitution command doesn't match newlines. If you want to break the input into "lines" by NULs (and also use NULs on output), however, you can use GNU Sed's -z option.
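
A minimal sketch of the -z variant mentioned in the footnote, assuming a GNU Sed recent enough to support -z is reachable as either gsed or sed. The variable names `SED` and `records` and the sample input are made up for illustration.

```shell
# Prefer gsed (the Homebrew name); otherwise try plain sed, which is GNU sed on most Linux systems
if command -v gsed >/dev/null 2>&1; then SED=gsed; else SED=sed; fi

if "$SED" --version 2>/dev/null | grep -q '(GNU sed)'; then
  # Two NUL-terminated records go in; tr turns the NULs into newlines for display only
  records=$(printf 'https://foo.bar/a\0https://foo.bar/b\0' |
    LC_ALL=C "$SED" -z 's|https://foo.bar|http://foo.bar|g' | tr '\0' '\n')
  echo "$records"
else
  records=''
  echo "GNU sed not found; -z is unavailable here" >&2
fi
```

The `tr` step is there only to make the result readable in a terminal; when editing real files, leave the NULs in place.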

mklement0