Having trouble using the sed command on macOS

Question

I'm trying to do the following:

LC_CTYPE=C sed 's/|/¦/g' t.txt > new_t.txt

The command works, but when I open the new file, the replacement has added an extra character: "A¦". Why is that?

Depends on how you typed the ¦ character and how you are viewing the file. I'm guessing your command line represented that as UTF-8 whereas you are apparently using something else (Latin-1?) to view the file (though strictly speaking that should give you `Â¦`, not `A¦`). Perhaps see also https://meta.stackoverflow.com/questions/379403/problematic-questions-about-decoding-errors — tripleee, Jan 13 '21 at 13:11
As regards your question before me editing it, [**do not use signature, taglines, or greetings**](https://stackoverflow.com/help/behavior). — Enlico, Jan 13 '21 at 13:11
This is almost certainly a duplicate, but I fail to find one which is very specific to `sed` on macOS. — tripleee, Jan 13 '21 at 13:31

score 0 · Answer 1 · answered Jan 13 '21 at 13:36

When you typed

LC_CTYPE=C sed 's/|/¦/g' t.txt > new_t.txt

your shell was probably configured to accept the command itself as UTF-8, so in fact you ended up converting the single byte 0x7C (U+007C) into the two bytes 0xC2 0xA6, which is the correct UTF-8 encoding of U+00A6 (¦).
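To confirm this at the byte level, here is a minimal sketch, assuming a POSIX shell with `printf`, `sed`, `od`, and `tr` available. The sample input `a|b` and the variable names are made up for illustration, and the replacement is built from octal escapes rather than typed literally, so the result does not depend on how your terminal encodes ¦:

```shell
# Build the two-byte UTF-8 sequence for U+00A6 (0xC2 0xA6) from octal escapes.
bar=$(printf '\302\246')

# Replace every pipe in a sample line, treating the data as raw bytes,
# then dump the result as a plain hex string.
out=$(printf 'a|b\n' | LC_ALL=C sed "s/|/$bar/g" | od -An -tx1 | tr -d ' \n')
echo "$out"   # 61c2a6620a
```

The single byte `7c` has become the pair `c2 a6`, surrounded by `61` (a), `62` (b), and the trailing newline `0a`.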

What you then did is unclear, but somehow you ended up examining the file in some encoding other than UTF-8, which exposes the two bytes as the string you report seeing (Latin-1, for example, would display them as `Â¦`).
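The mis-decoding can be reproduced with `iconv`. This is only a sketch and assumes the viewer used ISO-8859-1 (Latin-1), which is a guess; other legacy encodings would show different characters:

```shell
# Take the two bytes sed wrote for ¦ (U+00A6 encoded as UTF-8), pretend they
# are ISO-8859-1, and convert that interpretation to UTF-8 for display.
shown=$(printf '\302\246' | iconv -f ISO-8859-1 -t UTF-8)
echo "$shown"

# Hex of what the viewer renders: U+00C2 (Â) followed by U+00A6 (¦).
hex=$(printf '%s' "$shown" | od -An -tx1 | tr -d ' \n')
echo "$hex"   # c382c2a6
```

On a UTF-8 terminal the first `echo` should display `Â¦`: each of the two original bytes has been treated as a separate character.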

The correct workaround is to examine the file in a correctly configured program which supports UTF-8.
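Before changing viewers, it may help to check that the file really is well-formed UTF-8. The sketch below uses a hypothetical helper, `is_utf8`, built on `iconv`, and a temporary sample file standing in for `new_t.txt`:

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Sample file standing in for new_t.txt: "a", U+00A6 as UTF-8, "b", newline.
sample=$(mktemp)
printf 'a\302\246b\n' > "$sample"

if is_utf8 "$sample"; then
    result="valid UTF-8"
else
    result="not valid UTF-8"
fi
echo "$result"
rm -f "$sample"
```

Running `is_utf8 new_t.txt` on the real file should succeed as well, in which case the file is fine and only the viewer's encoding setting needs changing.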
