I'm trying to do the following:
LC_CTYPE=C sed 's/|/¦/g' t.txt > new_t.txt
The command works, but when I open the new file, each replacement has an additional character in front of it: "Â¦". Why is that?
When you typed
LC_CTYPE=C sed 's/|/¦/g' t.txt > new_t.txt
your terminal was probably sending the command line as UTF-8, so the ¦ in the sed script was itself two bytes. The command therefore replaced each single byte 0x7C (U+007C, |) with the two bytes 0xC2 0xA6, which is the correct UTF-8 encoding of U+00A6 (¦).
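As a rough sketch of what happened at the byte level, the following reproduces the substitution on a made-up one-line input and dumps the result as hex. The sample input `a|b` and the variable names are illustrative assumptions, and `LC_ALL=C` is used here instead of `LC_CTYPE=C` only so that the result does not depend on the surrounding environment.

```shell
# UTF-8 encoding of U+00A6, written with octal escapes (0xC2 = \302, 0xA6 = \246)
# so the snippet does not depend on how this text itself is encoded.
broken_bar=$(printf '\302\246')

# Hex dump of the replacement character alone; tr strips od's spaces and newline.
hex=$(printf '%s' "$broken_bar" | od -An -tx1 | tr -d ' \n')
echo "$hex"            # c2a6

# Same substitution as in the question, applied to the sample input "a|b".
replaced_hex=$(printf 'a|b\n' | LC_ALL=C sed "s/|/$broken_bar/g" | od -An -tx1 | tr -d ' \n')
echo "$replaced_hex"   # 61c2a6620a: 'a', 0xC2 0xA6, 'b', newline
```

The single byte 7c has become the pair c2 a6, which is exactly what a UTF-8 file containing ¦ should hold.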
It is unclear what you did next, but you evidently viewed the file in a program that assumed some encoding other than UTF-8 (most likely a single-byte one such as Latin-1 or Windows-1252), which displays the two bytes as the two separate characters you report seeing.
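To see how the misreading could arise, one can deliberately decode those two bytes as a single-byte encoding. This sketch assumes the viewer used ISO-8859-1 (Latin-1), which is only a guess; Windows-1252 maps these two bytes to the same characters. It also assumes `iconv` is installed.

```shell
# Treat the UTF-8 bytes of U+00A6 as if they were Latin-1, then re-encode as UTF-8.
misread=$(printf '\302\246' | iconv -f ISO-8859-1 -t UTF-8)

# Hex dump of the result; tr strips od's spaces and newline.
misread_hex=$(printf '%s' "$misread" | od -An -tx1 | tr -d ' \n')
echo "$misread_hex"   # c382c2a6: U+00C2 (Â) followed by U+00A6 (¦)
```

On a UTF-8 terminal, `printf '%s\n' "$misread"` should display the two characters Â¦.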
The fix is to view the file in a program that supports UTF-8 and is configured to use it.