
I used this command to remove the left-to-right mark (U+200E) from a string.

# CHARS=$(python -c 'print ("\u200E".encode("utf8"))')

# echo "test be" | sed 's/['"$CHARS"']//g'
test e

As the example above shows, sed removed the "b".

Why did it remove the character "b", and how do I remove the left-to-right mark?

shantanuo

2 Answers


If you are using Python here anyway, why are you not implementing the entire operation in Python?

#!/usr/bin/python3

import fileinput

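# Read each line from the files named on the command line (or from stdin)
for line in fileinput.input():
    # Drop every U+200E LEFT-TO-RIGHT MARK; the line keeps its own newline
    print(line.replace("\u200e", ""), end="")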

Demo: https://ideone.com/5dV285
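
If the goal is to clean a file on disk rather than filter a stream, the same replacement can be applied in place. This is a minimal sketch, assuming the file is UTF-8 encoded and small enough to read into memory; the names `strip_lrm` and `strip_lrm_file` are hypothetical helpers, not part of any library.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def strip_lrm(text):
    """Return text with every left-to-right mark removed."""
    return text.replace(LRM, "")


def strip_lrm_file(path):
    """Rewrite the file at path without left-to-right marks.

    Assumes UTF-8 content. Returns the number of marks removed.
    """
    # newline="" keeps the file's original line endings untouched
    with open(path, encoding="utf-8", newline="") as handle:
        original = handle.read()

    cleaned = strip_lrm(original)

    # Only rewrite the file when something actually changed
    if cleaned != original:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(cleaned)

    return len(original) - len(cleaned)
```

Calling `strip_lrm_file("notes.txt")` (a placeholder file name) would rewrite that file and report how many marks were removed.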

If you insist on a one-liner, try with Perl instead of sed:

perl -CSD -pe 's/\x{200e}//g'

Demo: https://ideone.com/JAQGu0

If you can get the proper UTF-8 encoding of the character into a variable, removing the square brackets should work trivially with most sed implementations.

char=$(python3 -c 'print("\u200e")')
echo "be" | sed "s/$char//g"

Demo: https://ideone.com/TrvVJj

Tangentially, avoid upper case for your private shell variables; uppercase names are conventionally reserved for environment and shell-internal variables.

tripleee

Look at what your Python snippet actually prints to understand why:

$ python3 -c 'print ("\u200E".encode("utf8"))'
b'\xe2\x80\x8e'
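
To make the cause concrete, here is a minimal Python sketch (not part of the original commands; the helper name `printed_form` is hypothetical). It shows that what the shell captured is the text representation of a bytes object, which contains a literal `b` and quote characters, rather than the character itself.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def printed_form(ch):
    """Return the text that print(ch.encode("utf8")) writes in Python 3."""
    return str(ch.encode("utf8"))


captured = printed_form(LRM)

print(captured)                  # the 15-character text b'\xe2\x80\x8e'
print("b" in captured)           # True: a literal "b" is part of that text
print(len(LRM.encode("utf8")))   # 3: the real character is three UTF-8 bytes
```

Placed inside `[...]`, every character of that captured text becomes a member of the sed bracket expression, which is why the `b` in the input matches.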

You can use ANSI-C quoting if your shell supports it:

$ printf 'a\u200Eb\n' | cat -v
aM-bM-^@M-^Nb

$ printf 'a\u200Eb\n' | sed 's/'$'\u200E''//g' | cat -v
ab
Sundeep
  • I have not understood the answer. But I guess this is what happens when I blindly copy paste an accepted answer from stack overflow having more than 50 upvotes :) https://stackoverflow.com/questions/8562354/remove-unicode-characters-from-textfiles-sed-other-bash-shell-methods/8562661#8562661 – shantanuo Jul 08 '21 at 09:28
  • The Python 3 output also includes literal characters such as `b` and `'`, because it prints the representation of a bytes object. I think Python 2 did not add those characters. – Sundeep Jul 08 '21 at 10:06
  • OK, got it. But how do I remove that character from a text file? – shantanuo Jul 08 '21 at 11:02
  • I guess my shell does not support it. – shantanuo Jul 08 '21 at 12:40
  • `$'...'` is a Bash feature; if you are using a different shell, perhaps specify which precise shell you are using. – tripleee Jul 13 '21 at 13:41