I want to remove Unicode characters in a given range, e.g.:

echo "abcＡＢＣ123" | sed 's/[\uff21-\uff3b]//g'

I expect "abc123", but get:

sed: -e expression #1, char 20: Invalid range end

Or, if I use:

echo "abcＡＢＣ123" | sed 's/[Ａ-Ｚ]//g'

I get:

sed: -e expression #1, char 14: Invalid collation character

Cyrus
user2524314

2 Answers

Unicode support in sed is not well defined. You may be better off using Perl on the command line:

echo "abcＡＢＣ123" | perl -CS -pe 's/[\x{FF21}-\x{FF3B}]+//g'

abc123

The -CS flag is important here: it tells Perl to treat STDIN, STDOUT, and STDERR as UTF-8, so the input is decoded and the output is encoded correctly.

anubhava

Not sure why sed is not working, but you can use tr instead.

$ echo 'abcＡＢＣ123' | tr -d 'Ａ-Ｚ'
abc123


From man tr:

tr - translate or delete characters

-d, --delete delete characters in SET1, do not translate
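
To see what -d does with SET1 in isolation, here is a minimal sketch using plain ASCII input. The variable names are only for illustration. ASCII is used on purpose: as far as I know GNU tr works on single bytes, so a range of multibyte characters is handled as a set of bytes and the result may depend on the implementation. Treat that as an assumption and test with your own tr.

```shell
# Force the C locale so 'A-Z' means exactly the ASCII bytes A through Z.
export LC_ALL=C

# -d deletes every character that appears in SET1; nothing is translated.
out_range=$(printf 'abcABC123\n' | tr -d 'A-Z')

# The POSIX character class is another way to name ASCII uppercase letters.
out_class=$(printf 'abcABC123\n' | tr -d '[:upper:]')

echo "$out_range"
echo "$out_class"
```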

Sundeep