
I want to replace non-ASCII characters with spaces on Unix, but each group of consecutive non-ASCII characters should be converted to a single space. For example:

CHAVEZ MONTA�O   should be converted to CHAVEZ MONTAO<followed by one space>

How can I do this? I tried the following grep command, which uses a Perl-compatible regex (-P):

grep --color='auto' -P -n "[\x80-\xFF]" file.xml

But this handles each non-ASCII character separately, so each one ends up as its own space, which is not what I want.

EDIT1:

I know that converting CHAVEZ MONTA�O to CHAVEZ MONTA O makes more sense, but I would prefer:

CHAVEZ MONTAO<followed by one space>

Please suggest a solution for CHAVEZ MONTA O as well.

mpapec
Mayank Jain

3 Answers


It seems you want something like this:

$ echo 'CHAVEZ MONTA�O' | perl -pe 's/[^[:ascii:]]+/ /g'
CHAVEZ MONTA O

$ echo 'CHAVEZ MONTA�O' | perl -pe 's/([^[:ascii:]]+)(.)/$2 /g'
CHAVEZ MONTAO 

$ echo 'CHAVEZ MONTA�O' | perl -pe 's/�/ /g'
CHAVEZ MONTA O

$ echo 'CHAVEZ MONTA�O' | perl -pe 's/�([[:ascii:]])/$1 /g'
CHAVEZ MONTAO 
Avinash Raj

Pure Bash:

shopt -s extglob
var="CHAVEZ MONTA�O"
echo "${var//+([^[:ascii:]])/ }"
gniourf_gniourf

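If you use GNU sed, you can do it like this:

LC_ALL=C sed 's/[\d128-\d255]\+/ /g' < INPUTFILE

It replaces every run of bytes with values 128 and above with a single space. The g flag applies the substitution to every run on a line rather than only the first, and LC_ALL=C makes sed treat the input as raw bytes, so the byte range is also accepted in a UTF-8 locale.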

Biber