
I have a file that includes temperatures along with a degree symbol that I want to remove. It looks like this in Notepad++:

40238230,194°,47136

The symbol does not print with a plain cat:

40238230,194,47136

But cat -e shows M-0 where the symbol is:

40238230,194M-0,47136

How can I get rid of that symbol? I thought the following sed would do it (by keeping only digits and commas), but it doesn't:

sed -r 's/[^0-9\,]//g'
John Kugelman
ebodin
  • What `locale` are you using in your terminal? `echo 40238230,194°,47136| sed 's/°//' 40238230,194,47136` – tink Jan 09 '19 at 22:11
  • My console is UTF-8. The script will end up running on other machines. The answer I flagged should be independent of the locale setting. – ebodin Jan 10 '19 at 16:23

1 Answer


Could it be that you have not set up your console to use Unicode?

The degree sign is Unicode U+00B0. In UTF-8 this is \xc2\xb0. So if your console is not using Unicode, you will have to replace those two bytes.

The M- notation is described here: What is the "M- notation" and where is it documented?

M-0 is 0xb0: the M- prefix means the high bit (0x80) is set, and "0" is 0x30, so 0x30 + 0x80 = 0xb0.

On a console with Unicode enabled I get:

$ cat foo
122 °C
$ cat -e foo
122 M-BM-0C$
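The bytes behind the two cat -e outputs can also be checked directly. This is only a sketch, assuming a POSIX shell with od and tr available; the helper name hex_bytes is made up for illustration:

```shell
# hex_bytes (hypothetical helper): print stdin as one continuous string of hex byte values.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# A lone 0xB0 byte (octal 260): the degree sign in Latin-1 / Windows-1252.
printf '\260' | hex_bytes; echo

# The UTF-8 encoding of U+00B0: bytes 0xC2 0xB0 (octal 302 260).
printf '\302\260' | hex_bytes; echo
```

The first command should print b0 and the second c2b0. Seeing M-0 without a leading M-B, as in the question, suggests the file holds the single byte 0xb0 rather than UTF-8 text.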

For removing it with sed, see: Remove unicode characters from textfiles - sed, other bash/shell methods
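One possible approach, sketched here under the assumption that only ASCII digits, commas, and newlines should survive: force the C locale so that the tool works on single bytes and a stray 0xb0 (or the pair 0xc2 0xb0) falls into the negated class. The function names are made up for illustration.

```shell
# strip_with_sed (hypothetical helper): delete every byte that is not a digit or comma.
# LC_ALL=C makes sed treat the input as single bytes instead of multibyte characters.
strip_with_sed() {
    LC_ALL=C sed 's/[^0-9,]//g'
}

# strip_with_tr (hypothetical helper): same idea with tr; -c complements the set,
# -d deletes. The newline is listed explicitly so line breaks are kept.
strip_with_tr() {
    LC_ALL=C tr -cd '0-9,\n'
}

# Sample line with a lone 0xB0 byte, as the question's cat -e output suggests.
printf '40238230,194\260,47136\n' | strip_with_sed

# Sample line with the UTF-8 encoded degree sign.
printf '40238230,194\302\260,47136\n' | strip_with_tr
```

Both commands should print 40238230,194,47136. Because the locale is set on the command itself, the result should not depend on how the terminal of the machine running the script is configured.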

Xypron
  • Thanks for the link to the unicode characters thread. I tried sed a couple of ways but opted for what seems simplest and broadest (my input is from other sources and could have surprising codes in the future): `iconv -c -t ascii` – ebodin Jan 10 '19 at 16:16