How do I replace Unicode hex digits with blanks?

While scraping a website, I found strings containing characters that print as blanks but are not ordinary spaces. For example:

print(str)

prints

3 Max. 11

but

print(charToRaw(str))

prints

33 c2 a0 4d 61 78 2e 20 31 31

How can I replace the two-byte sequence 0xc2 0xa0 with a single blank (" ")?

I have tried

library(stringr)
str_replace_all(str, "[^[:alnum:]]", " ")  

But that also replaces the period.

Ronak Shah
skb
  • 2
    Please provide [reproducible sample data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Maurits Evers Sep 24 '18 at 02:53
  • c2 a0 is the UTF-8 encoding of U+00A0, NO-BREAK SPACE. You'll be better off using a unicode-aware string function to replace that character with a normal space than dealing with raw UTF-8 bytes. – Shawn Sep 24 '18 at 04:51
  • 1
    A quick search for 'R string functions' suggests that something like `gsub("\u00a0", " ", str)` might do the trick. – Shawn Sep 24 '18 at 04:55

1 Answer

Shawn's suggestion works perfectly - thank you Shawn. Refer to his comments above.

The answer is

gsub("\u00a0", " ", str) 
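
For anyone who wants to check this on the string from the question, here is a minimal sketch. The variable names `x` and `cleaned` are only illustrative (the question calls the string `str`), and the string is rebuilt by hand from the bytes shown in the question rather than taken from the scraped page. `fixed = TRUE` is an optional addition that makes `gsub` treat the pattern as a literal string instead of a regular expression.

```r
# Rebuild the example from the question: "3", U+00A0 (NO-BREAK SPACE), "Max. 11".
x <- "3\u00a0Max. 11"

# In UTF-8 the no-break space is the two bytes c2 a0, as in the question:
# 33 c2 a0 4d 61 78 2e 20 31 31
print(charToRaw(enc2utf8(x)))

# Replace every no-break space with an ordinary space.
# gsub() returns a new string, so the result has to be assigned.
cleaned <- gsub("\u00a0", " ", x, fixed = TRUE)

print(cleaned)
# The c2 a0 pair is now a single 20, and the period (2e) is untouched:
# 33 20 4d 61 78 2e 20 31 31
print(charToRaw(cleaned))
```

Since the question already loads stringr, `str_replace_all(x, "\u00a0", " ")` should give the same result. Either way only the no-break space is targeted, so the period is left alone, unlike with the `[^[:alnum:]]` pattern.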
skb