
The following command removes the text nbsp:

str_replace_all(df$text, 'nbsp', '')

What regex can I use to remove all numbers with this command?

foc
  • Can you give us some example data? – Punintended Aug 11 '20 at 16:11
  • Maybe a duplicate of https://stackoverflow.com/questions/13590139/remove-numbers-from-alphanumeric-characters depending on the example you provide – David Weber Aug 11 '20 at 16:17
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 11 '20 at 16:28
  • Depending on the regex engine used you may just use `\D` which is the character class for everything but digits. Otherwise the regex could be `[^0123456789]` which matches anything not in the set. https://regexr.com/ – karo Aug 11 '20 at 16:48

1 Answer


If by "nbsp" you mean a non-breaking space, you can match it by writing its Unicode code point explicitly.

The non-breaking space is U+00A0 in Unicode, so in R you can write it as "\U00A0" (or "\u00A0").

For example:

> "This is a strange\U00A0 character"
[1] "This is a strange  character"


This may be clearer with a visible character:

> "This is a strange \U00A1 character"
[1] "This is a strange ¡ character"


Either character can then be removed as you would expect:

> str_remove("This is a strange \U00A1 character", "\U00A1")
[1] "This is a strange  character"
> str_remove("This is a strange\U00A0 character", "\U00A0")
[1] "This is a strange character"
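
Note that str_remove() drops only the first match in each string. The following is a minimal sketch, assuming the stringr package is installed and using a made-up input string, of removing every non-breaking space with str_remove_all():

```r
library(stringr)

# Hypothetical input containing two non-breaking spaces (U+00A0)
x <- "one\u00A0two\u00A0three"

# str_remove() removes only the first match
first_only <- str_remove(x, "\u00A0")

# str_remove_all() removes every match
all_gone <- str_remove_all(x, "\u00A0")

# Count the non-breaking spaces left in each result
str_count(first_only, "\u00A0")
str_count(all_gone, "\u00A0")
```

For a whole column such as df$text, str_remove_all() is probably the variant you want, since a cell may contain more than one occurrence.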

This also works if you supply the code point in decimal:

str_remove("This is a strange\U00A0 character", intToUtf8(160))

Note: this works on my computer, but results may vary with locale settings and installed fonts.
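
As for removing numbers, stringr's str_remove_all() accepts a regular expression, so a digit character class should do it. This is only a sketch: it assumes stringr is installed, and the df data frame below is invented for illustration since the question gives no sample data.

```r
library(stringr)

# Invented sample data; the first entry contains a non-breaking space
df <- data.frame(
  text = c("room 101\u00A0is free", "call 555 now"),
  stringsAsFactors = FALSE
)

# "[0-9]" matches a single ASCII digit; every match is removed
no_digits <- str_remove_all(df$text, "[0-9]")

# Digits and non-breaking spaces can go in one pass with one character class
cleaned <- str_remove_all(df$text, "[0-9\u00A0]")
```

Removing the digits leaves the surrounding spaces in place, so "call 555 now" ends up with two consecutive spaces; stringr's str_squish() can collapse those if needed.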

Alexlok