
I am trying to remove invisible characters from a string.

See remove-zero-width-space-characters.

iex> str = "\uFEFF<?xml>"
iex> String.replace(str, ~r/[\u200B\u200C\u200D\uFEFF]/, "")   
** (Regex.CompileError) PCRE does not support \L, \l, \N{name}, \U, or \u at position 1
    (elixir) lib/regex.ex:171: Regex.compile!/2
    (elixir) expanding macro: Kernel.sigil_r/2
    iex:44: (file)

error: PCRE does not support \L, \l, \N{name}, \U, or \u at position 1

How can I implement the above regex?

Note: Using a plain string instead of a regex works, but for code efficiency I would like to use a regex:

iex(34)> String.replace(str, "\uFEFF", "")
"<?xml>"
revo
dina

1 Answer


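Since you are using PCRE, you can match invisible characters with the \p{C} property (Unicode category "Other": control, format, unassigned, private-use, and surrogate code points). It covers all four characters in your class, but it also matches tabs and newlines. As for the error, it is caused by the notation: PCRE does not support \uXXXX. Use \x{XXXX} instead, and set the u flag: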

/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u

In code:

iex(33)> str = "\uFEFF<?xml>"
iex(34)> String.replace(str, ~r/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u, "") 
"<?xml>"
dina
revo
I believe the `u` should come after the closing slash: `~r/[\x{200B}\x{200C}\x{200D}\x{FEFF}]/u`. If that doesn't work, try `~r/(*UTF8)[\x{200B}\x{200C}\x{200D}\x{FEFF}]/`. – revo May 24 '18 at 12:11