I have a large data set that was converted from JSON. It contains Unicode characters in the JSON-compliant \uXXXX notation, and I'm trying to convert these to real Unicode characters on the fly using preg_replace.

preg_replace('/\\u([a-z0-9]+)/i', "\x{${1}}", $str);

However, this generates a warning:

PHP Warning: preg_replace(): Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

Why is preg_replace complaining when I'm searching for a literal backslash followed by u, not a Unicode escape?

Edit:

Here's what I'm working with: https://regex101.com/r/LIdECa/1

Nilpo
  • Are you trying to decode JSON manually? – Álvaro González Jun 23 '17 at 10:56
  • @ÁlvaroGonzález The JSON data was processed and migrated into a MySQL database. These "unicode" characters remain in the text. – Nilpo Jun 23 '17 at 10:58
  • See https://stackoverflow.com/questions/6058394/unicode-character-in-php-string. To match a literal `\` with a regex, you need 4 backslashes in the PHP string. You also need preg_replace_callback, by the way, to process the match. See [**this PHP demo**](https://ideone.com/rIOC5j). – Wiktor Stribiżew Jun 23 '17 at 10:59
  • So there's a prior error, either in encoding or in decoding. In any case, I'm not sure you can generate actual characters that way because `\x` is a string literal. You'd need to verify that first. – Álvaro González Jun 23 '17 at 11:01
  • @ÁlvaroGonzález From what I've read, you can generate actual characters this way in PHP 7. I've never tried it before so I don't know. Getting the regex fixed was the first step in trying to make it work. – Nilpo Jun 23 '17 at 11:07
  • @Nilpo: A hint: you *can't* use `preg_replace` to do it unless you are on a PHP version before 7 and want to risk using the `/e` modifier. – Wiktor Stribiżew Jun 23 '17 at 11:32
  • I'm removing my answer because it was actually not solving your problem, it included a non-working alternative and I've found a duplicate question with better solutions. – Álvaro González Jun 23 '17 at 12:04
  • @ÁlvaroGonzález I was able to get json_decode working as you suggested. I've also implemented Wiktor Stribiżew's solutions as well. – Nilpo Jun 23 '17 at 12:12
  • @WiktorStribiżew Please post your solutions as an answer. – Nilpo Jun 23 '17 at 12:12
  • @Nilpo I don't think it's robust enough. If data is half-decoded (e.g. double quotes do not have the leading backslash) it'll break horribly. I suggest you use the accepted answer in the linked question. – Álvaro González Jun 23 '17 at 12:14
  • @ÁlvaroGonzález I was using that solution originally and it worked perfectly on my dev box, but not on the live server. PHP wasn't compiled with mb support. – Nilpo Jun 23 '17 at 12:16
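
For anyone landing here, the two hints in the comments (four backslashes in the pattern, and preg_replace_callback instead of preg_replace) can be sketched as follows. This is only a minimal sketch, not the code from the linked demo or the linked question: the function names `codepoint_to_utf8` and `decode_u_escapes` are made up for illustration, and the UTF-8 encoding is done by hand on the assumption that mbstring is unavailable, as on the live server mentioned above. Lone surrogates are left untouched, and the caveat about half-decoded data still applies.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes
// without relying on the mbstring extension.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace literal \uXXXX sequences with UTF-8 characters.
function decode_u_escapes(string $str): string
{
    // In a single-quoted PHP string, '\\\\' is two backslashes, which PCRE
    // then reads as one literal backslash. With only '\\u', PCRE receives
    // \u and reports the "does not support \u" compilation error.
    // First alternative: a surrogate pair. Second alternative: a single escape.
    $pattern = '/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})'
             . '|\\\\u([0-9a-f]{4})/i';

    return preg_replace_callback($pattern, function (array $m): string {
        if (isset($m[3])) {
            $cp = hexdec($m[3]);
            // Leave a lone surrogate as-is; it has no valid UTF-8 form.
            if ($cp >= 0xD800 && $cp <= 0xDFFF) {
                return $m[0];
            }
            return codepoint_to_utf8($cp);
        }
        // Combine the high and low surrogate into one code point.
        $hi = hexdec($m[1]) - 0xD800;
        $lo = hexdec($m[2]) - 0xDC00;
        return codepoint_to_utf8(0x10000 + ($hi << 10) + $lo);
    }, $str);
}

// Single quotes keep \u00e9 as a literal backslash sequence in the input.
echo decode_u_escapes('caf\u00e9'), "\n"; // prints "café"
```

The replacement has to happen in a callback because `"\x{...}"` and `"\u{...}"` are escape sequences that PHP resolves when it parses the string literal, so they cannot be built from a captured group at run time.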