I have an HTML file generated by Google with the following head section:

<!doctype html><html><head><title>ddd</title><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="viewport" content="width=device-width,initial-scale=1,minimum-scale=1,maximum-scale=2">

and I use the following pattern to match text that contains Unicode (Chinese and special characters):

$pattern_Title = '/class=\"text1t\">[\’\w\s\:\d]+/u';

I know that the "u" modifier makes PHP treat the pattern and the subject as UTF-8. However, although this is a UTF-8 document, something goes wrong. When I run the PHP code against the online HTML page (fetched directly, without saving the contents on my computer), the pattern matches nothing when the "u" modifier is present. When I remove the "u", the code works but fails to match Chinese characters. I then copied the HTML contents into a string variable in my PHP code and saved the file. When I run that code with "u", it works just fine.

So I have no idea how to fix the problem. There is a post on Stack Overflow about converting non-UTF-8 text to UTF-8 in PHP; I tried it, but it made no difference. The HTML code is generated by Google.
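
One way to narrow this down: with the "u" modifier, `preg_match` returns `false` (not `0`) when the subject is not valid UTF-8, and `preg_last_error()` then reports `PREG_BAD_UTF8_ERROR`. The sketch below checks for that case and converts the fetched bytes before matching. The helper names `matchTitle` and `toUtf8` are hypothetical, the pattern is a simplified form of the one above, and GB18030 as the source encoding is only an assumption for illustration; the real encoding would have to come from the response's `Content-Type` header or some other detection step. It also assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: returns the first match, or null when nothing matches.
// Throws when PCRE itself fails, e.g. because the subject is not valid UTF-8.
function matchTitle(string $html): ?string
{
    // Simplified form of the pattern from the question.
    $pattern = '/class="text1t">[’\w\s:]+/u';
    $result = preg_match($pattern, $html, $m);
    if ($result === false) {
        // false means an error occurred, which is different from "no match" (0).
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $result === 1 ? $m[0] : null;
}

// Hypothetical helper: leaves valid UTF-8 untouched, otherwise converts.
// ASSUMPTION: invalid input is treated as GB18030; replace this with the
// encoding the server actually reports.
function toUtf8(string $raw, string $assumedSource = 'GB18030'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $assumedSource);
}
```

If `matchTitle` throws on the fetched page but not on the pasted copy, the bytes coming over the network are probably not the UTF-8 that the `meta` tag claims, and the conversion step (with the correct source encoding) has to run before the match.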

Any idea? Thanks in advance.

Espanta
