
I'm trying to match the sentence "ça vous dit quoi" with the regex pattern:

$pattern = "/\b" . $value . "\b/";

The word boundaries work for everything except French-specific characters such as the ç at the start of ça. I can work around the word-boundary problem by changing the PHP locale:

setlocale(LC_ALL, 'fr_FR');

When I do this, the pattern matches the sentence, but all the French characters are then displayed as �, so I get:

�a vous dit quoi

Kind of annoying. Solve one problem only to create another. I already have the HTML language and charset set to:

<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr" version="XHTML+RDFa 1.0" dir="ltr">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Not sure what else needs to be done to fix this. Surely French should display OK with all the locales set to French?

EDIT: phpinfo shows UTF-8 as the default character set on my server, for both the local and the master value.

EDIT: This question is not a duplicate of the one suggested, because the questions themselves are different. The solution may be the same, but anyone searching Google for the kind of problem I had would not find that question, while they would find mine. I think people are starting to mark questions as duplicates just for the sake of it.

This question is also similar to mine in the same way, since the answer is the same: "regular expression for French characters". But that would make all THREE questions duplicates.

Hasen
  • https://stackoverflow.com/questions/279170/utf-8-all-the-way-through may be of use. Go through it in its entirety. – Funk Forty Niner Aug 13 '17 at 14:50
  • That's a bit of a mess. Lots of different answers with all kinds of different suggestions. Which answer were you referring to? EDIT: The top answer is all about mysql and databases. Doesn't seem relevant to my question. – Hasen Aug 13 '17 at 14:54
  • It doesn't matter. Read through it all, despite the mysql stuff and try something. Might just be a file encoding issue. 2-3 mins. of reading isn't enough. Spend some time first then come back and tell us the results. – Funk Forty Niner Aug 13 '17 at 15:00
  • Seems like something you just randomly found that may or may not help, otherwise you'd be able to at least point me to which answer has the steps to solve this. All the answers suggest different things so I obviously can't follow them all, and much of it is nothing to do with my problem whatsoever. Thanks anyway. – Hasen Aug 13 '17 at 15:05
  • Not sure what you mean, my meta tag is already set to utf-8 as you can see above? – Hasen Aug 13 '17 at 15:16

1 Answer

Fixing the � display under the French locale looked like a nightmare, so I solved the problem another way, by modifying the regex pattern instead. Adding the u modifier to the pattern makes PCRE treat the pattern and the subject as UTF-8, so it recognises the ç in ça as a word character and everything works with no need to change the locale.

From this:

$pattern = "/\b" . $value . "\b/";

to this:

$pattern = "/\b" . $value . "\b/u";
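For anyone who wants to try this in isolation, here is a minimal sketch of the same fix. The helper name containsPhrase, the preg_quote() call and the sample strings are my own additions for illustration, not part of the original code, and it assumes PHP 7 or later with the input encoded as UTF-8.

```php
<?php
// Hypothetical helper (not from the original code): whole-word search for a
// phrase, with the u modifier switchable so both behaviours can be compared.
function containsPhrase(string $value, string $subject, bool $utf8 = true): bool
{
    // preg_quote() escapes regex metacharacters that might appear in $value.
    $pattern = '/\b' . preg_quote($value, '/') . '\b/' . ($utf8 ? 'u' : '');
    return preg_match($pattern, $subject) === 1;
}

// "\xC3\xA7" is the UTF-8 encoding of ç, written as raw bytes so the example
// does not depend on the encoding this file is saved in.
$value   = "\xC3\xA7a vous dit quoi";
$subject = "Alors, \xC3\xA7a vous dit quoi ?";

// Without u, the bytes of ç are not word characters, so \b finds no boundary.
var_dump(containsPhrase($value, $subject, false)); // bool(false)

// With u, ç is treated as a letter and the leading \b matches.
var_dump(containsPhrase($value, $subject, true));  // bool(true)
```

No setlocale() call is involved, so the locale stays at its default and the output encoding is left alone.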
Hasen