0

I've noticed that PHP's str_word_count behaves differently on the command line vs. on the web. The difference seems to be due to the default locale value in each environment. If I use setlocale to view the current locale (echo setlocale(LC_ALL, 0);), on the command line I get

C/en_US.UTF-8/C/C/C/C

Whereas on the web I get

C

So where do the various PHP SAPIs (cli, fpm, mod_php, etc.) get this default value from?

My presumption is that the weird-looking C/en_US.UTF-8/C/C/C/C locale is all six locale categories jammed together, and that this is telling me LC_CTYPE=en_US.UTF-8 while the other categories are all set to the C locale -- so putting my question another way: Why does my php-fpm server have all its locale categories set to C by default, whereas my command line PHP has one category set to en_US.UTF-8?
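One way to check this reading of the combined string is to query each category on its own. The following is a minimal sketch, not a definitive diagnostic; the $categories and $current names are illustrative, and it assumes only that setlocale() with "0" returns the current setting without changing it, as the PHP manual describes.

```php
<?php
// Query each locale category separately instead of decoding the combined string.
$categories = [
    'LC_COLLATE'  => LC_COLLATE,
    'LC_CTYPE'    => LC_CTYPE,
    'LC_MONETARY' => LC_MONETARY,
    'LC_NUMERIC'  => LC_NUMERIC,
    'LC_TIME'     => LC_TIME,
];

// LC_MESSAGES is only defined when PHP was built with libintl.
if (defined('LC_MESSAGES')) {
    $categories['LC_MESSAGES'] = LC_MESSAGES;
}

$current = [];
foreach ($categories as $name => $constant) {
    // Passing "0" returns the current setting without modifying it.
    $current[$name] = setlocale($constant, "0");
    echo $name, '=', $current[$name], "\n";
}
```

Running the same script under the CLI and under the web SAPI shows which individual categories differ between the two.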

Alana Storm

3 Answers

2

I assume that by "on the web" you mean the Apache web server under Linux ...

In this case the reason for setlocale(LC_ALL, 0) returning C is as follows:

If you check /etc/apache2/envvars on your Apache server you will find the following lines:

## The locale used by some modules like mod_dav
export LANG=C
## Uncomment the following line to use the system default locale instead:
#. /etc/default/locale

As a result, phpinfo() will produce the following output:

Apache Environment with LANG=C

If you modify the lines in /etc/apache2/envvars as follows ...

## The locale used by some modules like mod_dav
# export LANG=C
## Uncomment the following line to use the system default locale instead:
. /etc/default/locale

and restart Apache, then Apache will use the locale and language settings from your operating system (you can check those settings with the locale command in a terminal). As a result of this change, phpinfo() will produce, for example, the following output on a system with the de_CH.UTF-8 locale defined:

Apache Environment vars with LANG=de_CH.UTF-8

But you may still end up with setlocale(LC_ALL, 0) returning C, or at least with some of the LC_ categories set to C, e.g. LC_CTYPE=de_CH.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

To ensure that PHP uses the locale settings from the OS, you have to call setlocale(LC_ALL, ""). The setlocale manual at https://www.php.net/manual/en/function.setlocale.php states the following:

// If locales is the empty string "", the locale names will be set from 
// the values of environment variables with the same names as the above
// categories, or from "LANG".

// On Windows, setlocale(LC_ALL, '') sets the locale names from the 
// system's regional/language settings (accessible via Control Panel). 

After that call, setlocale(LC_ALL, 0) returns the value(s) configured on your operating system, e.g. de_CH.UTF-8 :-)
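The effect of that call can be sketched as follows. This is a minimal illustration, assuming the locale named in the environment is actually installed on the system; the $before and $after variable names are only for this example.

```php
<?php
// Report the locale PHP starts with; "0" only queries and changes nothing.
$before = setlocale(LC_ALL, "0");

// "" asks for the locale named by the environment (LC_ALL, LC_*, LANG).
// setlocale() returns false when the requested locale is not available.
$after = setlocale(LC_ALL, "");

echo "before: ", $before, "\n";
echo "after:  ", $after === false ? "(locale not available)" : $after, "\n";
```

If the second line still shows C, the environment the server process was started with most likely does not carry the expected LANG or LC_* values.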

Marc Schmid
1

You can set intl.default_locale in php.ini

Allan Wind
  • +1 for useful info -- but it doesn't seem to answer my question. The value is `intl.default_locale => no value => no value` on both my php command line, as well as my web server. Also -- I'm not 100% sure that `intl.default_locale` controls the same sort of things that setlocale does. – Alana Storm Dec 30 '20 at 05:54
  • _nod_ per this answer, the intl locale is a different thing from the old C locales and controls different things. https://stackoverflow.com/a/45828833/4668 – Alana Storm Dec 30 '20 at 05:58
0

It appears that PHP is picking these values up directly from the shell's environment, and that LC_CTYPE falls back to LANG's value when LC_CTYPE itself is not set:

% cat test.php
<?php
echo setlocale(LC_ALL, 0),"\n";

% echo $LANG
en_US.UTF-8

% echo $LC_CTYPE

% php test.php
C/en_US.UTF-8/C/C/C/C

% LC_CTYPE=C php test.php
C

% LANG=C php test.php
C

% LANG=C LC_CTYPE=en_US.UTF-8 php test.php
C/en_US.UTF-8/C/C/C/C

So, macOS appears to set LANG to en_US.UTF-8 for a user's personal shell. My presumption is that when I'm starting my php-fpm instance via

brew services start php

the -- LaunchAgent? -- is using an environment where LANG or LC_CTYPE is set to C (or not set at all, which also yields C) -- presumably because only humans are interested in seeing things formatted as UTF-8.
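A quick way to test this presumption is to print the locale-related environment variables next to the active locale from inside each SAPI. This is only a sketch: the $vars name is illustrative, and getenv() may report a variable as unset under a web SAPI even if the service was started with it, depending on how the server passes its environment to PHP.

```php
<?php
// Show the locale-related environment variables next to the active locale.
$vars = [];
foreach (['LC_ALL', 'LC_CTYPE', 'LANG'] as $name) {
    $value = getenv($name);  // false when the variable is not set
    $vars[$name] = $value === false ? '(unset)' : $value;
    echo $name, '=', $vars[$name], "\n";
}

// "0" queries the current locale without changing it.
echo 'locale=', setlocale(LC_ALL, "0"), "\n";
```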

Alana Storm