1

I have a script that scans Web pages and applies formatting to numbers. For example, if it finds a number bigger than 50000, it applies a red background.

The script works fine in languages that use the dot as decimal separator (for example English 12,345.67 will be interpreted as 12345.67), but fails in languages that use the comma as decimal separator (French 1,234 will be interpreted as 1234 instead of 1.234).

My question: is there a way to detect the Web page locale, and interpret the number accordingly?

Christophe
  • 1
    Here's an SO answer that should help: http://stackoverflow.com/questions/5314237/javascript-convert-to-european-locale – ron tornambe Feb 07 '13 at 22:45
  • @rontornambe I saw this post, but if I understand correctly the answers assume that you already know the locale. – Christophe Feb 07 '13 at 22:50
  • 2
    Would this be helpful for you? http://stackoverflow.com/questions/1074660/with-a-browser-how-do-i-know-which-decimal-separator-does-the-client-use/1308446#1308446 – Chris Nielsen Feb 07 '13 at 22:56
  • @ChrisNielsen thanks! I'll give it a try. So my question seems to be a duplicate – Christophe Feb 07 '13 at 22:59
  • The linked answers have nothing to do with this question. It is not a dupe. This question asks how to determine which locale a static document is in. An OS, browser, or any other client-side runtime locale is entirely unrelated. See @RobG's answer – Matt Apr 28 '23 at 12:35

3 Answers

1

The formatting of number values in web pages is determined not by the settings of the client system but by the author of the page, likely based on the language of the intended or expected audience.

Browsers will not reformat numbers in text in a page based on system settings. With any luck, there might be a lang attribute indicating the language of the content, but I don't think that will be reliable or even widely implemented.

It may be possible to gather all the numbers in the page and guess the separator based on things like:

  1. If any number matches /^[+-]?0?\.\d/ then the decimal separator is a period and thousands separator is a comma
  2. If any number matches /^[+-]?0?,\d/ then the decimal separator is a comma and thousands separator is a period
  3. If any number matches /\.\d\d\d\./ then the thousands separator is a period and decimal separator is a comma
  4. If any number matches /,\d\d\d,/ then the thousands separator is a comma and decimal separator is a period

and so on.
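
The four rules above can be sketched roughly as follows. This is only a minimal illustration, not a complete detector: `guessSeparators` and `parseWithSeparators` are hypothetical names, the input is assumed to be an array of number-like strings already extracted from the page, and a page that contains only ambiguous values such as `1,234` gives no verdict at all.

```javascript
// Hypothetical helper: guess the separators from number-like strings
// collected from a page. Returns { decimal, group }, or null when none
// of the four rules matches any string.
function guessSeparators(numberStrings) {
  const dotDecimal = { decimal: ".", group: "," };
  const commaDecimal = { decimal: ",", group: "." };
  const rules = [
    [/^[+-]?0?\.\d/, dotDecimal],   // rule 1: ".5" or "0.5"
    [/^[+-]?0?,\d/, commaDecimal],  // rule 2: ",5" or "0,5"
    [/\.\d\d\d\./, commaDecimal],   // rule 3: "1.234.567"
    [/,\d\d\d,/, dotDecimal],       // rule 4: "1,234,567"
  ];
  for (const text of numberStrings) {
    for (const [pattern, result] of rules) {
      if (pattern.test(text.trim())) return result;
    }
  }
  return null; // only ambiguous numbers were found
}

// Hypothetical helper: convert a string to a number once the
// separators are known (or assumed).
function parseWithSeparators(text, separators) {
  const normalized = text
    .trim()
    .split(separators.group).join("")   // drop every thousands separator
    .replace(separators.decimal, ".");  // use "." as the decimal point
  return Number(normalized);
}

const seps = guessSeparators(["1,234", "0,75"]);
if (seps !== null) {
  console.log(parseWithSeparators("1,234", seps)); // 1.234
}
```

The first rule that matches wins, so a page mixing both conventions gets whichever convention shows up first. Counting matches per convention and picking the majority might be more robust.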

RobG
  • A good start. Other locales exist, e.g. Canadian, Danish, and Finnish locales use the space as the thousands separator. Your regexes could be modified to handle these (at least according to Oracle's documentation here: https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html). – Matt Apr 28 '23 at 12:31
0

You could try reading the lang attribute of the html element, but many web pages still omit that attribute.
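
A rough sketch of that idea, assuming the page declares a language at all and that the author actually formatted numbers the way that language usually does (neither is guaranteed). `getDeclaredLang` and `separatorsForLang` are hypothetical names. The separators are looked up through the standard `Intl.NumberFormat.prototype.formatToParts` method.

```javascript
// Hypothetical helper: read the language declared on the root element,
// or return null when the attribute is missing or empty.
function getDeclaredLang(doc) {
  const root = doc && doc.documentElement;
  const lang = root && root.getAttribute("lang");
  return lang && lang.trim() ? lang.trim() : null;
}

// Hypothetical helper: ask the runtime's Intl data which separators
// are conventional for a language tag. Returns null for a missing or
// malformed tag.
function separatorsForLang(lang) {
  if (!lang) return null;
  let parts;
  try {
    parts = new Intl.NumberFormat(lang).formatToParts(1234567.5);
  } catch (err) {
    return null; // malformed language tag
  }
  const find = (type) => {
    const part = parts.find((p) => p.type === type);
    return part ? part.value : null;
  };
  return { decimal: find("decimal"), group: find("group") };
}

// In a browser: separatorsForLang(getDeclaredLang(document))
```

A well-formed tag that the runtime does not know falls back to the runtime's default locale rather than failing, so the result is best treated as a hint to combine with other evidence from the page.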

pwolaq
  • An element's `lang` attribute has no effect on the display of numbers in text in the page. e.g. `1,234` will not be displayed as `1.234`. – RobG Feb 08 '13 at 00:00
0

You can use JavaScript to determine whether the client's locale uses a comma or a dot as the decimal separator, like this:

function getDecimalSeparator() {
  return (0.1).toLocaleString().substring(1, 2)
}

getDecimalSeparator(); // "." on a US locale machine.
getDecimalSeparator(); // "," on a FR locale machine.
maerics
  • 1
    There are many "locales" other than US that use a comma separator for thousands, and also many other than France that use a period. The formatting used has nothing to do with locality and much more to do with the preferred language of the user. – RobG Feb 07 '13 at 23:50
  • @RobG: yes, I was just using US and FR as examples of locations whose official language use those separators. – maerics Feb 07 '13 at 23:51
  • Maerics—I guess more to the point is that a client's system settings don't tell you anything about the text in a page that came from a server. E.g. `1,234` will not be changed to `1.234` based on system user preferences. – RobG Feb 08 '13 at 00:27
  • @RobG: ah yes, I see what you're saying. A person using a US locale browser reading a French language website would get the wrong results! – maerics Feb 08 '13 at 01:02