Question

I am trying to match browsers set to Scandinavian languages based on the HTTP "Accept-Language" header.

My regex is:

^(nb|nn|no|sv|se|da|dk).*

My question is whether this is sufficient, and whether anyone knows of any other odd Scandinavian (but "valid") language codes or obscure browser bugs that could cause false positives.

Used for

The regex is used to display an English link at the top of the Norwegian web pages (Norwegian is the primary language and lives at the root of the domain and sub-domains). The link takes you to the English web pages (the secondary language, in a folder under the root) and is shown when the browser language is not Scandinavian. The link can be closed ("opted out" of), with a hash stored in JavaScript localStorage, if the user doesn't want to see it again. We decided not to use IP geolocation because of limited time to implement it.

2 Answers

0

That regular expression is enough if you are testing each item in accept-language individually.

If not individually, there are 2 problems:

  • One of the expected languages might not appear at the beginning of the header, but later in it.
  • Some of the expected language abbreviations could appear as a qualifier (subtag) of a completely different language.
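Testing each item individually could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names `language_tags` and `scandinavian_tags` are made up for this example, the language list is copied from the question's regex, and a `(-|$)` boundary is added so that only a whole primary subtag matches. The header values in the usage note are invented examples, not taken from real traffic.

```python
import re

# Assumption: same language list as the question's regex, plus a boundary
# so that only a complete primary subtag (e.g. "sv" or "sv-SE") matches.
SCANDINAVIAN = re.compile(r"^(nb|nn|no|sv|se|da|dk)(-|$)", re.IGNORECASE)


def language_tags(header):
    """Split an Accept-Language value into its tags, dropping any q-values."""
    tags = []
    for item in header.split(","):
        tag = item.split(";", 1)[0].strip()
        if tag:
            tags.append(tag)
    return tags


def scandinavian_tags(header):
    """Return the tags in the header whose primary subtag is on the list."""
    return [tag for tag in language_tags(header) if SCANDINAVIAN.match(tag)]
```

With this sketch, `scandinavian_tags("en-US,en;q=0.8,sv;q=0.6")` returns `["sv"]` even though "sv" is not at the start of the header, and `scandinavian_tags("de-DK,de;q=0.8")` returns `[]` because "DK" only appears as a qualifier of "de".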
Mario Rossi
  • I'm only testing the start of the string. How likely is the first problem to occur (where the first language in the string is not the preferred one)? According to [ISO-639-1] the abbreviations should not match any other "official" language codes, but since I had to add "se" and "dk", which can occur at the start of the tag, other languages with similar sub-tags could occur first (http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) – system_failure Aug 31 '13 at 14:35
  • I've gone through the browser web statistics, though, and couldn't see any preferred lang-codes that are not Scandinavian that would match (other than "nb-de" with ~0.00002% of visits). – system_failure Aug 31 '13 at 14:47
0

Depending on the language you are working in, there may be code in place you can use to parse this easily, e.g. this post, which also provides a good code example: Parse Accept-Language header in Java

Further, are you sure you want to limit your regex to the start of the string? Several languages can be provided (the first is intended to mean "I prefer x but also accept the following"): http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
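As a rough illustration of looking at the whole header rather than just its start, the Python sketch below parses the q-values described in the linked RFC section (a missing q-value counts as 1) and checks which comes first in preference order, a Scandinavian language or English. The function names, the decision rule in `show_english_link`, and the handling of malformed q-values are assumptions made for this example, not an established implementation.

```python
# Assumption: the same primary subtags as in the question's regex.
SCANDINAVIAN = {"nb", "nn", "no", "sv", "se", "da", "dk"}


def parse_accept_language(header):
    """Return (tag, q) pairs sorted by descending q; ties keep header order."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a missing q-value means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as not acceptable
        entries.append((tag, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def show_english_link(header):
    """Return False only if a Scandinavian language outranks English."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            break  # remaining entries are explicitly not acceptable
        primary = tag.split("-", 1)[0]
        if primary in SCANDINAVIAN:
            return False
        if primary == "en":
            return True
    return True  # neither a Scandinavian language nor English was listed
```

Under this rule, a header such as `fi,sv;q=0.8,en;q=0.5` would not trigger the English link, because "sv" ranks above "en" even though it is not the first entry.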

Otherwise your regex should work fine based on what you were asking, and here is a list of all browser language codes: http://www.metamodpro.com/browser-language-codes

In your shoes, I would also make the "switch to X language" link easy to find for all users until they have opted not to see it again. I would expect that many people have a preference set by default in their browser but would find a site actually using it unexpected, i.e. a user experience like:

I prefer English but don't know enough to change this setting, and have never had a reason to before, as so few sites make use of it.

Matthew
  • I'm considering parsing the whole string, as we can have scenarios where the browser/user does not have a Scandinavian language as their preferred one but still prefers it over English (e.g. Finns / Icelanders who speak/read Scandinavian languages well). But I'm not sure it is necessary, as the link is very visible and easy to close / "opt out" of. – system_failure Aug 31 '13 at 14:58
  • My advice would be to keep it simple, then track how your users interact with it and what their accept-language header was in your logs. That will give you a really good picture of how most of your users want the site to act. – Matthew Aug 31 '13 at 15:09