44

The Accept-Language header in a request is usually a long, complex string. For example:

Accept-Language : en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2

Is there a simple way to parse it in Java, or an API that can do it for me?

BalusC
Pushkar
  • 2
    It is not really that complicated: you split the part after the colon by commas, then look for a semicolon in each group, then parse the language codes and q factors. – Karl Knechtel Jul 26 '11 at 01:18
  • And the language codes tend to correspond to `java.util.Locale`s after you replace the `'-'`s with `'_'`s. – Mike Samuel Jul 26 '11 at 01:44
  • 4
    Do you really need to parse it yourself, or can you use [Http]ServletRequest.getLocale[s] and let the container handle the complexity? – Brett Kail Jul 26 '11 at 05:04
  • @bkail : please put your comment in an answer, since it is 'right' – vkraemer Jul 27 '11 at 00:29
  • Sure. It wasn't obvious whether this was a servlet question or not, though I guess the presence of java-ee tag suggests the OP might be satisfied using a servlet API. – Brett Kail Jul 27 '11 at 15:18
  • Actually the best answer is the last one: `Locale.forLanguageTag(locale)`. – daemon_nio Apr 05 '23 at 09:07

7 Answers

50

I would suggest using ServletRequest.getLocales() to let the container parse Accept-Language rather than trying to manage the complexity yourself.

Chloe
Brett Kail
  • 4
    Unless you're planning to directly support every possible locale, ServletRequest.getLocales is probably a better choice. – Jeremy List Sep 04 '13 at 13:02
  • 2
    The problem is that `ServletRequest.getLocales` returns the server locale if the user does not provide a valid one. To guard against language spam requests you must parse the header yourself, for which `LanguageRange.parse(String)` is convenient. – djmj Dec 07 '16 at 04:02
  • 3
    @djmj It's easy enough to just check for the existence of the `Accept-Language` header when relevant. You're right, though, that newer JDKs have added additional APIs that could be useful (this answer is from 2011!). – Brett Kail Dec 07 '16 at 04:05
  • If bots abuse `Accept-Language` to spam your website, the header exists but contains no valid element. An absent header can still be valid, for example when some Google bots crawl your page. See http://webmasters.stackexchange.com/questions/101473/why-is-a-message-showing-in-google-analytics-language-column – djmj Dec 07 '16 at 04:07
43

For the record, now it is possible with Java 8:

Locale.LanguageRange.parse()
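
As a rough illustration of the call above, the sketch below parses a header like the one in the question and renders each range with its weight. The class name `LanguageRangeDemo` and the helper `describe` are made up for this example; only `Locale.LanguageRange.parse`, `getRange` and `getWeight` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LanguageRangeDemo {
    // Hypothetical helper: renders each parsed range as "range;q=weight".
    static List<String> describe(String acceptLanguage) {
        List<String> out = new ArrayList<>();
        // parse() throws IllegalArgumentException for syntactically invalid input
        for (Locale.LanguageRange range : Locale.LanguageRange.parse(acceptLanguage)) {
            out.add(range.getRange() + ";q=" + range.getWeight());
        }
        return out;
    }

    public static void main(String[] args) {
        describe("en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2")
                .forEach(System.out::println);
    }
}
```

The returned list is ordered by descending weight, so the first element is the client's most preferred range.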
madhead
Qiang Li
  • 2
    And if you want the list of locales, you can use `Locale.LanguageRange.parse(requestedLangs) .stream().sorted(Comparator.comparing(Locale.LanguageRange::getWeight).reversed()).map(range -> new Locale(range.getRange())).collect(Collectors.toList());` – Alex Jan 31 '16 at 12:03
  • 4
    @Alex: According to the javadoc, you don't need to sort the returned `List`: "Unlike a weighted list, language ranges in a prioritized list are sorted in the descending order based on its priority. The first language range has the highest priority and meets the user's preference most.". So your code could simply be: `Locale.LanguageRange.parse(requestedLangs).stream().map(range -> new Locale(range.getRange())).collect(Collectors.toList());` – FBB Nov 16 '16 at 20:39
  • 3
    This doesn't return a properly parsed locale, e.g. "en-GB" will get parsed to a language called "en-gb" with no country. – Daniel Flower Feb 25 '18 at 08:45
  • The `Locale.filter` methods should be used to convert a `LanguageRange` to a set of matching `Locale` – SpaceTrucker Nov 14 '18 at 20:45
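
The suggestion in the last comment could look roughly like this. `Locale.filter` is a JDK method; the class name, the `matching` helper and the set of supported locales are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleFilterDemo {
    // Hypothetical list of locales the application can serve.
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.US, Locale.GERMANY, Locale.FRANCE);

    // Returns the supported locales that match the header, most preferred first.
    static List<Locale> matching(String acceptLanguage) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        return Locale.filter(ranges, SUPPORTED);
    }

    public static void main(String[] args) {
        System.out.println(matching("en-ca,en;q=0.8,de;q=0.2"));
    }
}
```

Because the result contains real `Locale` instances from the supported collection's tags, this avoids building a `Locale` directly from the raw range string.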
16

Here's an alternative way to parse the Accept-Language header which doesn't require a servlet container:

String header = "en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2";
for (String str : header.split(",")){
    String[] arr = str.trim().replace("-", "_").split(";");

    // Parse the locale
    Locale locale = null;
    String[] l = arr[0].split("_");
    switch(l.length){
        case 2: locale = new Locale(l[0], l[1]); break;
        case 3: locale = new Locale(l[0], l[1], l[2]); break;
        default: locale = new Locale(l[0]); break;
    }

    // Parse the q-value
    Double q = 1.0D;
    for (String s : arr){
        s = s.trim();
        if (s.startsWith("q=")){
            q = Double.parseDouble(s.substring(2).trim());
            break;
        }
    }

    // Print the Locale and associated q-value
    System.out.println(q + " - " + arr[0] + "\t " + locale.getDisplayLanguage());
}

You can find an explanation of the Accept-Language header and associated q-values here:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Many thanks to Karl Knechtel and Mike Samuel. Their comments on the original question helped point me in the right direction.
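
If the locales are needed in order of preference rather than printed, the manual approach in this answer can be adapted along these lines. This is only a sketch: the class and method names are invented, it swaps the `switch` on the split tag for `Locale.forLanguageTag`, and it simply skips the `*` wildcard and entries with an unparsable q-value, which is an assumption about the desired behaviour.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AcceptLanguageSorter {
    // Hypothetical helper: returns the header's locales, highest q-value first.
    static List<Locale> parseSorted(String header) {
        List<Map.Entry<Locale, Double>> weighted = new ArrayList<>();
        for (String part : header.split(",")) {
            String[] pieces = part.trim().split(";");
            String tag = pieces[0].trim();
            if (tag.isEmpty() || tag.equals("*")) {
                continue; // nothing usable in this entry
            }
            double q = 1.0; // q defaults to 1.0 when absent
            boolean valid = true;
            for (int i = 1; i < pieces.length; i++) {
                String p = pieces[i].trim();
                if (p.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(p.substring(2).trim());
                    } catch (NumberFormatException e) {
                        valid = false; // skip entries with a malformed q-value
                    }
                    break;
                }
            }
            if (valid) {
                weighted.add(new SimpleImmutableEntry<>(Locale.forLanguageTag(tag), q));
            }
        }
        // List.sort is stable, so entries with equal q keep their header order.
        weighted.sort(Comparator.comparingDouble(
                (Map.Entry<Locale, Double> e) -> e.getValue()).reversed());
        List<Locale> result = new ArrayList<>();
        for (Map.Entry<Locale, Double> e : weighted) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseSorted("en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2"));
    }
}
```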

Peter
5

We are using Spring Boot and Java 8. This works:

In ApplicationConfig.java, write this:

@Bean
public LocaleResolver localeResolver() {
    return new SmartLocaleResolver();
}

The languages we support are listed in our constants class:

List<Locale> locales = Arrays.asList(new Locale("en"),
                                         new Locale("es"),
                                         new Locale("fr"),
                                         new Locale("es", "MX"),
                                         new Locale("zh"),
                                         new Locale("ja"));

The logic goes in the class below:

public class SmartLocaleResolver extends AcceptHeaderLocaleResolver {
    @Override
    public Locale resolveLocale(HttpServletRequest request) {
        String header = request.getHeader("Accept-Language");
        if (StringUtils.isBlank(header)) {
            return Locale.getDefault();
        }
        // Parse the header sent by the client, e.g. "da,es-MX;q=0.8"
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
        // Locale.lookup returns null when none of the supported locales match
        Locale locale = Locale.lookup(ranges, locales);
        return locale != null ? locale : Locale.getDefault();
    }
}
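
To see what `Locale.lookup` does in isolation, without Spring, here is a small JDK-only sketch. The class name, the `resolve` helper and the explicit fallback parameter are assumptions made for the example; the supported languages mirror the list in this answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleLookupDemo {
    // Same languages as the supported list in this answer.
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en"), Locale.forLanguageTag("es"),
            Locale.forLanguageTag("fr"), Locale.forLanguageTag("es-MX"),
            Locale.forLanguageTag("zh"), Locale.forLanguageTag("ja"));

    // Returns the best supported match, or the given fallback when nothing matches.
    static Locale resolve(String acceptLanguage, Locale fallback) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        Locale match = Locale.lookup(ranges, SUPPORTED); // null if no range matches
        return match != null ? match : fallback;
    }

    public static void main(String[] args) {
        System.out.println(resolve("da,es-MX;q=0.8", Locale.ENGLISH));
        System.out.println(resolve("pt-BR", Locale.ENGLISH));
    }
}
```

With the header `da,es-MX;q=0.8`, Danish is not supported, so the lookup falls through to `es-MX`; for `pt-BR` nothing matches and the fallback is used.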
Arun
3

ServletRequest.getLocale() is certainly the best option if it is available and has not been overridden, as some frameworks do.

For all other cases, Java 8 offers Locale.LanguageRange.parse(), as previously mentioned by Qiang Li. This, however, only gives back language strings, not Locales. To parse the language strings you can use Locale.forLanguageTag() (available since Java 7):

    final List<Locale> acceptedLocales = new ArrayList<>();
    final String userLocale = request.getHeader("Accept-Language");
    if (userLocale != null) {
        final List<LanguageRange> ranges = Locale.LanguageRange.parse(userLocale);

        if (ranges != null) {
            ranges.forEach(languageRange -> {
                final String localeString = languageRange.getRange();
                final Locale locale = Locale.forLanguageTag(localeString);
                acceptedLocales.add(locale);
            });
        }
    }
    return acceptedLocales;
tec
  • This still allows for a nonsense Locale instance like `"test"`, as used by spam requests, since `LanguageRange.parse` only checks syntax and not IANA language rules. You need to check the locale against valid locales like `Locale.getAvailableLocales()` to be sure it is valid. – djmj Dec 20 '16 at 01:32
2

The above solutions lack validation. ServletRequest.getLocale() returns the server locale if the user does not provide a valid one.

Our websites recently received spam requests with various Accept-Language headers such as:

  1. secret.google.com
  2. o-o-8-o-o.com search shell is much better than google!
  3. Google officially recommends o-o-8-o-o.com search shell!
  4. Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO

This implementation can optionally check against a supported list of valid Locales. Without this check, a simple request with "test" or (2, 3, 4) still bypasses the syntax-only validation of LanguageRange.parse(String).

It optionally accepts empty and null values so that search engine crawlers are allowed.

Servlet Filter

final String headerAcceptLanguage = request.getHeader("Accept-Language");

// check valid
if (!HttpHeaderUtils.isHeaderAcceptLanguageValid(headerAcceptLanguage, true, Locale.getAvailableLocales()))
    return;

Utility

/**
 * Checks if the given accept-language request header can be parsed.<br>
 * <br>
 * Optionally, the parsed LanguageRanges can be checked against the provided
 * <code>locales</code> so that at least one locale must match.
 *
 * @see LanguageRange#parse(String)
 *
 * @param acceptLanguage
 * @param isBlankValid Set to <code>true</code> if blank values are also
 *            valid
 * @param locales Optional array of valid Locales; at least one parsed
 *            range must match one of them.
 *
 * @return <code>true</code> if it can be parsed
 */
public static boolean isHeaderAcceptLanguageValid(final String acceptLanguage, final boolean isBlankValid,
    final Locale[] locales)
{
    // allow null or empty
    if (StringUtils.isBlank(acceptLanguage))
        return isBlankValid;

    try
    {
        // check syntax
        final List<LanguageRange> languageRanges = Locale.LanguageRange.parse(acceptLanguage);

        // wrong syntax
        if (languageRanges.isEmpty())
            return false;

        // no valid locales to check against
        if (ArrayUtils.isEmpty(locales))
            return true;

        // check if any valid locale exists
        for (final LanguageRange languageRange : languageRanges)
        {
            final Locale locale = Locale.forLanguageTag(languageRange.getRange());

            // validate available locale
            if (ArrayUtils.contains(locales, locale))
                return true;
        }

        return false;
    }
    catch (final Exception e)
    {
        return false;
    }
}
djmj
1
Locale.forLanguageTag("en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2")
mokshino