I need to take an ISO 639-1 code such as en-GB and convert it into an ISO 639-2 code such as eng.

I have looked at several libraries, including babelfish, langcodes and pycountry, but did not find a documented means to perform that transformation in any of them.

Have I missed something? That is - is this possible with any of these libraries?

ivan_pozdeev

2 Answers

You can use pycountry for what you want. Do note that if you want the reverse scenario (ISO 639-2 to ISO 639-1) it may not always work because while there should always be a mapping from an ISO 639-1 language code to ISO 639-2, the reverse is not guaranteed.

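import pycountry

code = 'en-GB'

# ISO 639-1 codes are always 2-letter codes, so only the language part
# of a tag like en-GB (the text before the first hyphen) is needed.

# Taking everything before the hyphen is safer than slicing the first
# two characters (thanks ivan_pozdeev)
lang_code = code[:code.index('-')] if '-' in code else code

# Current pycountry releases name these attributes alpha_2 and alpha_3;
# older releases used iso639_1_code, iso639_2T_code and iso639_3_code.
lang = pycountry.languages.get(alpha_2=lang_code)
print("ISO 639-1 code: " + lang.alpha_2)
# For a language that has a 2-letter code, alpha_3 normally also
# matches its ISO 639-2/T code.
print("ISO 639-2 code: " + lang.alpha_3)
print("ISO 639-3 code: " + lang.alpha_3)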

The above should print out:

ISO 639-1 code: en
ISO 639-2 code: eng
ISO 639-3 code: eng
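
The language-part extraction used above can also be written as a small standalone helper. This is only a sketch using the standard library: the function name `primary_language_subtag` is hypothetical, and accepting an underscore separator (as in locale names like `pt_BR`) and only 2- or 3-letter language parts are assumptions made for illustration, not rules taken from any of the libraries discussed here.

```python
import re


def primary_language_subtag(tag):
    """Return the lowercased language part of a tag such as 'en-GB'."""
    # Split on the first '-' or '_' only; a tag with no separator
    # is returned whole (after lowercasing).
    subtag = re.split(r"[-_]", tag.strip(), maxsplit=1)[0].lower()
    # Assumption: only 2- or 3-letter ASCII language parts are accepted here.
    if not re.fullmatch(r"[a-z]{2,3}", subtag):
        raise ValueError(f"no language subtag found in {tag!r}")
    return subtag


for tag in ("en-GB", "pt_BR", "zh-Hant-TW", "fr"):
    print(tag, "->", primary_language_subtag(tag))
```

Raising `ValueError` on input such as `-GB` makes a malformed tag fail early instead of passing an empty string on to a language lookup.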
wkl
  • Thanks, this was very helpful. The [Wikipedia article I linked](https://en.wikipedia.org/w/index.php?title=Language_localisation&oldid=685931772#Language_tags_and_codes) had confused me: even though its wording talks of the ISO 639-1 "alpha-2 code" and "two-letter codes", the table shows the string `en-GB` as an example of an "ISO 639-1 code", which left me thinking that the whole string was a valid ISO 639-1 code, not just the first two letters of the string. That in turn is why I thought the libraries I had listed might not be able to perform the transformation I needed. Now off to read BCP 47! – sampablokuper Dec 16 '15 at 12:57
  • `code[:code.index('-')]` appears to be more reliable to extract the first part (so the code will choke if it's not 2-letter). It won't work if there's only one part already. Something like `code[:code.index('-')] if '-' in code else code` will work in all cases. – ivan_pozdeev Dec 16 '15 at 14:56
  • @ivan_pozdeev - Thanks for that, I've updated my code to include that line. – wkl Dec 16 '15 at 15:48
  • It's good, but it is also 9.2 MB! So I used the CSV mentioned by @Ashwini_Chaudhary (https://stackoverflow.com/a/16253118/1937033), which is only 9.6 KB: geohack.net/gis/wikipedia-iso-country-codes.csv – ThePhi Nov 12 '17 at 08:39
  • why not `code.split('-')[0]` ? – Pedro Lobito Jul 31 '23 at 09:24

List of ISO 639-2 codes at Wikipedia has a table specifying the correspondence. Since it's not a 1-1 mapping, the conversion is not always possible.
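
To illustrate why the correspondence is not one-to-one, here is a minimal sketch built on a hand-picked excerpt of such a table. The four rows are real ISO 639 codes, but the table is deliberately tiny, and the names `ISO_639_1_TO_2`, `to_alpha3` and `to_alpha2` are hypothetical helpers invented for this example rather than part of any library.

```python
# Hand-picked sample rows: ISO 639-1 code -> (ISO 639-2/T, ISO 639-2/B).
# A real table would cover every ISO 639-1 code.
ISO_639_1_TO_2 = {
    "en": ("eng", "eng"),
    "pt": ("por", "por"),
    "fr": ("fra", "fre"),  # terminologic and bibliographic codes differ
    "de": ("deu", "ger"),
}

# Reverse lookup: both the T and the B code map back to the 2-letter code.
ISO_639_2_TO_1 = {
    alpha3: alpha2
    for alpha2, codes in ISO_639_1_TO_2.items()
    for alpha3 in codes
}


def to_alpha3(alpha2, bibliographic=False):
    """Map a 2-letter code to its 3-letter code (T by default, B on request)."""
    terminologic, biblio = ISO_639_1_TO_2[alpha2.lower()]
    return biblio if bibliographic else terminologic


def to_alpha2(alpha3):
    """Map a 3-letter code back to 2 letters, or None if there is no entry."""
    return ISO_639_2_TO_1.get(alpha3.lower())


print(to_alpha3("fr"))                      # terminologic code
print(to_alpha3("fr", bibliographic=True))  # bibliographic code
print(to_alpha2("ger"))
# 'ang' (Old English) is a valid ISO 639-2 code with no ISO 639-1
# counterpart, so even a complete table would have nothing to return here.
print(to_alpha2("ang"))
```

Going from two letters to three always finds an entry in a complete table, although a choice between the T and B variants may be needed; going from three letters to two can legitimately come back empty.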

You did miss something - it's quite possible to do the conversion with the libraries you specified.

  • babelfish provides built-in language converters (alpha2, alpha3b, alpha3t, name, scope, type and opensubtitles):

>>> import babelfish
>>> language = babelfish.Language('por', 'BR')
>>> language.alpha2
'pt'
<...>
>>> babelfish.Language.fromalpha3b('fre')
<Language [fr]>
  • langcodes is tailored for different tasks - recognizing and matching languages regardless of standards. So you can extract all the codes that are related to your initial one - to varying extents - but it will not tell you which standards they pertain to.

  • pycountry is similar to babelfish and is covered by the other answer.

ivan_pozdeev
  • Thanks! Yes, I did indeed miss something. My comment [here](https://stackoverflow.com/questions/34302586/convert-iso-639-1-to-iso-639-2#comment56366849_34302890) explains what it was. – sampablokuper Dec 16 '15 at 14:15
  • @sampablokuper Well, I did miss _that part of your confusion_ - because I never had it in the first place (the Wikipedia table lists two-letter codes - two-letter codes it is!). – ivan_pozdeev Dec 16 '15 at 14:57
  • @ivan_pozdeev, right! Depends on which Wikipedia table one consults, though :) Sorry my confusion was confusing ;) – sampablokuper Dec 16 '15 at 16:31