Convert ISO 639-1 to ISO 639-2

Question

I need to take an ISO 639-1 code such as en-GB and convert it into an ISO 639-2 code such as eng.

I have looked at the following libraries, but did not find a documented means to perform that transformation in any of them: babelfish, langcodes and pycountry.

Have I missed something? That is - is this possible with any of these libraries?

wkl · Accepted Answer · 2015-12-16T15:58:12.883

6

You can use pycountry for what you want. Do note that if you want the reverse scenario (ISO 639-2 to ISO 639-1) it may not always work because while there should always be a mapping from an ISO 639-1 language code to ISO 639-2, the reverse is not guaranteed.

import pycountry

code = 'en-GB'

# ISO 639-1 codes are always 2-letter codes, so keep only the language
# part of the tag, i.e. everything before the first hyphen.

# This is a safer way to extract the language code from something
# like en-GB than slicing off the first two characters (thanks ivan_pozdeev)
lang_code = code[:code.index('-')] if '-' in code else code

lang = pycountry.languages.get(iso639_1_code=lang_code)
print("ISO 639-1 code: " + lang.iso639_1_code)
print("ISO 639-2 code: " + lang.iso639_2T_code)
print("ISO 639-3 code: " + lang.iso639_3_code)

The above should print out:

ISO 639-1 code: en
ISO 639-2 code: eng
ISO 639-3 code: eng
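
A note for readers on a newer pycountry: as far as I know, the attribute names used above were renamed in later releases to `alpha_2` and `alpha_3`, with `bibliographic` present only where the bibliographic code differs. The sketch below assumes that newer naming. `primary_subtag` and `to_iso639_2` are hypothetical helper names, not part of pycountry, and the `None`/`KeyError` handling is a guess at covering both older and newer behaviour for unknown codes.

```python
def primary_subtag(tag):
    """Return the lowercase language part of a tag such as 'en-GB' or 'en_GB'."""
    # Locale-style strings sometimes use '_' instead of '-', so normalise first.
    # split() returns the whole string when there is no separator, so no
    # "if '-' in tag" check is needed.
    return tag.replace('_', '-').split('-')[0].lower()


def to_iso639_2(tag, bibliographic=False):
    """Map e.g. 'en-GB' to a 3-letter code; return None if the language is unknown.

    Assumption: a pycountry release that exposes alpha_2 / alpha_3.
    """
    # Imported here so primary_subtag() stays usable without pycountry installed.
    import pycountry

    try:
        lang = pycountry.languages.get(alpha_2=primary_subtag(tag))
    except KeyError:
        # Assumption: some older releases raise instead of returning None.
        lang = None
    if lang is None:
        return None
    if bibliographic:
        # Fall back to alpha_3 when no separate bibliographic code is recorded.
        return getattr(lang, 'bibliographic', lang.alpha_3)
    return lang.alpha_3
```

With pycountry installed, `to_iso639_2('en-GB')` should give `eng`. Using `split('-')[0]` also covers the single-part case (`'en'`) without a conditional.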

edited Dec 16 '15 at 15:58

answered Dec 16 '15 at 02:16

wkl


Thanks, this was very helpful. The [Wikipedia article I linked](https://en.wikipedia.org/w/index.php?title=Language_localisation&oldid=685931772#Language_tags_and_codes) had confused me: even though its wording talks of the ISO 639-1 "alpha-2 code" and "two-letter codes", the table shows the string `en-GB` as an example of an "ISO 639-1 code", which left me thinking that the whole string was a valid ISO 639-1 code, not just the first two letters of the string. That in turn is why I thought the libraries I had listed might not be able to perform the transformation I needed. Now off to read BCP 47! – Dec 16 '15 at 12:57
`code[:code.index('-')]` appears to be more reliable to extract the first part (so the code will choke if it's not 2-letter). It won't work if there's only one part already. Something like `code[:code.index('-')] if '-' in code else code` will work in all cases. – ivan_pozdeev Dec 16 '15 at 14:56
@ivan_pozdeev - Thanks for that, I've updated my code to include that line. – wkl Dec 16 '15 at 15:48
It's good, but it's also 9.2 MB! So I used the CSV mentioned by @Ashwini_Chaudhary (https://stackoverflow.com/a/16253118/1937033), which is only 9.6 KB: geohack.net/gis/wikipedia-iso-country-codes.csv – ThePhi Nov 12 '17 at 08:39
Why not `code.split('-')[0]`? – Pedro Lobito Jul 31 '23 at 09:24

score 1 · Answer 2 · edited May 23 '17 at 12:09

Wikipedia's "List of ISO 639-2 codes" has a table specifying the correspondence. Since it's not a 1-1 mapping (many ISO 639-2 codes have no ISO 639-1 counterpart, and some languages have two ISO 639-2 codes), the conversion is not always possible.
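
To make the "not 1-1" point concrete, here is a small sketch built on a hand-copied excerpt of that table. It holds only four rows, and the names `ISO_639_TABLE`, `alpha2_to_alpha3` and `alpha3_to_alpha2` are made up for illustration; a real conversion would load the full list.

```python
# Excerpt of the ISO 639 correspondence table, one row per language:
# (639-1 code or None, 639-2/B bibliographic code, 639-2/T terminology code)
ISO_639_TABLE = [
    ("en", "eng", "eng"),  # English: B and T codes coincide
    ("fr", "fre", "fra"),  # French: two different 639-2 codes
    ("de", "ger", "deu"),  # German: two different 639-2 codes
    (None, "ace", "ace"),  # Achinese: no 639-1 code exists
]


def alpha2_to_alpha3(code, bibliographic=False):
    """Return the 639-2 code for a 639-1 code, or None if not in the excerpt."""
    for alpha2, b_code, t_code in ISO_639_TABLE:
        if alpha2 == code:
            return b_code if bibliographic else t_code
    return None


def alpha3_to_alpha2(code):
    """Return the 639-1 code for a 639-2 code (B or T), or None if there is none."""
    for alpha2, b_code, t_code in ISO_639_TABLE:
        if code in (b_code, t_code):
            return alpha2
    return None
```

Here `alpha2_to_alpha3('fr')` gives `'fra'` while `alpha2_to_alpha3('fr', bibliographic=True)` gives `'fre'`, and `alpha3_to_alpha2('ace')` gives `None`, which is the case where the reverse conversion fails.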

You did miss something - it's quite possible to do the conversion with the libraries you specified.

BabelFish — babelfish 0.5.1 documentation:

Built-in language converters (alpha2, alpha3b, alpha3t, name, scope, type and opensubtitles):
>>> language = babelfish.Language('por', 'BR')
>>> language.alpha2
'pt'
<...>
>>> babelfish.Language.fromalpha3b('fre')
<Language [fr]>

langcodes is tailored for different tasks - recognizing and matching languages regardless of standards. So you can extract all the codes that are related to your initial one - to varying extents - but it will not tell you which standards they pertain to.
pycountry is similar to babelfish and is covered by the other answer.

edited May 23 '17 at 12:09

Community


answered Dec 16 '15 at 01:41

ivan_pozdeev


Thanks! Yes, I did indeed miss something. My comment [here](https://stackoverflow.com/questions/34302586/convert-iso-639-1-to-iso-639-2#comment56366849_34302890) explains what it was. – Dec 16 '15 at 14:15
@sampablokuper Well, I did miss _that part of your confusion_ - because I never had it in the first place (the Wikipedia table lists two-letter codes - two-letter codes it is!). – ivan_pozdeev Dec 16 '15 at 14:57
ivan_pozdeev, right! Depends on which Wikipedia table one consults, though :) Sorry my confusion was confusing ;) – Dec 16 '15 at 16:31
