
I have the following Pandas series:

>>> df.original_language.value_counts()
en    32269
fr     2438
it     1529
ja     1350
de     1080
      ...  
la        1
jv        1
sm        1
gl        1
mt        1
Name: original_language, Length: 92, dtype: int64

I want to convert these language codes into their full language names, for example:

en >> English

ar >> Arabic

I looked up this question but it didn't help. If any packages are required, please include how to install them with pip, if possible.

Fatimah E.

1 Answer


Use the iso-639 package:

# pip install iso-639
from iso639 import languages
df['lang'] = df['lang'].apply(lambda x: languages.get(alpha2=x).name)

Output:

       lang  count
0   English  32269
1    French   2438
2   Italian   1529
3  Japanese   1350
4    German   1080
5     Latin      1
6  Javanese      1
7    Samoan      1
8  Galician      1
9   Maltese      1

To convert the codes in your original DataFrame, use:

from iso639 import languages
df['original_language'] = df['original_language'].apply(lambda x: languages.get(alpha2=x).name)
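
One caveat with the `.apply` calls above: as far as I know, `languages.get` raises `KeyError` for a code it doesn't recognize, so a single unexpected value in the column would abort the whole conversion. Below is a minimal sketch of a more forgiving variant that looks up each distinct code once and falls back to the raw code on failure. `build_name_map` and `SAMPLE_NAMES` are hypothetical names made up for illustration, and the small dict only stands in for the iso-639 lookup so the snippet runs without the package installed.

```python
def build_name_map(codes, lookup):
    """Return {code: name} for each distinct code.

    `lookup` is any callable that maps a code to a name and raises
    KeyError for codes it does not know.
    """
    name_map = {}
    for code in set(codes):
        try:
            name_map[code] = lookup(code)
        except KeyError:
            # Unknown code: keep it unchanged instead of aborting
            name_map[code] = code
    return name_map


# Stand-in lookup table (hypothetical, for illustration only)
SAMPLE_NAMES = {"en": "English", "fr": "French", "it": "Italian"}

name_map = build_name_map(["en", "fr", "en", "xx"], SAMPLE_NAMES.__getitem__)
print(name_map)
```

With the real package, the lookup would be `lambda c: languages.get(alpha2=c).name`, the codes would come from `df['original_language'].unique()`, and the result could be applied with `df['original_language'].map(name_map)`. With only 92 distinct codes, this also avoids repeating the lookup for every row.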
Nk03
  • Thank you! My main problem was with installation, but it turns out there are two modules with the same name, one with a hyphen and one without. This question helped me solve the problem: https://stackoverflow.com/questions/58464166/importerror-cannot-import-name-languages – Fatimah E. May 01 '21 at 18:13
  • There's also an iso639 package; is that something else? I'm using Poetry, and for some reason it couldn't find the iso-639 package (even though pip could). – Alon Samuel Dec 15 '22 at 12:34