How do I apply a function to every element in a list in every row of a dataframe?

df:

label   top_topics               
adverts ['werbung', 'geschenke']

My function looks something like this:

from langdetect import detect
from googletrans import Translator

def detect_and_translate(text):
    
    target_lang = 'en'
    try:
        result_lang = detect(text)
        
    except:
        result_lang = target_lang
    
    if result_lang == target_lang:
        
        return text, result_lang
    
    else:
        translator = Translator()
        translated_text = translator.translate(text, dest=target_lang)
        return translated_text.text, result_lang

I'm expecting output like this:

label    top_topics                translation               language
adverts  ['werbung', 'geschenke']  ['advertising', 'gifts']  de

I tried something like this, but it didn't translate the top_topics column, because the function doesn't loop over the elements of each list.

df['translate_detect'] = df['top_topics'].apply(detect_and_translate)
df['top_topics_en'], df['language'] = df.translate_detect.str

Any help?

Jazz
  • Please provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – ddejohn May 17 '22 at 16:12
  • I did provide. Please check properly. @ddejohn – Jazz May 17 '22 at 16:14
  • Please read the linked article. Your example data are not copy-paste-able. Please also see how to provide a [sample dataset](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – ddejohn May 17 '22 at 16:16

1 Answer

First, you should never use a bare except: it also catches KeyboardInterrupt and SystemExit, and it hides unrelated bugs.

Second, because your function translates a single word and returns the translated word and the detected language as a tuple, getting your desired output (a list of translated words plus a single detected language) from it would be difficult and tedious. Instead, modify the function to take the whole list:

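import googletrans
from googletrans import Translator


def detect_and_translate(lst):
    """Translate every word in lst to English; return (translations, language)."""
    translator = Translator()
    target_lang = 'en'
    try:
        # Detect once, on the first word. The result is a Detected object;
        # its .lang attribute holds the language code (e.g. 'de').
        result_lang = translator.detect(lst[0])
    except Exception:  # should be the specific exception that can occur
        # Detection failed (or lst is empty): return the input unchanged
        return lst, target_lang

    translations = []
    for text in lst:
        translated_text = translator.translate(text, dest=target_lang)
        translations.append(translated_text.text)

    return translations, result_lang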

Usage:

In [4]: googletrans.__version__
Out[4]: '4.0.0-rc.1'

In [5]: df[["topics_en", "language"]] = df.top_topics.apply(detect_and_translate).apply(pd.Series)

In [6]: df
Out[6]:
     label            top_topics             topics_en                            language
0  adverts  [werbung, geschenke]  [advertising, gifts]  Detected(lang=de, confidence=None)
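
The `.apply(pd.Series)` step is what turns each returned `(translations, language)` tuple into two columns. Here is a minimal offline sketch of the same split, assuming a hypothetical stand-in function (`fake_detect_and_translate`, with a made-up lookup table) so that it runs without network access. It builds the two columns with `pd.DataFrame(pairs.tolist(), ...)`, which is an alternative to `.apply(pd.Series)` and not the code used above:

```python
import pandas as pd

df = pd.DataFrame({"label": ["adverts"],
                   "top_topics": [["werbung", "geschenke"]]})


def fake_detect_and_translate(lst):
    # Hypothetical stand-in for the real function: fixed lookup, no network calls
    lookup = {"werbung": "advertising", "geschenke": "gifts"}
    return [lookup.get(word, word) for word in lst], "de"


# Each row becomes one (translations, language) tuple
pairs = df["top_topics"].apply(fake_detect_and_translate)

# Spread the tuples over two new columns, keeping the original row index
df[["topics_en", "language"]] = pd.DataFrame(pairs.tolist(), index=df.index)
```

Passing `index=df.index` keeps the rows aligned if the dataframe does not have a default 0..n-1 index.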

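Note that googletrans.Translator has its own language detection method, so langdetect isn't needed. It doesn't work in 3.0.0, but it will if you pip install googletrans==4.0.0rc1. The language column above holds the Detected object; return result_lang.lang instead if you only want the language code (here, de).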

Note also that in order for this to work, you must assume that all words in a given list are the same language. If that's not an assumption you can make, you'll need to figure something else out.
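
If you can't make that assumption, one option is to detect the language per word and return a list of languages alongside the list of translations. The following is only a sketch under stated assumptions: `translate_each` is a hypothetical helper, and the detector and translator are passed in as plain callables so the logic can be tried offline with stand-ins (`fake_detect` and `fake_translate`, both made up here).

```python
def translate_each(words, detect, translate, target_lang="en"):
    """Return (translations, languages), detecting the language of every word."""
    translations, languages = [], []
    for word in words:
        try:
            lang = detect(word)
        except Exception:  # ideally the specific exception your detector raises
            # Assumption: an undetectable word is treated as already in target_lang
            lang = target_lang
        languages.append(lang)
        # Only call the translator for words that are not in the target language
        translations.append(word if lang == target_lang else translate(word, target_lang))
    return translations, languages


# Offline stand-ins (hypothetical data, no network calls)
FAKE_LANG = {"pompelmo": "it", "pamplemousse": "fr"}
FAKE_EN = {"pompelmo": "grapefruit", "pamplemousse": "grapefruit"}


def fake_detect(word):
    return FAKE_LANG[word]  # raises KeyError for unknown words


def fake_translate(word, dest):
    return FAKE_EN[word]


result = translate_each(["pompelmo", "pamplemousse", "gifts"], fake_detect, fake_translate)
```

With googletrans you could presumably pass `lambda w: translator.detect(w).lang` and `lambda w, dest: translator.translate(w, dest=dest).text` as the two callables, at the cost of one extra request per word.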

ddejohn
  • Hi Thanks ! Your solution partly works, as I need the output in 2 different columns, with this I get tuples in a list. – Jazz May 17 '22 at 16:12
  • The problem here is that your function is not designed to provide the output you want. Your function takes one word at a time and returns the tuple `(translated_word, original_language)`, **for each word**. You will need to spend some time thinking about how to go about doing that, since it's not as simple as translating all the words and then returning `list_of_translated_words, original_language` since it may be possible that you get a list of words from multiple languages (e.g., `["pompelmo", "pamplemousse"]` -- it would be incorrect to return `["grapefruit", "grapefruit"], IT`). – ddejohn May 17 '22 at 16:24
  • If you're okay to make the assumption that all words in a given list will be the same language, see my edit. – ddejohn May 17 '22 at 16:47
  • perfect works well now! :) Thanks, I wanted to actually use the same function for both lists and strings, but I have used your implementation and handled it with some if-else to check the type of input and have my final function! :) – Jazz May 17 '22 at 19:47
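
For anyone wanting the same thing as in the last comment, the type check might look roughly like the sketch below. This is a guess at the approach, not Jazz's actual code: `translate_any` is a hypothetical wrapper, and `fake_list_func` is a made-up offline stand-in for the list-based `detect_and_translate`.

```python
def translate_any(value, list_func):
    """Accept a single string or a list of strings and delegate to list_func."""
    if isinstance(value, str):
        # Wrap the string in a list, then unwrap the single translation
        translations, lang = list_func([value])
        return translations[0], lang
    return list_func(list(value))


def fake_list_func(lst):
    # Hypothetical stand-in for detect_and_translate: fixed lookup, no network calls
    lookup = {"werbung": "advertising", "geschenke": "gifts"}
    return [lookup.get(word, word) for word in lst], "de"


single = translate_any("werbung", fake_list_func)
many = translate_any(["werbung", "geschenke"], fake_list_func)
```

The string check has to come first, because a string is itself iterable and would otherwise be split into characters.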