1

From https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F

We see this:

[screenshot: the DeepL translator page showing the French translation]

But the translated text "Bonjour, comment allez-vous aujourd'hui?" doesn't appear anywhere in the page's source, and the markup of the target textarea looks like this:

<textarea class="lmt__textarea lmt__target_textarea lmt__textarea_base_style" 
data-gramm_editor="false" tabindex="110" dl-test="translator-target-input" 
lang="fr-FR" style="height: 300px;"></textarea>

No matter how I read the text or the source through BeautifulSoup, the translation in that textarea can't be extracted.

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')
bsoup = BeautifulSoup(response.content, 'html.parser')

bsoup.find_all('textarea')

How can I extract the translation from the https://www.deepl.com/translator page?

Bertrand Martel
alvas

4 Answers

4

The translation comes from the result of an external API call using JSON-RPC:

POST https://www2.deepl.com/jsonrpc

with parameters such as the text to translate and the target language.

An example using requests:

import requests
import time

url = "https://www2.deepl.com/jsonrpc"
text = "Hello, how are you today?"

r = requests.post(
    url,
    json = {
        "jsonrpc":"2.0",
        "method": "LMT_handle_jobs",
        "params": {
            "jobs":[{
                "kind":"default",
                "raw_en_sentence": text,
                "raw_en_context_before":[],
                "raw_en_context_after":[],
                "preferred_num_beams":4,
                "quality":"fast"
            }],
            "lang":{
                "user_preferred_langs":["FR","EN"],
                "source_lang_user_selected":"auto",
                "target_lang":"FR"
            },
            "priority":-1,
            "commonJobParams":{},
            "timestamp": int(round(time.time() * 1000))
        },
        "id": 40890008
    }
)

print(r.json())
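
As a side note on why the plain requests.get call in the question comes back without the translation: everything after # in the URL is a fragment, and HTTP clients do not send the fragment to the server. A minimal sketch with the standard library shows what is actually requested. That the page's JavaScript reads the fragment and then triggers the translation request is an assumption based on the observed behavior, not something I have confirmed in DeepL's code.

```python
from urllib.parse import urlsplit, unquote

url = "https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F"
parts = urlsplit(url)

# What an HTTP client requests: scheme, host and path, without the fragment.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"

# The fragment stays on the client side; here it holds languages and text.
source_lang, target_lang, encoded_text = parts.fragment.split("/", 2)
text = unquote(encoded_text)

print(requested)
print(source_lang, target_lang, text)
```

So the static HTML fetched by requests is the same whatever text is in the URL, which is consistent with the empty textarea.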


Bertrand Martel
  • thanks for your answer, but am wondering if there is an update on this code since it doesn't accept over 60 characters, and use the id only once and you should replace it. – chikabala May 01 '21 at 23:26
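
Regarding the comment about the id: a minimal sketch of building the request body with a fresh id and timestamp for each call. build_payload is a hypothetical helper, the 8-digit id range is an assumption modelled on the value used above, and I have not verified what the server accepts. It does not address the length limit mentioned in the comment.

```python
import random
import time


def build_payload(text, target_lang="FR"):
    """Build a JSON-RPC body like the one above, with a new id per call."""
    return {
        "jsonrpc": "2.0",
        "method": "LMT_handle_jobs",
        "params": {
            "jobs": [{
                "kind": "default",
                "raw_en_sentence": text,
                "raw_en_context_before": [],
                "raw_en_context_after": [],
                "preferred_num_beams": 4,
                "quality": "fast",
            }],
            "lang": {
                "user_preferred_langs": [target_lang, "EN"],
                "source_lang_user_selected": "auto",
                "target_lang": target_lang,
            },
            "priority": -1,
            "commonJobParams": {},
            "timestamp": int(round(time.time() * 1000)),
        },
        # Assumption: any fresh 8-digit integer works as a request id.
        "id": random.randint(10_000_000, 99_999_999),
    }


payload = build_payload("Hello, how are you today?")
print(payload["id"])
```

The result can be passed as json=payload to requests.post in the example above.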
3

To extract the text from a textarea field, use .get_attribute('value').

Here I make Selenium wait for the element using WebDriverWait with the visibility_of_element_located condition.

But the element being visible doesn't guarantee that the translated text is already there (as in this case), so add a loop that runs until the text is no longer empty.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# You may need to pass the browser executable path here
driver = webdriver.Chrome()
driver.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')

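# Wait for the target textarea, then poll until its value is no longer empty.
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.lmt__side_container--target textarea')))
while element.get_attribute('value') == '':
    time.sleep(0.1)
# Give the page a moment to finish writing before reading the final value.
time.sleep(1)
text_target = element.get_attribute('value')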

print(text_target)
driver.quit()

Hope this helps.

frianH
  • Seems like the `element.get_attribute('value')` isn't getting the complete translation. E.g. if you try translating the input: `Delivery always on time, best price in the market and have been using them for over 4 years now. Highly recommend it!` the translation from `.get_attribute('value')` will contain `[...] ` – alvas Jun 07 '20 at 16:10
  • This is because in the above example the while loop terminates once the text in the output text field is not empty anymore. To avoid this, you could, for instance, check the output text repeatedly with a delay between successive checks and stop once the string did not change anymore between checks. – Spherical Cowboy Jun 29 '20 at 00:05
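
The idea from the comment above (keep checking until the text stops changing) could be sketched like this. wait_until_stable and its parameters are hypothetical names, not part of Selenium; the function only needs a callable that returns the current text, so it can be tried without a browser.

```python
import time


def wait_until_stable(read_value, interval=0.5, timeout=20.0, stable_checks=2):
    """Poll read_value() until it returns the same non-empty string on
    `stable_checks` consecutive comparisons with the previous reading.

    Raises TimeoutError if the value does not settle within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = None
    unchanged = 0
    while time.monotonic() < deadline:
        current = read_value()
        if current and current == last:
            unchanged += 1
            if unchanged >= stable_checks:
                return current
        else:
            # Empty or changed text: start counting again.
            unchanged = 0
        last = current
        time.sleep(interval)
    raise TimeoutError("text did not stabilise within the timeout")
```

With the Selenium example above, this would be called as text_target = wait_until_stable(lambda: element.get_attribute('value')). The interval and the number of checks are guesses; a long translation may need larger values.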
2

To extract the text Bonjour, comment allez-vous aujourd'hui ? you need to use WebDriverWait with visibility_of_element_located() and then call get_attribute("value"). You can use either of the following locator strategies:

  • Using CSS_SELECTOR and get_attribute("value"):

    driver.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "textarea.lmt__textarea.lmt__target_textarea.lmt__textarea_base_style"))).get_attribute("value"))
    
  • Using XPATH and get_attribute("value"):

    driver.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//textarea[@class='lmt__textarea lmt__target_textarea lmt__textarea_base_style']"))).get_attribute("value"))
    
  • Console Output:

    Bonjour, comment allez-vous aujourd'hui ?
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
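
Since a visible textarea may still be empty, one option is to wait on the value itself. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when done, so a small custom condition is enough. A minimal sketch: textarea_has_value is a hypothetical helper, and "css selector" is the string value of By.CSS_SELECTOR, written out so the snippet has no imports.

```python
def textarea_has_value(css_selector):
    """Return a wait condition that yields the element's value once non-empty."""
    def _condition(driver):
        element = driver.find_element("css selector", css_selector)
        # An empty string is falsy, so the wait keeps polling until text appears.
        return element.get_attribute("value") or False
    return _condition
```

Usage would be text = WebDriverWait(driver, 10).until(textarea_has_value("textarea.lmt__target_textarea")). This only waits for the first non-empty value, so a translation that is still being written may come back incomplete.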
    
undetected Selenium
1

An alternative using pyperclip and a different locator (the button that copies the translation to the clipboard):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pyperclip

driver = webdriver.Chrome()
driver.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')
# Click the copy button once it is clickable, then read the clipboard.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.lmt__target_toolbar__copy > button"))).click()
data = pyperclip.paste()
E.Wiest