I might do it like alecxe suggested, but I'd use the URL that loads the definition itself. For instance, searching for azul:
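from selenium import webdriver
# PhantomJS is a headless browser; it runs the JavaScript the page needs.
driver = webdriver.PhantomJS()
# Load the definition URL directly instead of the outer search page.
driver.get('http://lema.rae.es/drae/srv/search?val=azul')
# The whole definition lives in the first div child of body.
print(driver.find_element_by_css_selector("body>div").text)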
The URL that appears in the question loads a page that then loads the definition's URL in an iframe element. Loading the definition directly with the URL I show above saves some work and some complexity: the entire definition is contained in the first div child of body. Unfortunately, it does not remove the need for JavaScript.
Running the code above produces:
azul.
(Quizá alterac. del ár. hisp. lazawárd, este del ár. lāzaward, este del persa laǧvard o lažvard, y este del sánscr. rājāvarta, rizo del rey).
1. adj. Del color del cielo sin nubes. Es el quinto color del espectro solar. U. t. c. s.
2. m. El cielo, el espacio. U. m. en leng. poét.
3. m. Méx. Miembro del cuerpo de Policía.
~ de cobalto.
[... etc ...]
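To look up words other than azul, the definition URL can be built from the word. This is only a minimal sketch: definition_url is a hypothetical helper name, and I am assuming the val parameter accepts UTF-8 percent-encoded text, which I have not verified against the server for accented words.

```python
from urllib.parse import urlencode

# Base of the definition URL used in this answer.
BASE_URL = "http://lema.rae.es/drae/srv/search"


def definition_url(word):
    """Return the URL that loads the definition of `word` directly.

    Hypothetical helper. Assumes the server accepts UTF-8
    percent-encoding for non-ASCII characters in `val`.
    """
    # urlencode percent-encodes the value (UTF-8 by default).
    return BASE_URL + "?" + urlencode({"val": word})


if __name__ == "__main__":
    print(definition_url("azul"))
```

The result can be passed to driver.get in place of the hard-coded URL.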
Note that I have not found any need for a wait mechanism to detect that the page content is ready. Looking at the page in a debugger, (a) I did not see any Ajax request, and (b) judging from the JavaScript and the page itself, what is served appears to be an obfuscated page that the JavaScript then deobfuscates synchronously. So by the time driver.get returns, the content should be ready to use.
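If that assumption ever stops holding (say the site starts loading the definition asynchronously), a defensive poll could be put around the text lookup. The following is a generic sketch in plain Python, not Selenium's own wait API; wait_for_text and its parameters are hypothetical names of mine.

```python
import time


def wait_for_text(get_text, timeout=10.0, interval=0.25):
    """Poll get_text() until it returns non-blank text.

    Hypothetical helper. Raises RuntimeError if nothing usable
    shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            text = get_text()
        except Exception:
            # The lookup may fail while the element does not exist yet.
            # Real code should catch a narrower exception type.
            text = ""
        if text and text.strip():
            return text
        if time.monotonic() >= deadline:
            raise RuntimeError("content not ready after %s seconds" % timeout)
        time.sleep(interval)
```

With the driver from the code above, it would be called as wait_for_text(lambda: driver.find_element_by_css_selector("body>div").text). Selenium also ships its own explicit waits (WebDriverWait), which would be the more usual choice in a larger script.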