
I would like to know how to get data from a website. I found a tutorial and ended up with this:

import os
import csv
import requests
from bs4 import BeautifulSoup

requete = requests.get("https://www.palabrasaleatorias.com/mots-aleatoires.php")
page = requete.content
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

The tutorial tells me that I should use something like this to get the string of a tag:

h1 = soup.find("h1", {"class": "ico-after ico-tutorials"})
print(h1.string)

But I have a problem: the tag whose text content I want has no class. How should I do it?

I tried passing {} and also {"class": ""}, but neither works; in both cases it returns None. I want to get the text content of this part of the website:

<div style="font-size:3em; color:#6200C5;">
Orchard</div>

where Orchard is the random word. Thanks for any help.

1 Answer


Unfortunately, BeautifulSoup has no XPath support, and the page you are trying to scrape is poorly suited to your task (no IDs, classes, or other distinctive HTML attributes to target).

Hence, you should change how you locate the HTML element and use XPath, which you can't do with BeautifulSoup. To do that, use the html module from the lxml package to parse the page. Below is a code snippet (based on the answers to this question) that extracts the random word in your example.

import requests
from lxml import html

requete = requests.get("https://www.palabrasaleatorias.com/mots-aleatoires.php")
tree = html.fromstring(requete.content)
# xpath() returns a list of matching text nodes, not a single string
rand_w = tree.xpath('/html/body/center/center/table[1]/tr/td/div/text()')
print(rand_w)
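
The absolute path above breaks as soon as the page layout changes. As a possibly more robust variant, here is a minimal sketch that keys on the inline style of the div instead. It assumes the word always sits in a div whose style contains font-size:3em, as in the fragment from the question. The names SAMPLE_PAGE and extract_random_word are made up for illustration, and the HTML is an inline copy of that fragment so the sketch runs without network access.

```python
from lxml import html

# Hypothetical inline copy of the fragment from the question (runs offline).
SAMPLE_PAGE = """
<html><body><center><center>
<table><tr><td>
<div style="font-size:3em; color:#6200C5;">
Orchard</div>
</td></tr></table>
</center></center></body></html>
"""


def extract_random_word(page_source):
    """Return the text of the first div styled with font-size:3em, or None."""
    tree = html.fromstring(page_source)
    # Relative XPath keyed on the inline style rather than the full path from /html.
    # Assumption: the target div keeps "font-size:3em" in its style attribute.
    matches = tree.xpath('//div[contains(@style, "font-size:3em")]/text()')
    # xpath() returns a list of strings; strip the newline around the word.
    return matches[0].strip() if matches else None


print(extract_random_word(SAMPLE_PAGE))
```

To use it on the live page, pass requete.content from the snippet above instead of SAMPLE_PAGE. The function returns None when nothing matches, so a change in the page's styling shows up as None rather than an IndexError.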
cap.py