I want to extract the text of the elements matched by specific XPath expressions on this URL:
https://www.discogs.com/it/artist/148415-Total-Eclipse-4
but nothing is printed to the console. I expect the following output:
Artista 1: Total Eclipse (4)
Testo elemento 1: Jungle Fever
Testo elemento 2: Come Together
My script works perfectly on local .html files, but it prints nothing when I give it a URL. Here's the script I'm using:
import requests
from lxml import html
# Ask the user to enter the Discogs URL
url = input("Enter the Discogs URL: ")
# Make an HTTP request to get the HTML content of the page
response = requests.get(url)
html_content = response.text
# Parse HTML using lxml
tree = html.fromstring(html_content)
# Use a generic XPath to capture as many "h1" and "a" elements as desired
elements = tree.xpath('//*[starts-with(local-name(), "div")][4]/div/div[1]/div[1]/div/h1 | //*[starts-with(local-name(), "div")][4]/div/div[2]/div[2]/div/table//*[starts-with(local-name(), "tr")][2]/td[5]/a | //*[starts-with(local-name(), "div")][4]/div/div[2]/div[2]/div/table//*[starts-with(local-name(), "tr")]/td[5]/a')
# Counters for numbering the artist headings ("h1") and the link texts ("a")
artist_counter = 1
text_counter = 1
# Print the text of the "h1" and "a" elements found in sequence
for element in elements:
    if element.tag == 'h1':
        print(f"Artista {artist_counter}: {element.text.strip()}")
        artist_counter += 1
    else:
        print(f"Testo elemento {text_counter}: {element.text.strip()}")
        text_counter += 1
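To narrow down where the output gets lost, a small diagnostic along these lines might help. It is only a sketch: `explain_empty_output` is a hypothetical helper name, not part of requests or lxml, and it assumes the silence comes from one of three causes (a non-200 response, an empty body, or an XPath that matches nothing), which I have not confirmed for Discogs.

```python
def explain_empty_output(status_code, html_length, match_count):
    """Return a short hint about why the print loop produced no output.

    Hypothetical helper: the three checks below are assumptions about
    likely causes, not confirmed behavior of the site.
    """
    if status_code != 200:
        # The server refused or redirected the request, so the body
        # is probably an error page rather than the artist page.
        return f"HTTP {status_code}: the server did not return the page"
    if html_length == 0:
        return "HTTP 200, but the response body is empty"
    if match_count == 0:
        # The page arrived, but its structure differs from the local file.
        return "page fetched, but the XPath matched no elements"
    return f"{match_count} element(s) matched"
```

In the script above it could be called right after the `tree.xpath(...)` line as `print(explain_empty_output(response.status_code, len(html_content), len(elements)))`, which would show whether the request or the XPath is the part that fails.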