AttributeError: 'Response' object has no attribute 'html'

Question

I am trying to write a script that downloads songs from YouTube, but when I try to get the first result of the YouTube search I get this error:

AttributeError: 'Response' object has no attribute 'html'

How can I solve this?

This is my code:

import os
import sys
import requests
from pytube import YouTube

# Check if a file name and a folder name were provided as command-line arguments
if len(sys.argv) != 3:
    print("Usage: python download_songs.py <file name> <folder name>")
    sys.exit(1)

# Get the file and folder names from the command-line arguments
file_name = sys.argv[1]
folder_name = sys.argv[2]

# Create the specified folder to save the downloaded songs
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

# Read the list of song titles from the input file
with open(file_name) as f:
    song_titles = [line.strip() for line in f]

# Download each song from YouTube and save it to the specified folder
for song_title in song_titles:
    url = "https://www.youtube.com/results?search_query=" + song_title + " extended mix"

    # Send the request and get the response
    response = requests.get(url)

    # Parse the response and extract the first result
    first_result = response.html.find("h3")[0]

    # Get the link for the first result
    link = first_result.a["href"]

    # Use pytube to download the video at the link
    yt = YouTube(link)
    video = yt.streams.first()
    video.download(folder_name)
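A side note on the search URL: building it by plain string concatenation leaves spacing and special characters (such as & in a title) up to you. Below is a minimal sketch that lets the standard library's urllib.parse.urlencode build the query string instead; the helper name search_url is hypothetical. requests can also do this encoding for you through the params argument of requests.get.

```python
from urllib.parse import urlencode


def search_url(song_title: str) -> str:
    # Hypothetical helper: urlencode percent-encodes the query value and,
    # by default (quote_plus), turns spaces into "+".
    query = urlencode({"search_query": f"{song_title} extended mix"})
    return "https://www.youtube.com/results?" + query


print(search_url("Rock & Roll"))
# https://www.youtube.com/results?search_query=Rock+%26+Roll+extended+mix
```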

Does this answer your question? [Using python requests and beautiful soup to pull text](https://stackoverflow.com/questions/39757805/using-python-requests-and-beautiful-soup-to-pull-text) — Daraan, Dec 13 '22 at 10:39
response objects have no html parser or something. See their [attributes](https://www.w3schools.com/PYTHON/ref_requests_response.asp). Best thing you get is the html as text which you can post process by BeautifulSoup for example. — Daraan, Dec 13 '22 at 10:43
So you just assumed that there would be an `html` attribute? Did you read `requests` docs? — DeepSpace, Dec 13 '22 at 10:44

Nick van Unen · Answer 1 · 2022-12-13T12:02:22.023

The line first_result = response.html.find("h3")[0] is the problem. The response variable holds a Response object, which has a fixed set of attributes but no html attribute, so accessing response.html raises an AttributeError.

By default, a Response object has the following attributes (as listed by dir(response)):

['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
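You can confirm this locally without making a network request. The sketch below builds an empty requests.Response object directly (normally requests.get creates it for you) and checks for the two attributes:

```python
import requests

# Build an empty Response locally, only to inspect which attributes it has.
response = requests.Response()

print(hasattr(response, "text"))  # True: the decoded body is available as .text
print(hasattr(response, "html"))  # False: there is no .html attribute
```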

What you are probably looking for is response.text, which gives you the HTML page as a string (response.content gives the same page as raw bytes; BeautifulSoup accepts either).
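A small sketch of the difference between the two. It fakes a downloaded page by assigning the private _content attribute, purely so that it runs offline; in real code requests.get fills this in, and the HTML string here is made up:

```python
import requests

# Stand-in for a downloaded page: setting the private _content attribute
# directly is a testing shortcut, not something to do in real code.
response = requests.Response()
response._content = b"<html><body><h3>Example</h3></body></html>"
response.encoding = "utf-8"

print(type(response.content))  # <class 'bytes'>
print(type(response.text))     # <class 'str'>
```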

However, to get all h3 elements, requests alone is not enough. You will need a parser such as bs4, also known as BeautifulSoup, which can extract HTML elements and their content.
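To illustrate what a parser adds on top of the raw string, here is a dependency-free sketch using the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not what this answer recommends). The class name H3LinkParser and the HTML snippet are made up for the example:

```python
from html.parser import HTMLParser


class H3LinkParser(HTMLParser):
    """Collect the href of every <a> tag nested inside an <h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False


snippet = (
    '<h3><a href="/watch?v=abc">First</a></h3>'
    '<p><a href="/other">not in an h3</a></p>'
    '<h3><a href="/watch?v=def">Second</a></h3>'
)
parser = H3LinkParser()
parser.feed(snippet)
print(parser.links)  # ['/watch?v=abc', '/watch?v=def']
```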

I think what you are looking for is to get all h3 elements; this approach is also covered in this article: Scraping text in h3 and div tags using beautifulSoup, Python

So for your issue: first, add this import at the top of your script: from bs4 import BeautifulSoup

Then, if you replace:

    # Parse the response and extract the first result
    first_result = response.html.find("h3")[0]

With:

    # Parse the response and extract the first result
    content = response.content
    soup = BeautifulSoup(content, "html.parser")
    first_result = soup.find_all("h3")[0]

You should then have the first h3 element and can get the href attribute from its <a> tag. The rest of your script should then work as before.
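A slightly more defensive version of the same idea, as a sketch: first_h3_link is a hypothetical helper name, and BASE_URL assumes the hrefs on the page are relative paths such as /watch?v=... . It returns None instead of raising IndexError when no h3 with a link is found. That case may well occur here: YouTube may build its search results with JavaScript, in which case the raw HTML that requests downloads might contain no h3 result elements at all.

```python
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: links on the results page are relative to this base URL.
BASE_URL = "https://www.youtube.com"


def first_h3_link(html: str, base_url: str = BASE_URL) -> Optional[str]:
    """Return the absolute URL of the first <a href> inside an <h3>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for h3 in soup.find_all("h3"):
        a = h3.find("a", href=True)  # only <a> tags that have an href
        if a is not None:
            # Resolve relative hrefs like "/watch?v=..." against base_url.
            return urljoin(base_url, a["href"])
    return None


# Usage with made-up snippets (no network request):
sample = '<h3>No link here</h3><h3><a href="/watch?v=abc123">Song</a></h3>'
print(first_h3_link(sample))            # https://www.youtube.com/watch?v=abc123
print(first_h3_link("<p>nothing</p>"))  # None
```

In the loop you would then call link = first_h3_link(response.text) and skip the title (for example with continue) when it returns None, before handing link to YouTube(link).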
