I am trying to extract the text of the comments on a web page, given its URL, using BeautifulSoup for the scraping. The comments are visible on the page when I open the URL in a browser, but the soup object returned by BeautifulSoup does not contain those tags or their text.
I used BeautifulSoup with 'html.parser'. I successfully extracted the number of likes/views/comments of the video on the page, but the comment section itself was not in the fetched HTML. The browser I used is Chrome, and the system is Ubuntu 18.04.1 LTS.
This is the code I used (in Python):
from urllib.request import urlopen
import urllib.error
from bs4 import BeautifulSoup

webpage_link = "https://www.airvuz.com/video/Majestic-Beast-Nanuk?id=59b2a56141ab4823e61ea901"
try:
    page = urlopen(webpage_link)
except urllib.error.HTTPError as err:  # webpage cannot be found
    print("ERROR! %s" % webpage_link)
    raise
soup = BeautifulSoup(page, 'html.parser')
I expected the soup object to contain all the content that is visible on the webpage, especially the text of the comments (such as "Not being there I enjoyed a lot seeing the life style of white bear. Thanks to the provider for such documentary." and "WOOOW... amazing..."); however, I could not find the corresponding nodes in the soup object. Any help would be appreciated!
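For context, my understanding is that BeautifulSoup only parses the HTML the server sends back; it never executes JavaScript, so anything injected client-side is invisible to it. A minimal self-contained sketch (using a hypothetical static response with an empty comments container, not the actual markup of the site) illustrates what I suspect is happening:

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML mimicking what a server might return:
# the comments container exists, but its contents are filled in
# later by client-side JavaScript, which BeautifulSoup never runs.
static_html = """
<html><body>
  <h1>Majestic Beast Nanuk</h1>
  <div id="comments"></div>
</body></html>
"""

soup = BeautifulSoup(static_html, "html.parser")
comments_div = soup.find("div", id="comments")

# The container node is present in the static markup...
print(comments_div is not None)
# ...but it holds no comment elements, because those would only be
# added by JavaScript running in the browser.
print(comments_div.find_all("p"))
```

If that is indeed the cause here, is there a standard way to get at the JavaScript-rendered content?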