I am developing a web scraper to extract the full curriculum of a Udemy course, using Beautiful Soup and urllib.request in Python. However, the last sections of the curriculum on the page are collapsed and have to be clicked to expand. How can I extract the entire curriculum?
URL: https://www.udemy.com/python-the-complete-python-developer-course/
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as Soup
my_url = "https://www.udemy.com/python-the-complete-python-developer-course/"
head = {'User-Agent':'Mozilla/5.0'}
pagereq = Request(my_url, headers=head)
pager = urlopen(pagereq)
page = pager.read()
pager.close()
Sp = Soup(page, "html.parser")
Sections = Sp.findAll("div", {"class": "content-container"})
numlec = Sp.find("div", {"class": "num-lectures"})
for section in Sections:
    SecTitle = section.find("span", {"class": "lecture-title-text"}).text.strip()
    SecLen = section.find("span", {"class": "section-header-length"}).text.strip()
    lectures = section.findAll("div", {"class": "lecture-container"})
    print("-" * 40)
    print(SecTitle+"\t"+SecLen)
    print()
    for lecture in lectures:
        name = lecture.find("div", {"class": "title"}).text.strip()
        leng = lecture.find("span", {"class": "content-summary"}).text.strip()
        print("\t {}\t{}".format(name, leng))
print("-" * 40)
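To reproduce the parsing step without hitting the live page, here is a minimal sketch of the same logic run against a small inline HTML sample. The markup in SAMPLE_HTML is made up and only mirrors the class names used above, and the helper names text_or_empty and parse_curriculum are hypothetical. It also shows that a missing element yields an empty string instead of an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the class names used above (not copied from Udemy).
SAMPLE_HTML = """
<div class="content-container">
  <span class="lecture-title-text"> Introduction </span>
  <span class="section-header-length">05:00</span>
  <div class="lecture-container">
    <div class="title"> Welcome </div>
    <span class="content-summary">02:00</span>
  </div>
  <div class="lecture-container">
    <div class="title"> Setup </div>
  </div>
</div>
"""


def text_or_empty(parent, tag, css_class):
    """Return the stripped text of the first matching child, or "" if absent."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node else ""


def parse_curriculum(html):
    """Collect sections and their lectures from whatever is present in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    curriculum = []
    for section in soup.find_all("div", class_="content-container"):
        lectures = [
            (text_or_empty(lec, "div", "title"),
             text_or_empty(lec, "span", "content-summary"))
            for lec in section.find_all("div", class_="lecture-container")
        ]
        curriculum.append({
            "title": text_or_empty(section, "span", "lecture-title-text"),
            "length": text_or_empty(section, "span", "section-header-length"),
            "lectures": lectures,
        })
    return curriculum


if __name__ == "__main__":
    for section in parse_curriculum(SAMPLE_HTML):
        print("{}\t{}".format(section["title"], section["length"]))
        for name, length in section["lectures"]:
            print("\t{}\t{}".format(name, length))
```

The parser can only return what is in the HTML it is given, so it produces the same partial result as the code above when fed the page as downloaded.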
This scrapes everything up to the collapsed sections, but I want the full curriculum. Is there an easy way to do this?