Scraping the seat layout page of BookMyShow using Python

Question

I am trying to scrape the BookMyShow website to find movie details such as the times at which tickets are available and how many seats are available. I have worked out how to get the show timings for which seats are available, but now I want to get the total number of seats available in a given show. My code is:

import requests
from bs4 import BeautifulSoup
import json
base_url = "https://in.bookmyshow.com"
s =requests.session()
headers = {"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
r = s.get("https://in.bookmyshow.com/vizag/movies", headers = headers)
print(r.status_code)
soup = BeautifulSoup(r.text,"html.parser")
movies_list = soup.find("div",{"class":"__col-now-showing"})
movies = movies_list.findAll("a",{"class":"__movie-name"})
for movie in movies:
    print(movie.text)
show = []
containers = movies_list.findAll("div",{"class":"card-container"})
for container in containers:
    try:
        detail = container.find("div",{"class":"__name overflowEllipses"})
        button = container.find("div",{"class":"book-button"})
        print(detail.text)
        print(button.a["href"])
        url_ticket = base_url + button.a["href"]
        show.append(url_ticket)
    except (AttributeError, TypeError, KeyError):  # card has no name or no booking link
        pass
for i in show:
    print(i)
for t in show:
    res = s.get(t,headers=headers)
    bs = BeautifulSoup(res.text,"html.parser")
    movie_name = bs.find("div",{"class":"cinema-name-wrapper"})
    print(movie_name.text.replace(" ","").replace("\t","").replace("\n",""))
    venue_list = bs.find("ul",{"id":"venuelist"})
    venue_names = venue_list.findAll("li",{"class":"list"})
    try:
        for i in venue_names:
            vn = i.find("div",{"class":"__name"})
            print(vn.text.replace(" ","").replace("\t","").replace("\n",""))
            show_times = i.findAll("div",{"data-online":"Y"})
            for st in show_times:
                print(st.text.replace(" ","").replace("\t","").replace("\n",""))
    except AttributeError:  # a venue entry was missing the expected markup
        pass

    print("\n")
heads = {
    "accept":"*/*",
"accept-encoding":"gzip, deflate, br",
"accept-language":"en-US,en;q=0.9",
"origin":"https://in.bookmyshow.com",
"referer":"https://in.bookmyshow.com/buytickets/chalo-vizag/movie-viza-ET00064364-MT/20180204",
"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}
rr = s.post("https://b-eu.simility.com/b?c=bookmyshow&v=1.905&ec=BLOFaZ2HdToCxwcr&cl=0&si=5a76bfce6ae4a00027767ae9&sc=3B0CB9F4-4A27-4588-9FB4-A2A2760569BC&uc=D834EDA4-57E4-4889-A34F-473AC6BBDDBB&e=Seatlayout&cd=.simility.com&r=0&st=1517731803171&s=792a6c66313a2032223133302633343a2c393a322e3c202422636e312a382037633f3c606669673e61653e6338323230353f3c35616f3b2a2c2269663a203820606765696d7371606f77282e2a61663320327e70756f2e2a63643e20326c776e6e242861643f20326e75666e24206166342a306c75666e2422636e352a386c776e64262073692032223348324b403b4436253e43323d2f3c3538322f314440362f493843323d3438353633404b202e20776b2838224e3a3b34454e433c2f3735473c273638323b2541333e4425363531434b3c40424e464a422226206a66303120326c636c79672422626e303a203864636479672c28716c32342838253131322e2a7966323f203231353b353f31333a323b3b353326207b643428382a32202e207b6e302230767a756526207b663420382a6f6c2d5f512a2c2279663f203859206d642f5559202422656420552e2071663028383026207b6431392032204f6d7861666e6125372630202255616c666d757b2a4c542a33382e3031225f6b6c3436332a7a363e2b2841707a6e6d55676049617e2d3539352633362a2a434a564f4e242a6e6961672847656969672b22416a7a656f6525343b2e3024313a313b2c333b3822536b6469726925373b352c31342a2620736e3338223a2855616c313020242871643b362a3a224d6d67656e67224164612e282e2a73643b342a383a3036242871643b352a3a313f313e2e2071663932203a32343c2c227966393b2038333d39342c28716c323028383a362e20716c38332230303c2c22686639362038767a7f672c28606c313628383b2e206066393d203a282f3a30303f363c353a3a332a2620626e3330223a282024207565332a3076727f672422776d302a385920756d68656c282e2a65787a677a6b6f676c7c6b6e2d7d676a676c285f24207565342a3020576f60436974282e2a756535203228556568496174205d676a454e202e2a7d65323d203274727f6724207565312a30202d3b333c3833323a31333a202e2a7a66312838535b226b72786e6b61637c636d6e257a25676f656564672f616a7a656f6527726c66222620616c766770666b6e2d7a666e2d7663677f6770202e2a496a72656f6d20504e4428526e77656164202c6477646c5d26592a6372726e61696374636d662f706e642a2e206f6a626c606d6e656b666a68607863676d68676c6d6865676e67696f6a
62636b202e2a496a72656f6d20504e4428546b67756d78202c6477646c5d26592a6372726e61696374636d662f78276c69616e2e63787a6e6969637c696f642d702f726c636b66202c286b667465786c696e2f6c636b662f7066776f696e282e2a4c63766b7e6f2243666b6d6e74282e66776e6e5f245120617a726469636b76616d6c2d7a257a72617a6b2577696e677e6b6c672f6b6e6f2226207f69646f74616c676166656b66617a766d722e6e6e64202e2055616e6776636c6d2043656c7c676c76224c6f617273727c696f6422456d66776e6d282e223b2c3c2e38243338303b205f5577",headers =heads) # i got the link while i was inspecting the booking tickets page
f = s.get("https://in.bookmyshow.com/buytickets/chalo-vizag/movie-viza-ET00064364-MT/20180204#!seatlayout") # this is the page that gets displayed when we click the show time
ff = f.text
j = json.loads(ff)
print(j)

Once I have the source code of this page, I can get the seat availability easily, but I am unable to get that page. How can I do this? Thanks in advance!

Without a clear step-by-step description, it is hard to understand what is going on. However, since the line `print("\n")` appears only once in your script, let us use it as a landmark. What are you trying to do after that line? I mean, what are you trying to achieve by sending another HTTP request? – SIM, Feb 04 '18 at 11:41
After print("\n"), I opened one movie's ticket booking page, where clicking a show time redirects to the seat layout page of that show. I took only one movie to check whether it works. – Akhil Reddy, Feb 04 '18 at 12:23

score 0 · Answer 1 · answered Feb 04 '18 at 09:51

Steps:

1) Use Selenium to click on the show-time block:

driver.find_element_by_xpath('<enter xpath>').click()

To find the XPath, open Inspect Element, select the element, then right-click it and choose the Copy XPath option.

time.sleep(4) # wait 4 seconds for the page to appear

2) Get the HTML source code using

html = driver.page_source

then use Beautiful Soup to scrape the page:

soup = BeautifulSoup(html,'html.parser')

Find all `a` tags with class '_available' and count them, then find all `a` tags with class '_blocked' and count them. From these two counts you can work out the total number of seats and the number of available seats.
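
The counting step can be sketched as follows. This is only an illustration: the `count_seats` helper and the `sample_html` snippet are made up for the example, and it assumes the seat links carry the `_available` and `_blocked` classes mentioned above and that every seat falls into one of those two groups. In practice `html` would be `driver.page_source`.

```python
from bs4 import BeautifulSoup


def count_seats(html):
    """Return (available, total) seat counts from seat-layout HTML.

    Hypothetical helper: assumes seats are <a> tags whose class list
    contains either '_available' or '_blocked'.
    """
    soup = BeautifulSoup(html, "html.parser")
    available = len(soup.find_all("a", class_="_available"))
    blocked = len(soup.find_all("a", class_="_blocked"))
    return available, available + blocked


# Made-up markup standing in for driver.page_source
sample_html = """
<div id="layout">
  <a class="_available" href="#">A1</a>
  <a class="_available" href="#">A2</a>
  <a class="_blocked" href="#">A3</a>
</div>
"""

available, total = count_seats(sample_html)
print(f"{available} of {total} seats available")
```

If the real page marks seats with additional classes (for example, a separate class for seats currently being booked), those would need to be added to the total as well.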

answered Feb 04 '18 at 09:51

Pygirl


I want to do this without Selenium, using only the requests library. Is that not possible? – Akhil Reddy Feb 04 '18 at 09:55

To click on an element you can use mechanize, provided the page is not rendered by JavaScript. Only after clicking on the time div and choosing how many tickets you want can you access the seat info page. I don't think this is possible with the requests library alone, because clicking triggers some JavaScript. Read this: https://stackoverflow.com/questions/37164675/clicking-button-with-requests – Pygirl Feb 04 '18 at 10:08
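
One way to see why the plain `requests` call in the question does not reach the seat layout: everything after `#` in a URL is a fragment, which is handled by the browser (here, by the page's JavaScript) and is not sent to the server. The request therefore most likely returns the ordinary show-times HTML, which `json.loads` cannot parse. A minimal sketch using only the standard library, with the URL taken from the question:

```python
from urllib.parse import urldefrag

page_url = (
    "https://in.bookmyshow.com/buytickets/chalo-vizag/"
    "movie-viza-ET00064364-MT/20180204#!seatlayout"
)

# Split the URL into the part that goes over the network and the
# fragment that stays in the browser.
url_sent, fragment = urldefrag(page_url)

print("Requested from server:", url_sent)
print("Client-side only:     ", fragment)
```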

If you don't want Selenium to open a browser window, you can use the `pyvirtualdisplay` module or PhantomJS, which is a headless browser – Pygirl Feb 04 '18 at 10:11
