I have created a spider that checks a particular movie booking site to see whether booking has opened for a film. It checks every 10 seconds. The problem is that even after booking opens on the website, my code doesn't see the updated page; it keeps using the data it scraped the first time.
For example:
I scraped the site at 8 AM, when film 'A' was not open for booking. Booking for film 'A' opened at 12 PM, but the spider still reports that it is not open. Note that I'm using an infinite while loop: I started the program at 8 AM and never stopped it.
Code:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request
import threading
import time
import datetime
import winsound


class NewFilmSpiderSpider(scrapy.Spider):
    name = 'new_film_spider'
    allowed_domains = ['www.spicinemas.in']
    start_urls = ['https://www.spicinemas.in/coimbatore/now-showing']

    def parse(self, response):
        t = threading.Thread(self.getDetails(response))
        t.start()

    def getDetails(self, response):
        while True:
            records = response.xpath('//section[@class="main-section"]/section[2]/section[@class="movie__listing now-showing"]/ul/li/div/dl/dt/a/text()').extract()
            if 'NGK' in str(records):
                try:
                    print("Booking Opened", datetime.datetime.now())
                    winsound.PlaySound('alert.wav', winsound.SND_FILENAME)
                except Exception:
                    print("Error: unable to play sound")
            else:
                print("Booking Not Opened", datetime.datetime.now())
            time.sleep(10)
If you run the code now, it reports that booking is open. But I need the page to be scraped again on every iteration of the while loop. How can I do that?
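One likely reason the loop never sees the update: `response` in `getDetails` is the single response object that Scrapy passed to `parse`, so each iteration re-runs the XPath over the same downloaded HTML. The sketch below illustrates the general idea outside Scrapy, using only the standard library: the download happens inside the loop. The names `fetch`, `poll`, `fetch_page`, and `max_checks` are hypothetical, and the plain substring check is a simplified stand-in for the XPath query; it also assumes the listing is present in the raw HTML rather than rendered by JavaScript.

```python
import time
import urllib.request

URL = "https://www.spicinemas.in/coimbatore/now-showing"  # URL from the spider above


def fetch(url):
    """Download the page afresh and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def poll(title, fetch_page=fetch, url=URL, interval=10, max_checks=None):
    """Re-download the page on every iteration until `title` appears.

    Returns True once the title is found, or False if `max_checks`
    (a hypothetical safety limit) is reached first.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        html = fetch_page(url)  # a new request each time, not a stored response
        if title in html:
            return True
        checks += 1
        time.sleep(interval)
    return False
```

Calling `poll("NGK")` would block until the title shows up in a freshly downloaded copy of the page. Inside Scrapy itself, the equivalent idea is to issue a new request for the URL on each check instead of looping over one response.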
Update #1:
I'm getting this traceback when running the solution given below:
  File "C:\Users\ranji\Documents\Spiders\SpiCinemasSpider\spicinemas_spider\spiders\new_film_spider.py", line 34, in <module>
    main()
  File "C:\Users\ranji\Documents\Spiders\SpiCinemasSpider\spicinemas_spider\spiders\new_film_spider.py", line 30, in main
    process.start()
  File "C:\Users\ranji\AppData\Local\Programs\Python\Python37-32\lib\site-packages\scrapy\crawler.py", line 293, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "C:\Users\ranji\AppData\Local\Programs\Python\Python37-32\lib\site-packages\twisted\internet\base.py", line 1271, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "C:\Users\ranji\AppData\Local\Programs\Python\Python37-32\lib\site-packages\twisted\internet\base.py", line 1251, in startRunning
    ReactorBase.startRunning(self)
  File "C:\Users\ranji\AppData\Local\Programs\Python\Python37-32\lib\site-packages\twisted\internet\base.py", line 754, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
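The last line of the trace is the key one: a Twisted reactor can be started only once per process, so calling `process.start()` a second time in the same interpreter raises `ReactorNotRestartable`. One possible workaround, sketched here on the assumption that the spider belongs to a Scrapy project and the script is run from the project directory, is to launch every crawl in a fresh Python process. The names `crawl_command`, `crawl_repeatedly`, `rounds`, and `run` are hypothetical.

```python
import subprocess
import sys
import time


def crawl_command(spider_name):
    """Build the command line for one crawl in a new Python process."""
    # Equivalent to typing `scrapy crawl <spider_name>` in a shell.
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def crawl_repeatedly(spider_name, interval=10, rounds=None, run=subprocess.run):
    """Run the spider once per round, each time in its own process.

    Each child process gets its own reactor, so none is ever restarted.
    `rounds=None` loops forever; otherwise returns the number of crawls run.
    """
    done = 0
    while rounds is None or done < rounds:
        run(crawl_command(spider_name))
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)  # wait before the next fresh crawl
    return done
```

With this arrangement, `parse` would check the response once and return, and `crawl_repeatedly("new_film_spider")` would take over the job of the `while True` loop.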