First of all, I have been working with Python for a month and I created an application. This app needs score results: it collects home, away, and today match data. I use Scrapy to collect this data without a problem!
Scrapy creates 3 JSON files: Home.json, Away.json and Today.json.
import scrapy
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor, defer

class Home(scrapy.Spider):
    name = "home"
    # start_urls and parse() left out here

class Away(scrapy.Spider):
    name = "away"
    # start_urls and parse() left out here

class Today(scrapy.Spider):
    name = "today"
    # start_urls and parse() left out here

# one runner per feed so each spider writes its own JSON file
runnerHome = CrawlerRunner(settings={
    "FEEDS": {
        "file:///C:/Users/Messi/Home.json": {"format": "json", "overwrite": True},
    },
})
runnerAway = CrawlerRunner(settings={
    "FEEDS": {
        "file:///C:/Users/Messi/Away.json": {"format": "json", "overwrite": True},
    },
})
runnerToday = CrawlerRunner(settings={
    "FEEDS": {
        "file:///C:/Users/Messi/Today.json": {"format": "json", "overwrite": True},
    },
})

@defer.inlineCallbacks
def crawl():
    # run the three spiders one after another, then stop the reactor
    yield runnerHome.crawl(Home)
    yield runnerAway.crawl(Away)
    yield runnerToday.crawl(Today)
    reactor.stop()

crawl()
reactor.run()
In this structure the spiders run sequentially by chaining the deferreds. The code block above works excellently, without any problem!
My second piece of code creates a single usable data file (data.json) from Home.json, Away.json and Today.json:
import json

def data():
    # read the raw Scrapy output files
    with open("C:/Users/Messi/Home.json") as dosya:
        homeVeriler = json.load(dosya)
    with open("C:/Users/Messi/Away.json") as dosya:
        awayVeriler = json.load(dosya)
    with open("C:/Users/Messi/Today.json") as dosya:
        todayVeriler = json.load(dosya)
    # some calculations build veriler, then data.json is written
    with open("C:/Users/Messi/data.json", "w") as dosya:
        json.dump(veriler, dosya)
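The calculations that build veriler are long, so I left them out; roughly, they combine the three lists into one structure, something like this (a hypothetical sketch, my real code does much more):

# hypothetical sketch of the merge; the real calculations are longer
veriler = {
    "home": homeVeriler,
    "away": awayVeriler,
    "today": todayVeriler,
}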
Okay, what do I want?
While Scrapy scrapes the pages it sometimes skips some lines, so when I then run data() manually (it lives in another .py file), I get an error. I need a loop system with a schedule: if anything goes wrong during scraping, data() will raise an error, so the program should scrape again and then retry data(). And it should do all of this at 12:05 am every day :D
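For the daily timing I was imagining something like this (a sketch using the third-party schedule package, which I have not gotten working yet; job() is a hypothetical wrapper around my yenile() and data() shown below):

import time
import schedule

def job():
    # hypothetical wrapper: re-run the spiders, then rebuild data.json;
    # the re-running part is exactly what I cannot get working (see below)
    yenile()
    data()

# 12:05 am every day
schedule.every().day.at("00:05").do(job)

while True:
    schedule.run_pending()
    time.sleep(60)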
First the Home.json file is created, then Away.json, and lastly Today.json. To test the retry, I changed a line in Home.json, but Scrapy did not write Home.json again. There is a problem with the while loop: yenile() does not start Scrapy again.
@defer.inlineCallbacks
def crawl():
    yield runnerHome.crawl(LivescoresHome)
    yield runnerAway.crawl(LivescoresAway)
    yield runnerToday.crawl(LivescoresToday)
    reactor.stop()

def yenile():
    crawl()
    reactor.run()

while True:
    try:
        yenile()
        data()
        break
    except:
        pass
What should the correct loop structure be?
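From what I have read, the Twisted reactor cannot be restarted once it has stopped, so maybe the retrying has to happen inside the reactor rather than around it? Is something like this the right direction (just my guess, a sketch that retries the whole chain before stopping the reactor)?

@defer.inlineCallbacks
def crawlAndBuild():
    # retry inside the running reactor instead of calling reactor.run() twice
    while True:
        try:
            yield runnerHome.crawl(LivescoresHome)
            yield runnerAway.crawl(LivescoresAway)
            yield runnerToday.crawl(LivescoresToday)
            data()
            break
        except Exception:
            continue  # scraping or data() failed, try the whole chain again
    reactor.stop()

crawlAndBuild()
reactor.run()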
Thanks very much. I love Stack Overflow!