I want to scrape the website linked in the code below. I need a handful of parameters from it, and the spider below is the best approach I have found so far. The problem is that I have to scrape two different parts of the page, and I don't know how to combine them properly (as columns of the same row). That is why I need your help. I am also open to a better solution. In addition, I need to skip some rows that are scraped incorrectly, and I don't want to emit rows that contain only null values. My current output is available as a file: http://s7.dosya.tc/server14/tnx4u0/test.json.zip.html
In the final version the table loop should be nested inside the base loop, but I have kept the two loops separate here to show the problem more clearly. Thanks a lot.
from scrapy import Spider


class KingsatSpider(Spider):
    name = 'kingsat'
    # allowed_domains expects bare domain names, not full URLs
    allowed_domains = ['tr.kingofsat.net']
    start_urls = ['https://tr.kingofsat.net/tvsat-turksat4a.php']

    def parse(self, response):
        tables = response.xpath('//*[@class="fl"]/tr')
        bases = response.xpath('//table[@class="frq"]/tr')

        # Part 1: transponder (frequency) rows
        for base in bases:
            yield {
                'Frekans': base.xpath('.//td[3]/text()').extract_first(),
                'Polarizasyon': base.xpath('.//td[4]/text()').extract_first(),
                'Kapsam': base.xpath('.//td[6]/a/text()').extract_first(),
                'SR': base.xpath('.//td[9]/a[1]/text()').extract_first(),
                'FEC': base.xpath('.//td[9]/a[2]/text()').extract_first(),
            }

        # Part 2: channel rows
        for table in tables:
            yield {
                'channel': table.xpath('.//td[3]/a/text()').extract_first(),
                'V-PID': table.xpath('.//td[9]/text()[1]').extract_first(),
                'A-PID': table.xpath('.//td[10]/text()[1]').extract_first(),
            }
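
To make the desired result concrete, here is a rough sketch of the kind of output I am after. It assumes the channel rows that belong to each frequency row have already been identified, which is exactly the part I cannot work out. The helper names `has_nulls` and `merge_rows` are hypothetical, and the sample values are made-up placeholders, not real data from the site.

```python
def has_nulls(fields):
    """Return True if any scraped value is missing (None)."""
    return any(value is None for value in fields.values())


def merge_rows(base_fields, channel_rows):
    """Yield one flat dict per channel row, with the frequency fields as extra columns."""
    if has_nulls(base_fields):
        return  # incomplete frequency row: emit nothing for it
    for row in channel_rows:
        if has_nulls(row):
            continue  # wrongly scraped or empty channel row: skip it
        # Combine both parts into a single row (frequency columns + channel columns)
        yield {**base_fields, **row}


# Made-up placeholder values, only to show the shape of the result.
base = {'Frekans': '11000.00', 'Polarizasyon': 'V'}
rows = [
    {'channel': 'Example TV', 'V-PID': '100', 'A-PID': '200'},
    {'channel': None, 'V-PID': None, 'A-PID': None},
]
for item in merge_rows(base, rows):
    print(item)
```

In the spider, I imagine something like this would be called from inside the base loop, with each resulting dict yielded as one item, so that every output row carries both the frequency columns and the channel columns.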