0

I made a scraper that scrapes the data correctly, but I have a problem exporting it to CSV. The default -o filename.csv option doesn't write the data in the correct order, and I need some guidance on how to fix that. item['name'] should be in the first column and item['link'] in the second. This is the code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import re
from ..items import WebscItem


class YuSpider(CrawlSpider):
    name = 'yu'
    allowed_domains = ['farfeshplus.com',
                       'wintv.live']
    start_urls = ['https://www.farfeshplus.com/Video.asp?ZoneID=297']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//td[@class="text6"]'), callback='parse_item', follow=True),

    )

    def parse_item(self, response):
        items = WebscItem()
        for url in response.xpath('//html'):
            items['name'] = url.xpath('//h1/div/text()').extract()

            yield items

            frames = url.xpath('//iframe[@width="750"]/@src').extract_first()

            yield scrapy.Request(url=frames, callback=self.parse_frame)

    def parse_frame(self, response):
        items = WebscItem()
        URL = response.xpath('//body/script').extract_first()

        
        mp4 = re.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
        link = mp4.findall(URL)[0]
       
        items['link'] = link
        yield items
Ibtsam Ch
  • This is where the meta parameter of Request comes into play. Check out [this answer](https://stackoverflow.com/a/13911764/9189799) to learn how to use it. – SIM Oct 23 '19 at 20:42

2 Answers

2

You need to set FEED_EXPORT_FIELDS in your settings.py. It defines which fields are exported and in what order.
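
A minimal sketch of that setting, assuming the field names name and link from the question's item:

```python
# settings.py (sketch): fields to export and their column order in the CSV feed.
# 'name' and 'link' are the item fields used in the question.
FEED_EXPORT_FIELDS = ['name', 'link']
```

Running scrapy crawl yu -o filename.csv should then write name as the first column and link as the second. Note that this only controls column order; each row still contains just what a single yielded item holds.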

gangabass
0

If you want to export data to a CSV file, you could use pandas.

First, build a pandas DataFrame from your scraped data; then you can export that DataFrame to CSV.

Example adapted from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html:

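import pandas as pd

df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
                   'mask': ['red', 'purple'],
                   'weapon': ['sai', 'bo staff']})
# 'columns' fixes the column order; 'output.csv' is a placeholder file name
df.to_csv('output.csv', columns=['name', 'mask', 'weapon'], index=False)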

I'm not sure if this is what you are looking for.

thefakejeff