My output is as follows:

0 winner  loser
1 winner1
2       loser1
3 winner2
4       loser2
5 winner3
6       loser3

How do I remove the empty cells so that the winner and loser values are on the same row? I've tried to find a new-line parameter for pipelines but had no luck. Is there any way to override the pipeline so that it only writes a value to the row when the item has one, so that winner and loser end up on the same row?

spider.py


import scrapy
from scrapy_splash import SplashRequest
from scrapejs.items import SofascoreItemLoader
from scrapy import Spider
import json
from scrapy.http import Request, FormRequest


class MySpider(scrapy.Spider):
    name = "jsscraper"

    start_urls = ["https://www.sofascore.com/tennis/2018-02-07"]

    def start_requests(self):
        # Render each page through Splash so the JavaScript content is loaded
        for url in self.start_urls:
            yield SplashRequest(url=url,
                                callback=self.parse,
                                endpoint='render.html',
                                args={'wait': 3.5})

    def parse(self, response):
        # One item is yielded per .event-team element
        for row in response.css('.event-team'):
            il = SofascoreItemLoader(selector=row)
            il.add_css('winner', '.event-team:nth-child(2)::text')
            il.add_css('loser', '.event-team:nth-child(1)::text')

            yield il.load_item()

pipelines.py

from scrapy.exporters import CsvItemExporter


class ScrapejsPipeline(object):
    def process_item(self, item, spider):
        return item


class CsvPipeline(object):
    def __init__(self):
        # Binary mode, because CsvItemExporter writes bytes
        self.file = open("quotedata2.csv", 'w+b')
        self.exporter = CsvItemExporter(self.file, str)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

    def process_item(self, item, spider):
        # Every item is exported as its own CSV row
        self.exporter.export_item(item)
        return item

items.py

import scrapy

from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose
from operator import methodcaller
from scrapy import Spider, Request, Selector

class SofascoreItem(scrapy.Item):
    loser = scrapy.Field()
    winner = scrapy.Field()
    #date = scrapy.Field()



class SofascoreItemLoader(ItemLoader):
    default_item_class = SofascoreItem
    default_input_processor = MapCompose(methodcaller('strip'))
    default_output_processor = TakeFirst()
tomoc4

1 Answer

Check this answer, which locates the problem: https://stackoverflow.com/a/48859488/9270398
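
For illustration only, and not taken from the linked answer: the log output quoted in the comments suggests that each yielded item carries only one of the two fields, which is why every value lands on its own CSV row. A minimal sketch of a pipeline that buffers partial items and writes a row once both fields are present could look like the following. The class name `MergeRowsCsvPipeline`, the default file name `merged.csv`, and the use of the standard-library `csv` module in place of `CsvItemExporter` are all assumptions made for this example.

```python
import csv


class MergeRowsCsvPipeline:
    """Hypothetical pipeline that merges partial items into complete rows."""

    fields = ("winner", "loser")

    def __init__(self, path="merged.csv"):
        self.path = path      # assumed output file name
        self.pending = {}     # values collected for the row being built
        self.rows = []        # completed rows, written when the spider closes

    def process_item(self, item, spider):
        for field in self.fields:
            value = item.get(field)  # None when the item lacks this field
            if value is None:
                continue
            if field in self.pending:
                # The same field arrived twice before the row was complete,
                # so store the partial row instead of overwriting it.
                self._flush()
            self.pending[field] = value
        if len(self.pending) == len(self.fields):
            self._flush()
        return item

    def _flush(self):
        # Move the pending values into a finished row, padding gaps with "".
        if self.pending:
            self.rows.append([self.pending.get(f, "") for f in self.fields])
            self.pending = {}

    def close_spider(self, spider):
        self._flush()
        with open(self.path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(self.fields)
            writer.writerows(self.rows)
```

This only works around the symptom in the pipeline. If the page markup allows it, selecting an element that contains both players in the spider, so that each item already holds both fields, would probably be the cleaner fix.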

just_be_happy
  • Adding this code leaves my CSV empty. I get a KeyError on both winner and loser: File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/item.py", line 59, in __getitem__ return self._values[key] KeyError: 'loser' – tomoc4 Feb 15 '18 at 18:48
  • Could you please show me your code in items.py? Thanks. – just_be_happy Feb 16 '18 at 00:05
  • I've edited the answer. This one should be OK. The error means the item has no 'loser' key; by using get(), we get None instead of an exception when that happens. – just_be_happy Feb 16 '18 at 00:49
  • This is really weird. It's dropping all rows instead of only the blank ones: {'loser': 'Haddad Maia B / Stefani L'} 2018-02-16 01:00:53 [scrapy.core.scraper] WARNING: Dropped: This one is empty – tomoc4 Feb 16 '18 at 01:01
  • Maybe this is a better error: File "/Users/mac/PycharmProjects/scrapyfundamentals/scrapejs/scrapejs/scrapejs/pipelines.py", line 20, in process_item if not item('loser'): TypeError: 'SofascoreItem' object is not callable – tomoc4 Feb 16 '18 at 01:08
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/165258/discussion-between-la-vie-est-belle-and-tomoc4). – just_be_happy Feb 16 '18 at 01:16
  • Same output as original. – tomoc4 Feb 16 '18 at 01:19
  • Maybe something is wrong with self.exporter.export_item(item). – just_be_happy Feb 16 '18 at 01:26
  • It might help to look at my other question. There is strange behaviour with the JSON output too. – tomoc4 Feb 19 '18 at 02:21
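
The two errors quoted in the comments above come from different ways of reading a missing field. The sketch below contrasts them, using a plain `dict` as a stand-in for the scraped item. That substitution is an assumption made so the example runs without Scrapy; the tracebacks in the comments show a `SofascoreItem` raising the same two exception types. The helper `describe` is hypothetical.

```python
# A plain dict stands in for the scraped item (assumption for illustration).
item = {"winner": "winner1"}  # this item has no 'loser' key


def describe(action):
    """Run `action` and return repr of its result, or the exception name."""
    try:
        return repr(action())
    except Exception as exc:
        return type(exc).__name__


# Subscript access raises KeyError when the key is missing.
subscript = describe(lambda: item["loser"])

# get() returns None for a missing key instead of raising.
safe = describe(lambda: item.get("loser"))

# Parentheses call the object rather than index it, hence the TypeError.
called = describe(lambda: item("loser"))

print(subscript, safe, called)
```

A check such as `if not item.get('loser'):` therefore runs without raising on items that lack the field, whereas `item['loser']` and `item('loser')` both fail.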