When saving scraped item and file, Scrapy inserts empty lines in output csv file

Question

I have a Scrapy (version 1.0.3) spider in which I both extract some data from a web page and download a file, like this (simplified):

def extract_data(self, response):
    title = response.xpath('//html/head/title/text()').extract()[0].strip()
    my_item = MyItem()
    my_item['title'] = title

    # .extract()[0] turns the selector result into a plain URL string
    file_url = response.xpath('...get url of file...').extract()[0]
    file_urls = [file_url]  # there can be more URLs here, so I store them in a list
    fi = FileItem()
    fi['file_urls'] = file_urls
    yield my_item
    yield fi

In pipelines.py I override FilesPipeline only to change the name of the saved file:

from scrapy.pipelines.files import FilesPipeline

class CustomFilesPipeline(FilesPipeline):
    def file_path(self, request, response=None, info=None):
        # format_filename is my own helper, not part of Scrapy
        filename = format_filename(request.url)
        return filename
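The question doesn't show format_filename. Purely as a hypothetical illustration (not the asker's code), a helper like the one below could take the last path segment of the URL as the file name and fall back to a SHA1 hash of the URL when there is none. It is written for Python 3, whereas Scrapy 1.0.3 itself ran on Python 2.

```python
import hashlib
from urllib.parse import unquote, urlparse


def format_filename(url):
    """Hypothetical helper: build a filesystem-safe file name from a URL."""
    # Take the last segment of the URL path, e.g. "report 2015.pdf".
    name = unquote(urlparse(url).path).rsplit("/", 1)[-1]
    if not name:
        # No usable segment (e.g. "http://example.com/"): hash the URL instead.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Keep letters, digits, dots, underscores and hyphens; replace the rest.
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
```

Hashing is only a fallback here; it mirrors the idea of Scrapy's default naming (a hash of the URL) without claiming to reproduce it.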

In items.py I have:

import scrapy

class MyItem(scrapy.Item):
    title = scrapy.Field()

class FileItem(scrapy.Item):
    file_urls = scrapy.Field()
    files = scrapy.Field()

In settings.py I have:

ITEM_PIPELINES = {
    'myscraping.pipelines.CustomFilesPipeline': 100
}
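FilesPipeline only runs when FILES_STORE is set (without it Scrapy raises NotConfigured and skips the pipeline), so since files do get saved, the real settings.py presumably also contains a line like the one below. The directory name is a placeholder, not taken from the question.

```python
ITEM_PIPELINES = {
    'myscraping.pipelines.CustomFilesPipeline': 100,
}
# Directory where FilesPipeline saves downloaded files (placeholder path).
FILES_STORE = 'downloaded_files'
```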

Now the output CSV file contains something like this:

title1
title2
,
,
title3
etc.

It looks like the empty lines (containing just a comma) correspond to the downloaded files, and I would like advice on how to keep such lines out of the output CSV file (the files themselves are saved into a folder).
In the Scrapy settings I found FEED_STORE_EMPTY (False by default, i.e. empty feeds are not exported), but I guess that does not relate to files.
I have a feeling this has something to do with pipelines, but I can't figure out how to do it.
Any help would be appreciated.
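A likely cause: when no export fields are configured, Scrapy's CSV exporter takes its columns from the first item it exports and writes every later item against those columns, filling missing fields with empty values. The standard-library sketch below only mimics that behaviour (it is not Scrapy's code; export_csv and the sample dicts are made up for illustration).

```python
import csv
import io


def export_csv(items):
    """Mimic (assumption) of a CSV feed whose columns come from the first item."""
    buf = io.StringIO()
    writer = None
    for item in items:
        if writer is None:
            # Columns are fixed once, from the first exported item.
            writer = csv.DictWriter(buf, fieldnames=list(item), restval="",
                                    extrasaction="ignore", lineterminator="\n")
            writer.writeheader()
        # Missing columns are filled with "", unknown keys are dropped.
        writer.writerow(item)
    return buf.getvalue()


# Two kinds of items per page, as in the spider above (sample data).
scraped = [
    {"title": "title1"},
    {"file_urls": ["http://example.com/a.pdf"]},
    {"title": "title2"},
    {"file_urls": ["http://example.com/b.pdf"]},
]
print(export_csv(scraped))
```

Each file-only item has no title, so it comes out as an empty row, which matches the symptom in the question.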

Why don't you put file_urls in your item MyItem() and yield only one kind of item? — vianney, Oct 14 '15 at 13:53
Amazing!!! I never thought of that (somehow I overlooked it in the documentation) :) Thanks a lot — zdenulo, Oct 14 '15 at 15:36

score 0 · Accepted Answer · answered Oct 15 '15 at 09:32

I'm pasting the answer from the comments here:

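def extract_data(self, response):
    title = response.xpath('//html/head/title/text()').extract()[0].strip()
    my_item = MyItem()
    my_item['title'] = title
    # .extract()[0] gives a plain URL string instead of a selector list
    file_url = response.xpath('...get url of file...').extract()[0]
    # file_urls now lives on MyItem itself, so only one kind of item is yielded
    my_item['file_urls'] = [file_url]
    yield my_item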
