I am working with Scrapy and have scraped a site and fetched all the information I need.
I actually have 3 spiders that collect different data. I created these 3 spiders in the same project with the following structure:
scrapy.cfg
myproject/
    __init__.py
    items.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
        spider1.py
        spider2.py
        spider3.py
Now, when I run a particular spider, I need a pipeline to create a CSV file named after that spider, for example:
spider1.csv, spider2.csv, spider3.csv, and so on
(the spiders are not limited to these three; there may be more). I want to create CSV files according to the number of spiders and their names.
Can I create more than one pipeline class in pipelines.py? And how do I create each CSV file with the spider's name dynamically when more than one spider exists?
Here I have 3 spiders, and I want to run all 3 at once (using scrapyd); when I run them, 3 CSV files with their spider names should be created. I also want to schedule the spiders to run every 6 hours. If something is wrong in my explanation, please correct me and let me know how to achieve this.
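For the scheduling part, my current idea (just a sketch, assuming scrapyd is running on localhost:6800 and the project is deployed as myproject; the spider list and script name are my own placeholders) is a small script that asks scrapyd to schedule every spider, which cron would then run every 6 hours:

import urllib
import urllib2

SCRAPYD_URL = 'http://localhost:6800/schedule.json'  # local scrapyd endpoint
SPIDERS = ['spider1', 'spider2', 'spider3']  # extend as more spiders are added

for name in SPIDERS:
    # POSTing project and spider to schedule.json queues one run of that spider
    data = urllib.urlencode({'project': 'myproject', 'spider': name})
    response = urllib2.urlopen(SCRAPYD_URL, data)
    print response.read()

A crontab entry like 0 */6 * * * python /path/to/schedule_spiders.py would then trigger it every 6 hours.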
Thanks in advance
Edited: as an example, I am pasting my code for spider1.py only.
Code in spider1.py:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from myproject.items import Spider1Item


class firstspider(BaseSpider):
    name = "spider1"
    allowed_domains = ["www.example.com"]
    start_urls = [
        "http://www.example.com/headers/page-value",
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # ... extraction logic elided ...
        item = Spider1Item()
        item['field1'] = some_result
        item['field2'] = some_result
        # ... more fields elided ...
        return item
pipelines.py code:
import csv


class firstspider_pipeline(object):

    def open_spider(self, spider):
        # open a file named after the spider that is starting up;
        # spider is not available in __init__, but it is passed here
        self.file = open('../%s.csv' % spider.name, 'wb')
        self.brandCategoryCsv = csv.writer(self.file, delimiter=',',
                                           quoting=csv.QUOTE_MINIMAL)
        self.brandCategoryCsv.writerow(['field1', 'field2', 'field3', 'field4'])

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.brandCategoryCsv.writerow([item['field1'],
                                        item['field2'],
                                        item['field3'],
                                        item['field4']])
        return item
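For reference, I have the pipeline enabled in settings.py like this (the exact form depends on the Scrapy version; older releases take a list, newer ones a dict with an order number):

# settings.py
ITEM_PIPELINES = ['myproject.pipelines.firstspider_pipeline']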
As I stated before, when I run the above spider, a CSV file with that spider's name is created dynamically.
But now, when I run the remaining spiders (spider2 and spider3), CSV files with their corresponding spider names should be generated as well.
Is the above code enough for that functionality?
Do I need to create another pipeline class to create another CSV file? (Is it possible to create more than one pipeline class in a single pipelines.py file?)
If I create multiple pipeline classes in a single pipelines.py file, how do I match a particular spider to its related pipeline class?
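My guess at how the matching could work (not something I have confirmed; spider2_pipeline is just a name I made up) is that every enabled pipeline sees every item, so each class could check spider.name and pass through items that are not its own:

class spider2_pipeline(object):

    def process_item(self, item, spider):
        if spider.name != 'spider2':
            return item  # not our spider; hand the item on untouched
        # ... spider2-specific handling would go here ...
        return item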
I want to achieve the same functionality when saving to a database: when I run spider1, all of spider1's data should be saved to a table named after that spider. For each spider I have different SQL queries (so I need to write different pipeline classes).
The intention is that when we run multiple spiders all at once (using scrapyd), multiple CSV files should be generated with their spider names, and multiple tables should be created with the spider names (when saving to the database).
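As a rough sketch of the database side (using sqlite3 purely as a stand-in; my real per-spider queries differ, and the file and column names here are invented):

import sqlite3


class database_pipeline(object):

    def open_spider(self, spider):
        self.conn = sqlite3.connect('scraped.db')
        # table names cannot be bound as SQL parameters, so the spider
        # name is interpolated directly (spider names are trusted here)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS %s (field1 TEXT, field2 TEXT)'
            % spider.name)

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.conn.execute(
            'INSERT INTO %s (field1, field2) VALUES (?, ?)' % spider.name,
            (item['field1'], item['field2']))
        return item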
Sorry if I am wrong anywhere; I hope it is well explained, and if not, please let me know.