I am trying to feed a file of domains to Scrapy for processing, but I can't get the input from the file to work and I don't know why. Here is what I tried:
with open("url.txt","r") as f:
    DOMAIN = [u.strip() for u in f.readlines()]
    print DOMAIN
    URL = 'http://%s' % DOMAIN

class MySpider(scrapy.Spider):
    name = "emailextractor"
    allowed_domains = [DOMAIN]
    start_urls = [
        URL
    ]
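To make the symptom concrete, here is a small sketch of what the formatting line in the snippet above does when DOMAIN is a list rather than a single string. The `domains` list is hypothetical sample data standing in for the contents of url.txt. Note that `allowed_domains = [DOMAIN]` has a similar issue: it wraps a list in another list instead of listing the domains themselves.

```python
# Hypothetical sample data standing in for the lines read from url.txt.
domains = ["example.com", "example.net"]

# Formatting the whole list produces one malformed string, not one URL per domain.
bad_url = 'http://%s' % domains
print(bad_url)  # http://['example.com', 'example.net']

# Formatting each entry separately produces one URL per domain.
urls = ['http://%s' % d for d in domains]
print(urls)  # ['http://example.com', 'http://example.net']
```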
The input file is in this format:
example.com
example.net
example.org
... etc.
How can I give Scrapy input in the format I am using? I am trying to prepend http:// to every domain I feed in. The file is also extremely large (in the GB range), so what is the best approach? Any help is appreciated.
This question didn't work for me: Pass input file to scrapy containing a list of domains to be scraped
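For what it's worth, one possible direction for a multi-gigabyte file is to avoid building the whole list in memory and stream the file one line at a time. The sketch below is only an assumption about how that could look, not a confirmed fix: `iter_start_urls` is a hypothetical helper name, and it assumes one bare domain per line as in the sample above.

```python
def iter_start_urls(path):
    """Lazily yield 'http://<domain>' for each non-empty line of the file.

    Hypothetical helper: assumes one bare domain per line, no scheme.
    """
    with open(path, "r") as f:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory at once.
        for line in f:
            domain = line.strip()
            if domain:  # skip blank lines
                yield "http://%s" % domain
```

Inside the spider, a `start_requests` method could loop over `iter_start_urls("url.txt")` and yield `scrapy.Request(url)` for each URL, instead of filling `start_urls` up front. With that many distinct domains it may also be simpler to leave `allowed_domains` unset rather than load every domain into it.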