My spider has to read some data from an input.csv file. It runs fine locally, but when I deploy it to Zyte with shub deploy, input.csv is not included in the build.

So when I try to run it on the server, it produces the following error:

Traceback (most recent call last):
  File "<frozen zipimport>", line 177, in get_data
KeyError: 'webscrap/resources/input.csv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/scrapy/core/engine.py", line 127, in _next_request
    request = next(slot.start_requests)
  File "/app/__main__.egg/webscrap/spiders/website_scraper.py", line 13, in start_requests
    zipcodes_csv = pkgutil.get_data("webscrap", "resources/input.csv")
  File "/usr/local/lib/python3.8/pkgutil.py", line 637, in get_data
    return loader.get_data(resource_name)
  File "<frozen zipimport>", line 179, in get_data
OSError: [Errno 0] : 'webscrap/resources/input.csv'

Here is my code:

        zipcodes_csv = pkgutil.get_data("webscrap", "resources/input.csv")
        with io.TextIOWrapper(io.BytesIO(zipcodes_csv), encoding='utf-8') as file:
            csvreader = csv.DictReader(file)
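
For context, the same failure can be reproduced outside Scrapy. The following is only a minimal sketch, not the spider itself: it builds two small zip archives (standing in for the deployed egg), one with the CSV and one without, and reads the resource with the same pkgutil.get_data pattern as above. The package names demo_with_csv and demo_without_csv, the helper names, and the sample zip codes are all made up for illustration.

```python
import csv
import io
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path

RESOURCE = "resources/input.csv"


def build_archive(path, package, include_csv):
    """Write a tiny zip-importable package, optionally with its CSV resource."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr(f"{package}/__init__.py", "")
        if include_csv:
            archive.writestr(f"{package}/{RESOURCE}", "zipcode\n10001\n94105\n")


def read_rows(package):
    """Load the packaged CSV the same way the spider does and return its rows."""
    raw = pkgutil.get_data(package, RESOURCE)
    with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8") as file:
        return list(csv.DictReader(file))


def demo():
    """Return (rows from the complete archive, error from the incomplete one)."""
    names = ("demo_with_csv", "demo_without_csv")  # hypothetical package names
    with tempfile.TemporaryDirectory() as tmp:
        archives = [str(Path(tmp) / f"{name}.zip") for name in names]
        build_archive(archives[0], names[0], include_csv=True)
        build_archive(archives[1], names[1], include_csv=False)
        sys.path[:0] = archives  # make both archives importable
        try:
            rows = read_rows(names[0])
            try:
                read_rows(names[1])
                error = None
            except OSError as exc:  # zipimport raises OSError for a missing member
                error = exc
            return rows, error
        finally:
            # Undo the import side effects so the demo can be run again.
            for entry in archives:
                sys.path.remove(entry)
            for name in names:
                sys.modules.pop(name, None)


if __name__ == "__main__":
    rows, error = demo()
    print(rows)
    print(repr(error))
```

If the sketch behaves as expected, the archive without the CSV fails with an OSError like the one in the traceback, which suggests the code is fine and the file is simply missing from the build.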

Here is the setup.py file:

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = webscrap.settings']},
    package_data={
        'project': ['resources/*.csv']
    },
    include_package_data=True,
)

Here is the directory structure of my project

1 Answer

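I fixed it by changing the setup.py file so that the package_data key is the actual package name, webscrap. The keys of package_data are package names, so the old 'project' key matched no package and the resources/*.csv pattern was never applied. The updated file: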

setup(
    name         = 'webscrap',
    version      = '2.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = webscrap.settings']},
    package_data={
        'webscrap': ['resources/*.csv']
    },
    include_package_data=True,
)

I also resolved some dependency issues in requirements.txt and added that file to scrapinghub.yml.
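
To confirm that the CSV is really bundled before deploying, one option is to build the egg locally (for example with python setup.py bdist_egg) and inspect it, since an egg is a zip archive. The helper below is a minimal sketch of that check; the function name missing_resources and the egg path in the usage note are assumptions for illustration, not part of shub or setuptools.

```python
import zipfile


def missing_resources(archive_path, expected):
    """Return the expected member names that are absent from a zip/egg archive."""
    with zipfile.ZipFile(archive_path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]
```

Assuming the built egg ends up at a path such as dist/webscrap-2.0-py3.8.egg (the exact file name depends on the build), calling missing_resources on that path with ["webscrap/resources/input.csv"] should return an empty list once package_data is set up correctly.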