
When I call my spider through a Python script, it gives me an ImportError:

ImportError: No module named app.models

My items.py is like this:

from scrapy.item import Item, Field
from scrapy.contrib.djangoitem import DjangoItem

from app.models import Person

class aqaqItem(DjangoItem):
    django_model = Person

My settings.py is like this:

#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
#     http://doc.scrapy.org/topics/settings.html
#


BOT_NAME = 'aqaq'
BOT_VERSION = '1.0'

SPIDER_MODULES = ['aqaq.spiders']
NEWSPIDER_MODULE = 'aqaq.spiders'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)
ITEM_PIPELINES = ['aqaq.pipelines.JsonWithEncodingPipeline']

import sys
import os

# Add the Django project directory (../../myweb, relative to the current
# working directory) to sys.path so that "app.models" can be imported.
c = os.getcwd()
os.chdir("../../myweb")
d = os.getcwd()
os.chdir(c)
sys.path.insert(0, d)

# Setting up django's settings module name.
# This module is located at /home/rolando/projects/myweb/myweb/settings.py.
os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings'

My Python script to call the spider is like this:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from final.aqaq.aqaq.spiders.spider import aqaqspider
from scrapy.utils.project import get_project_settings
def stop_reactor():
    reactor.stop()

spider = aqaqspider(domain='aqaq.com')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

My directory structure is like this:

.
|-- aqaq
|   |-- aqaq
|   |   |-- call.py
|   |   |-- __init__.py
|   |   |-- __init__.pyc
|   |   |-- items.py
|   |   |-- items.pyc
|   |   |-- pipelines.py
|   |   |-- pipelines.pyc
|   |   |-- settings.py
|   |   |-- settings.pyc
|   |   `-- spiders
|   |       |-- aqaq.json
|   |       |-- __init__.py
|   |       |-- __init__.pyc
|   |       |-- item.json
|   |       |-- spider.py
|   |       |-- spider.pyc
|   |       `-- url
|   |-- call.py
|   |-- call_spider.py
|   `-- scrapy.cfg
|-- mybot
|   |-- mybot
|   |   |-- __init__.py
|   |   |-- items.py
|   |   |-- pipelines.py
|   |   |-- settings.py
|   |   `-- spiders
|   |       |-- example.py
|   |       `-- __init__.py
|   `-- scrapy.cfg
`-- myweb
    |-- app
    |   |-- admin.py
    |   |-- admin.pyc
    |   |-- __init__.py
    |   |-- __init__.pyc
    |   |-- models.py
    |   |-- models.pyc
    |   |-- tests.py
    |   `-- views.py
    |-- manage.py
    `-- myweb
        |-- file
        |-- __init__.py
        |-- __init__.pyc
        |-- settings.py
        |-- settings.pyc
        |-- urls.py
        |-- urls.pyc
        |-- wsgi.py
        `-- wsgi.pyc

Please help me as I am new to Scrapy.

I am really confused. I tried adding

import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings'

at the top of my script, but then a new error came:

get_project_settings is invalid

Also, my Scrapy version is 0.18.

Thank you all, I got the solution.

user2823667
  • Are both projects (django and scraper) in the PYTHONPATH? Check this thread http://stackoverflow.com/questions/19068308/access-django-models-with-scrapy-defining-path-to-django-project – fasouto Oct 03 '13 at 16:39
  • I am not sure about the scraper, but to add the Django project to the PYTHONPATH I have added this code in settings.py: import sys; import os; c = os.getcwd(); os.chdir("../../myweb"); d = os.getcwd(); os.chdir(c); sys.path.insert(0, d) – user2823667 Oct 03 '13 at 17:42
  • How do I add the scraper project to the PYTHONPATH, and where do I add it? – user2823667 Oct 03 '13 at 17:43
  • + 1 for file structure. – Games Brainiac Oct 03 '13 at 18:53
  • I am really confused. I tried adding import os; os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings' at the top of my script, and a new error came: get_project_settings is invalid. Also, my Scrapy version is 0.18. – user2823667 Oct 03 '13 at 19:00

2 Answers


Perhaps your problem is that you are importing the spider before the settings are loaded. The ImportError likely comes from the line from app.models import Person in your items.py, which runs before your settings.py has had a chance to add the Django project to sys.path.

So, import your spider after you set up the settings:

crawler.configure()

from final.aqaq.aqaq.spiders.spider import aqaqspider
spider = aqaqspider(domain='aqaq.com')

crawler.crawl(spider)
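
For reference, a sketch of what the full script could look like with the import moved, keeping the Scrapy 0.18 Crawler API and the module paths from the question:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings

# Load the project settings first; this runs the sys.path and
# DJANGO_SETTINGS_MODULE setup in settings.py.
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()

# Import the spider only now, so items.py (and app.models) is imported
# after the Django project has been added to sys.path.
from final.aqaq.aqaq.spiders.spider import aqaqspider

spider = aqaqspider(domain='aqaq.com')
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()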
R. Max
  • It is giving me this error: /usr/local/lib/python2.7/dist-packages/twisted/spread/jelly.py:92: DeprecationWarning: the sets module is deprecated (import sets as _sets). And when I run the scrapy version command it shows: /usr/local/lib/python2.7/dist-packages/scrapy/settings/deprecated.py:26: ScrapyDeprecationWarning: You are using the following settings which are deprecated or obsolete (ask scrapy-users@googlegroups.com for alternatives): BOT_VERSION: no longer used (user agent defaults to Scrapy now), followed by Scrapy 0.18.2 – user2823667 Oct 04 '13 at 06:23
  • @user2823667 Can you explain how you fixed the problem? – Goran Nov 16 '13 at 15:39

I wrote this post on Medium a while ago; perhaps it can help you!

https://medium.com/@tiago.piovesan.tp/make-a-crawler-with-django-and-scrapy-a41adfdd24d9

This is the configuration that integrates the two libraries, in crawler/settings.py:

import os
import sys

# Add the parent directory of the Scrapy project to sys.path so the Django
# project package (myweb) can be imported, then point Django at its settings
# module and initialize it.
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".."))
os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings'

import django
django.setup()
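
Once django.setup() has run from the Scrapy settings, items.py can import the Django model directly. A minimal sketch reusing the Person model from the question (note that on recent Scrapy versions DjangoItem lives in the separate scrapy-djangoitem package; on Scrapy 0.18 it was scrapy.contrib.djangoitem):

# items.py
from scrapy_djangoitem import DjangoItem  # scrapy.contrib.djangoitem on old Scrapy versions

from app.models import Person

class aqaqItem(DjangoItem):
    # Item fields are generated automatically from the Django model's fields.
    django_model = Person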