
I have written a Django web page with a form for user input. When the user enters text into the form and clicks the submit button, a Celery task that runs a Scrapy spider needs to be started. The form takes the name of a band, which is to be passed as an argument to the spider and concatenated onto the start URL. So far, whenever I run `python manage.py celery worker --loglevel=info` or `python manage.py runserver`, the Scrapy spider's log starts, but it never actually shows the web pages being crawled as it normally does. However, when I submit the form, the spider is not run. What is the proper way to run the Celery task when the submit button is clicked? I was following the solution from this SO post, but Scrapy and Celery have since been updated and the solution no longer seems to work. The code for the relevant files is below:

tasks.py

from celery.registry import tasks
from celery.task import Task
from django.template.loader import render_to_string
from django.utils.html import strip_tags
from django.core.mail import EmailMultiAlternatives
from ticket_city_scraper.ticket_city_scraper.spiders.tc_spider import spiderCrawl
from celery import shared_task

@shared_task
def crawl():
    return spiderCrawl()
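For context, this is the shape I'm aiming for: the task should accept the band name from the form and hand it to the spider. A minimal stand-in sketch with Celery and the spider stubbed out (`BASE_URL` and the URL pattern are my assumptions, not the real site layout):

```python
# Stand-in for the Celery task: accepts the band name submitted in the
# form and builds the spider's start URL by concatenation.
# BASE_URL and the query format are assumptions for illustration only.
BASE_URL = "http://example.com/search?q="

def crawl(band_name):
    # In the real task this would be decorated with @shared_task and
    # would kick off spiderCrawl() with the URL below as its start URL.
    start_url = BASE_URL + band_name.strip().lower().replace(" ", "+")
    return start_url
```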

Edit:

As can be seen in the views file, the crawl method is only called in the choice view, yet every time a new page is visited the spider log starts

views.py

from django.shortcuts import render
from  .forms import ContactForm, SignUpForm, BandForm
from tasks import crawl
def choice(request):
    title = 'Welcome'
    form = SignUpForm(request.POST or None)
    context = {
        "title" : title,
        "form" : form,
    }

    if form.is_valid():
        instance = form.save(commit = False)
        full_name = form.cleaned_data.get("full_name")

        if not full_name:
            full_name = "New full name"
        instance.full_name = full_name  
        # if not instance.full_name:
        #   instance.full_name = "A name"
        instance.save()
        context = {
            "title" : "Thank you",
        }
    crawl.delay()
    return render(request, "home.html", context)
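To make sure I understand the behaviour I'm seeing, here is a minimal stand-in for the view's control flow with Django and Celery stubbed out (the function and variable names here are mine, not the real API): at its current indentation, `crawl.delay()` sits inside `choice()` but outside the `if form.is_valid():` block, so it fires on every request.

```python
# Minimal stand-in for the view's control flow, Celery stubbed out.
calls = []

def fake_delay(*args):           # stands in for crawl.delay
    calls.append(args)

def choice_current(form_valid):
    if form_valid:
        pass                     # save the instance, build context, etc.
    fake_delay()                 # runs whether or not the form validated

def choice_guarded(form_valid, band_name=None):
    if form_valid:
        fake_delay(band_name)    # queued only on a valid submit

choice_current(False)
choice_current(True)
assert len(calls) == 2           # task queued on every request

calls.clear()
choice_guarded(False)
choice_guarded(True, "some-band")
assert calls == [("some-band",)]  # task queued only for the valid submit
```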

celery worker output while the server is running

    -------------- celery@elijah-VirtualBox v3.1.18 (Cipater)
---- **** ----- 
--- * ***  * -- Linux-3.13.0-54-generic-x86_64-with-Ubuntu-14.04-trusty
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         default:0x7faaebc80410 (djcelery.loaders.DjangoLoader)
- ** ---------- .> transport:   amqp://guest:**@localhost:5672//
- ** ---------- .> results:     database
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- 
--- ***** ----- [queues]
 -------------- .> celery           exchange=celery(direct) key=celery


[tasks]
  . comparison.tasks.crawl

[2015-08-21 23:15:21,076: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2015-08-21 23:15:21,186: INFO/MainProcess] mingle: searching for neighbors
[2015-08-21 23:15:22,244: INFO/MainProcess] mingle: all alone
/home/elijah/Desktop/trydjango18/trydjango18/local/lib/python2.7/site-packages/djcelery/loaders.py:136: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
  warn('Using settings.DEBUG leads to a memory leak, never '

[2015-08-21 23:15:22,331: WARNING/MainProcess] /home/elijah/Desktop/trydjango18/trydjango18/local/lib/python2.7/site-packages/djcelery/loaders.py:136: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
  warn('Using settings.DEBUG leads to a memory leak, never '

[2015-08-21 23:15:22,333: WARNING/MainProcess] celery@elijah-VirtualBox ready.
[2015-08-21 23:15:24,294: INFO/MainProcess] Received task: comparison.tasks.crawl[d930a0e8-7d63-4d55-ba85-53bb174f98f4]
[2015-08-21 23:15:24,296: INFO/MainProcess] Received task: comparison.tasks.crawl[37187368-cfd1-4b9e-9a2e-8e14266947ef]
[2015-08-21 23:15:24,298: INFO/MainProcess] Received task: comparison.tasks.crawl[d5aa8448-2ee5-47f9-8b6e-5112201665ef]
[2015-08-21 23:15:24,300: INFO/MainProcess] Received task: comparison.tasks.crawl[d8ae8663-3fe1-484b-b43b-d54f173fd85e]
[2015-08-21 23:15:24,301: INFO/MainProcess] Received task: comparison.tasks.crawl[1eb42061-ec5a-4697-9df8-9b07c62f04f9]
[2015-08-21 23:15:24,302: INFO/MainProcess] Received task: comparison.tasks.crawl[d3a7619f-2fcc-4105-93f8-b2ac9004593b]
[2015-08-21 23:15:24,303: INFO/MainProcess] Received task: comparison.tasks.crawl[2b06afd0-24ab-4198-a49e-b32dfe0ca804]
[2015-08-21 23:15:24,505: ERROR/MainProcess] Task comparison.tasks.crawl[37187368-cfd1-4b9e-9a2e-8e14266947ef] raised unexpected: NameError("global name 'MySpider' is not defined",)
  • that's way too much code. please submit a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) - just a celery task, some snippets from the ipython command line to show it being invoked, and a few lines from the logs showing it being received (or not) by celery etc – scytale Aug 19 '15 at 13:36
  • @scytale Thanks. I can see that the task is being added after I use the command: *python manage.py celery worker --loglevel=info* ; however, I need a way to pass the argument of the user's input of the bandname to the spider or else it won't do anything. I'm currently calling the crawl method from a view for the form, but when the submit button is being clicked, there is no request sent in the terminal for the celery task of the spider crawl. Any ideas? – loremIpsum1771 Aug 20 '15 at 03:53
  • no. please see my previous comment. you have posted too much - please post a question that asks one question and supplies a small amount of code or log output that clearly illustrates the problem – scytale Aug 20 '15 at 10:12
  • @scytale ok, please disregard my previous comment for now. All I need to know for this question is how to properly have the task execute only when the form request is sent. Currently, it seems that the site is executing the task every time I go to a new page. After I solve this issue, I can post a new question, but not before. – loremIpsum1771 Aug 21 '15 at 05:54
  • you haven't posted your view code so I have no way of knowing – scytale Aug 21 '15 at 08:37
  • @scytale You said that what I had before was too much so I took off the files. I'll add the file for the view. – loremIpsum1771 Aug 21 '15 at 19:03
  • .. and now you've posted _all_ the view functions and too much log output - there is a middle way where you post only what is directly relevant to your question. it's very hard for us to read through all that and make sense of it. you only need to post the view function that runs the celery task. please try to learn how to post good questions – scytale Aug 21 '15 at 20:24
  • and the log you have provided is from django - that's not very useful - where is the celery worker output? – scytale Aug 21 '15 at 20:26
  • @scytale I just edited it again. There was an error at the end of the celery worker output that I didn't post because it was most likely caused by there not being a start_url (since I haven't yet been able to pass an argument to the spider). Can you just answer if you know how to execute a task on the queue when a form request is made? If it's not possible, I don't want to waste time on it. – loremIpsum1771 Aug 21 '15 at 23:25
  • you are running a task. that's what `crawl.delay()` does. the celery logs show the task being received. you should probably read the celery docs a few more times – scytale Aug 22 '15 at 11:32

0 Answers