
I did some searching, including related questions like this one, but was unable to get what I need. I am stuck: the results page won't load, or at least isn't doing whatever it should be doing, and I am looking for some insight on this.

I was able to get the spider to crawl through the disclaimer page (I think; I'm not even 100% sure how to check whether it succeeded). But then on the search page, I can't figure out what to do. My attempt is below. This is also my first post on Stack Overflow as I just joined, so apologies if I messed up the code formatting.

from scrapy.spider import Spider
from scrapy.http import FormRequest

from time import sleep


class ccSpider(Spider):

    name = "courtsSpider"
    allowed_domains = ["courts.state.md.us"]
    start_urls = ["http://casesearch.courts.state.md.us"]

    def parse(self, response):
        self.log('\n\n[Parse is Starting...]')
        print response.url
        if "I have read" in response.body:
            print "Disclaimer Page Accessed\n\n"
        else:
            print "Disclaimer Page not Accessed\n\n"
            return

        sleep(1)
        return FormRequest.from_response(
            response,
            formname='main',
            formdata={'disclaimer': 'Y'},
            callback=self.parseSearchPage,
        )

    def parseSearchPage(self, response):
        self.log('\n\n[Accessing Search Criteria Page...]')
        print response.url
        if "Default is person" in response.body:
            print "Search Page Accessed\n\n"
        else:
            print "Search Page not Accessed\n\n"
            return

        sleep(1)
        return FormRequest.from_response(
            response,
            formname='inquiryForm',
            formdata={'lastName': 'SMITH',
                      'firstName': 'JOHN',
                      #'company': 'N',
                      #'middleName': '',
                      #'exactMatch': 'N',
                      #'site': '00',
                      #'courtSystem': 'B',
                      #'filingStart': '',
                      #'filingEnd': '',
                      #'filingData': '',
                      #'caseId': ''
                      },
            callback=self.parseResultsPages,
        )

    def parseResultsPages(self, response):
        self.log('\n\n[Accessing Search Results Page...]')
        print response.url
        if "items found" in response.body:
            print "Results Page Accessed\n\n"
        else:
            print "Results Page not Accessed\n\n"
            print "Title of Page: " + response.xpath('//title/text()').extract()[0].strip()
            return

        # The print below should give me the title of the search-results page... I think.
        print response.xpath('//title/text()').extract()[0].strip()
1 Answer


You may need to maintain a session cookie. Scrapy's Request supports a cookies argument. See this related answer: Scrapy - how to manage cookies/sessions
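Scrapy normally handles this for you when its cookie middleware is enabled (the default), but to make the idea concrete, here is a minimal standard-library sketch of what "maintaining a session cookie" means. The cookie name and value below are invented for illustration; the domain is just the court site's hostname reused:

```python
# Sketch of session-cookie handling with only the Python 3 standard library.
# The JSESSIONID name/value are hypothetical, purely for illustration.
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, value, domain):
    # Build a Cookie by hand; in real use, CookieJar.extract_cookies()
    # parses the Set-Cookie headers of a response for you.
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

jar = CookieJar()
jar.set_cookie(make_cookie("JSESSIONID", "abc123", "casesearch.courts.state.md.us"))

# The jar now remembers the session cookie; an opener built with
# urllib.request.HTTPCookieProcessor(jar) would resend it on every
# later request to that domain, keeping the session alive.
print(sorted(c.name for c in jar))  # ['JSESSIONID']
```

In Scrapy itself you shouldn't need to do any of this by hand; if you suspect cookies are being dropped, setting COOKIES_DEBUG = True in your settings will log the cookies sent and received on each request so you can confirm the session survives from the disclaimer page to the results page.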

  • Ok, I'll look into that as well. The format of the tags really seems difficult to work through. – QuantumQQ Jun 07 '15 at 04:28
  • I figured out what to do: I had to use Selenium to navigate through the system, then Beautiful Soup to extract the information I need. Scrapy was really fast and nice, but it didn't work well on that site; AJAX sites seem too finicky for it. – QuantumQQ Jul 21 '15 at 15:13