I am trying to scrape data from https://www.flatstats.co.uk/racing-system-builder.php using Scrapy.

I want to reproduce the page's AJAX call with Scrapy. When I click the "Full SP" button (inspecting with Firebug), the POST parameters include an SQL-like string that looks strange: race|2|eq|Ordinary|0|~tRIDER_TYPE. What dialect is this?

My code:

import scrapy
import urllib

class FlatStat(scrapy.Spider):

    name = "flatstat"
    allowed_domains = ["flatstats.co.uk"]
    start_urls = ["https://www.flatstats.co.uk/racing-system-builder.php"]

    def parse(self, response):

        query_lst = response.xpath('//table[@id="system"]//tr/td[last()]/text()').extract()
        query_str = ' '.join(query_lst)

        url = 'https://www.flatstats.co.uk/ajax/sb_report.php'

        body_dict = {'a_e_max': '9.99',
                     'a_e_min': '0',
                     'arch_min': '0',
                     'exp_min': '0',
                     'report_type':'S',
                     # Copied from the POST parameters shown in the inspector; I tried several variants.
                     'sqlFullString' : u'''Type%20(Rider)%7C%3D%7COrdinary%20(Exclude%20Amatr%2C%20App%2C%20Lady%20Races
                                         )%7CAND%7Crace%7C2%7C0%7COrdinary%7C0%7C~tRIDER_TYPE%7C-t%7Ceq''',
                     # I tried copying sqlString from the POST parameters as well, without success.
                     # I also tried the "normal" SQL from the table's td text(), also without success.
                     'sqlString': query_str}

        # I also tried FormRequest here, even though the page has no form.
        return scrapy.Request(url, method="POST", body=urllib.urlencode(body_dict), callback=self.parse_page)


    def parse_page(self, response):

        with open("response.html", "w") as f:
            f.write(response.body)

So my questions are:

  1. What kind of SQL is this?
  2. Why isn't the request returning the required page? How can I run the right query?
  3. I also tried Selenium to click the button and let the page do the work itself, but that was unsuccessful too. :(
user_3068807

1 Answer

It's hard to say what the site does with the submitted sqlString. It is probably a custom format that only has meaning to how their backend processes the data.

This is an extract of the JavaScript embedded in the page's HTML:

...
    function system_report(type) {

        sqlString = '', sqlFullString = '', rowcount = 0;

        $('#system tr').each(function() {
            if(rowcount > 0) {
                var editdata = this.cells[6].innerHTML.split("|");
                sqlString += editdata[0] + '|' + editdata[1] + '|' + editdata[7] + '|' + editdata[3] + '|' + editdata[4] + '|' + editdata[5] + '^';
                sqlFullString += this.cells[0].innerHTML + '|' + encodeURIComponent(this.cells[1].innerHTML) + '|' + this.cells[2].innerHTML + '|' + this.cells[3].innerHTML + '|' + this.cells[6].innerHTML + '^';
            }
            rowcount++;         
        });
        sqlString = sqlString.slice(0, -1)
...

It looks non-trivial to reverse-engineer.
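
As a rough aid for anyone who wants to try anyway, the loop in the excerpt can be mirrored in Python. This is only a sketch of what the visible part of system_report() does: the function name build_sql_strings and the rows input (one list of cell innerHTML strings per table row, header row excluded) are hypothetical, and because the excerpt is cut off, whatever the page does to sqlFullString afterwards is unknown.

```python
from urllib.parse import quote

# Characters that JavaScript's encodeURIComponent leaves unescaped,
# in addition to ASCII letters and digits.
_JS_UNRESERVED = "-_.!~*'()"


def build_sql_strings(rows):
    """Mirror the visible part of system_report().

    rows: list of table rows, each a list of cell innerHTML strings
    (at least 7 cells per row), with the header row already removed.
    Returns (sql_string, sql_full_string).
    """
    sql_parts = []
    full_parts = []
    for cells in rows:
        edit = cells[6].split("|")
        # Same field order as the JS: 0, 1, 7, 3, 4, 5
        sql_parts.append("|".join(
            [edit[0], edit[1], edit[7], edit[3], edit[4], edit[5]]))
        full_parts.append("|".join(
            [cells[0], quote(cells[1], safe=_JS_UNRESERVED),
             cells[2], cells[3], cells[6]]))
    # The JS appends '^' after every row, then slices the last one off sqlString.
    sql_string = "^".join(sql_parts)
    # The excerpt ends before showing any trimming of sqlFullString,
    # so the trailing '^' is kept here (an assumption).
    sql_full_string = "".join(part + "^" for part in full_parts)
    return sql_string, sql_full_string
```

Feeding it the single row implied by the parameters in the question (cells 4 and 5 left empty, since their contents are not shown) reproduces the race|2|eq|Ordinary|0|~tRIDER_TYPE value seen in the POST data, which at least suggests the string is a site-specific, pipe-delimited encoding of the table rather than an SQL dialect.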

Although it's not a solution to your "sql" question above, I suggest that you try Splash (an alternative to Selenium in some cases).

The easiest way to launch it is with Docker:

$ sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash

With the following script:

function main(splash)
  local url = splash.args.url
  assert(splash:go(url))
  assert(splash:wait(0.5))

  -- this clicks the "Full SP" button
  assert(splash:runjs("$('#b-full-report').click()"))
  -- loading the report takes some time
  assert(splash:wait(5))
  return {
    html = splash:html()
  }
end

you can get the page HTML, including the report popup.
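
To try the script without Scrapy, it can be posted to Splash's HTTP API directly. The following is a minimal sketch using only the standard library; it assumes Splash is listening on localhost:8050 (as in the docker command above), and the helper names build_splash_request and fetch_report_html are made up for illustration.

```python
import json
import urllib.request

# The Lua script from above, sent to Splash as the "lua_source" argument.
LUA_SCRIPT = """
function main(splash)
  local url = splash.args.url
  assert(splash:go(url))
  assert(splash:wait(0.5))
  -- this clicks the "Full SP" button
  assert(splash:runjs("$('#b-full-report').click()"))
  -- loading the report takes some time
  assert(splash:wait(5))
  return {
    html = splash:html()
  }
end
"""


def build_splash_request(page_url, splash_base="http://localhost:8050"):
    """Build (but do not send) a POST request for Splash's /execute endpoint."""
    payload = json.dumps({"lua_source": LUA_SCRIPT, "url": page_url})
    return urllib.request.Request(
        splash_base + "/execute",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_report_html(page_url):
    """Send the request and return the "html" field of the JSON response."""
    request = build_splash_request(page_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))["html"]
```

Calling fetch_report_html("https://www.flatstats.co.uk/racing-system-builder.php") should then return the rendered HTML, provided the Splash container is running and the button id is still the same.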

You can integrate Splash with Scrapy using scrapyjs (a.k.a. scrapy-splash).

See https://stackoverflow.com/a/35851072/ for an example of how to do so with a custom script.
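
For orientation, the project settings that scrapy-splash expects look roughly like the fragment below. It is a sketch written from memory of the scrapy-splash README, so check the names and priority numbers against the README for the version you install; SPLASH_URL assumes the Docker container above is running on localhost.

```python
# settings.py fragment for scrapy-splash (sketch; verify against its README)

# Where the Splash instance is reachable (assumes the docker command above).
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Makes request fingerprints take the Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With that in place, a spider would yield a scrapy_splash.SplashRequest with endpoint="execute" and the Lua script passed as args={"lua_source": ...}, instead of a plain scrapy.Request.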

paul trmbrth
  • Thanks a lot. That's what I thought: if I could not figure it out, then a scrapyjs/Selenium integration would probably work. I am working on it now. I had also taken the "normal" SQL query from the table and tried that as well. I was just wondering whether it is some flavor of SQL/JavaScript that I don't know of. Thanks for clearing up my doubt. – user_3068807 Mar 08 '16 at 12:53