Scrapy: Get data on page and following link

Question

I have been using Scrapy for a personal project. My problem is very similar to the one in the following question:

Scrapy: Follow link to get additional Item data?

The page I am scraping is the following: http://www.tennisinsight.com/player_activity.php?player_id=51

This page lists matches in the following form, for example:

Round of 16 Def. Ivan Dodig(37,2.41) (CRO) 6-3 6-3 Recap Match Stats $1.043

So far I have written Scrapy code that opens every "Match Stats" link on the page and scrapes the data on each linked page into an individual record.

In addition, I want to scrape the "Odds" column (the $1.043 above) and add it to the same record.
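Whichever way the Odds text reaches the record, it arrives as a string such as "$1.043". A small helper can turn it into a number before it is stored. This is a sketch that assumes the cell always holds a dollar sign followed by a decimal number; `parse_odds` is a hypothetical name, not part of Scrapy.

```python
import re

# Assumed cell format: "$" followed by a decimal number, e.g. "$1.043".
_ODDS_RE = re.compile(r"\$\s*(\d+(?:\.\d+)?)")


def parse_odds(text):
    """Return the odds in text such as '$1.043' as a float, or None if absent."""
    if text is None:
        return None
    match = _ODDS_RE.search(text)
    return float(match.group(1)) if match else None
```

For the example row above, `parse_odds("... Recap Match Stats $1.043")` gives `1.043`.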

From what I have found, I need to use the Request meta field to pass this data along to the parse method, but I am struggling to fit that into my code. The answer to the Stack Overflow question linked above says: "To scrape additional fields which are on other pages, in a parse method extract URL of the page with additional info, create and return from that parse method a Request object with that URL and pass already extracted data via its meta parameter."

This makes perfect sense. However, the URLs I follow are extracted by the CrawlSpider rules rather than in a parse method I write myself, so I don't know where to extract the Odds value and attach it to the request.

Here is the relevant part of my code, which will hopefully explain the problem better.

rules = (
    # getPopupLink is my own helper that turns the popup href into a plain URL
    Rule(SgmlLinkExtractor(allow=r"match_stats_popup.php\?matchID=\d+",
                           restrict_xpaths='//td[@class="matchStyle"]',
                           tags='a', attrs='href', process_value=getPopupLink),
         callback='parse_match', follow=True),
)

The parse_match function parses the match stats into one item.

As a result, each Match Stats link is opened on its own, and by the time parse_match runs there is no way for me to access the Odds column on the main page.

Any help will be much appreciated.

I don't know how to do this with `CrawlSpider`, so I suggest you use the regular `Spider` instead. (comment by warvariuc, May 06 '14 at 09:11)
