The new graph search on facebook lets you search for current employees of a company using query token - Current Google employees (for example).
I want to scrape the results page (http://www.facebook.com/search/104958162837/employees/present) via scrapy.
Initial problem was facebook allows only a facebook user to access the information, so directing me to login.php. So, before scraping this url, I logged in via scrapy and then this result page. But even though the http response is 200 for this page, it does not scraps any data. The code is as follows:
import sys
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item
from scrapy.http import Request
class DmozSpider(BaseSpider):
name = "test"
start_urls = ['https://www.facebook.com/login.php'];
task_urls = [query]
def parse(self, response):
return [FormRequest.from_response(response, formname='login_form',formdata={'email':'myemailid','pass':'myfbpassword'}, callback=self.after_login)]
def after_login(self,response):
if "authentication failed" in response.body:
self.log("Login failed",level=log.ERROR)
return
return Request(query, callback=self.page_parse)
def page_parse(self,response):
hxs = HtmlXPathSelector(response)
print hxs
items = hxs.select('//div[@class="_4_yl"]')
count = 0
print items
What could I have missed or done incorrectly?