
I am building some web crawlers in Python with the Scrapy library. The goal is to collect data from a couple of online shops.

When crawling, I deal with two kinds of pages:

  1. Catalogue pages, which list products and link to the product pages
  2. Product pages, where the individual products of the shop can be viewed

As a user, when I open a product page from a catalogue page, I see a breadcrumb menu that shows me where I am in the shop. For example, if the shop were an electronics shop and the product were an iPhone, the breadcrumbs could read:

"Electronics -> Phones -> iPhones -> iPhone 5S 64GB"

However, this only happens if I follow the path outlined above. My problem is that when I crawl these pages with Scrapy, the breadcrumb doesn't show up, even though the crawler follows the same path (i.e. it starts on the Electronics page and keeps going deeper until it reaches the product page). I even tinkered with the Referer settings, but nothing helped.
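To check whether a crawled product page contains the breadcrumb at all, it helps to extract it explicitly instead of eyeballing the HTML. The sketch below uses only the standard library's `html.parser` and assumes the breadcrumb is marked up as `<nav class="breadcrumb">` with one `<li>` per level; that markup is hypothetical and will differ from shop to shop. Inside a Scrapy callback, a selector such as `response.css("nav.breadcrumb li ::text").getall()` would return the text nodes for the same assumed markup.

```python
from html.parser import HTMLParser


class BreadcrumbParser(HTMLParser):
    """Collects the text of each <li> inside <nav class="breadcrumb"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.in_nav = False   # currently inside the breadcrumb <nav>
        self.in_item = False  # currently inside an <li> of that <nav>
        self.crumbs = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "nav" and "breadcrumb" in classes:
            self.in_nav = True
        elif tag == "li" and self.in_nav:
            self.in_item = True
            self.crumbs.append("")  # start a new breadcrumb level

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False
        elif tag == "nav":
            self.in_nav = False

    def handle_data(self, data):
        # Text of nested tags (e.g. <a>) is appended to the current level.
        if self.in_item:
            self.crumbs[-1] += data


def extract_breadcrumbs(html):
    """Return the breadcrumb levels as stripped strings, or [] if none are present."""
    parser = BreadcrumbParser()
    parser.feed(html)
    parser.close()
    return [c.strip() for c in parser.crumbs if c.strip()]


if __name__ == "__main__":
    page = """
    <nav class="breadcrumb"><ol>
      <li><a href="/electronics">Electronics</a></li>
      <li><a href="/phones">Phones</a></li>
      <li><a href="/iphones">iPhones</a></li>
      <li>iPhone 5S 64GB</li>
    </ol></nav>
    """
    print(" -> ".join(extract_breadcrumbs(page)))
```

Run on the sample page, this prints `Electronics -> Phones -> iPhones -> iPhone 5S 64GB`. An empty list for a crawled page means the breadcrumb was not present in the HTML the crawler actually received.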

Is there another way to make these breadcrumb menus appear?

Would really appreciate some input. :)

munzwurf

1 Answer


Most likely the site stores the breadcrumb trail in cookies that your crawler is ignoring. You need to pass the session cookies from one request on to the subsequent ones. This question demonstrates cookie usage with Scrapy.
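To illustrate the cookie hypothesis, here is a self-contained sketch that uses only the standard library rather than Scrapy. It assumes a made-up shop (the paths, the `trail` cookie name and the breadcrumb markup are all hypothetical) that records each visited category in a cookie and only renders breadcrumbs on the product page when that cookie is sent back. The same crawl is run twice against a local server, once without a cookie jar and once with `http.cookiejar.CookieJar`; only the second run gets the breadcrumb.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical shop layout: the catalogue path leading to one product.
CATALOGUE = [("/electronics", "Electronics"), ("/phones", "Phones"), ("/iphones", "iPhones")]
PRODUCT_PATH, PRODUCT_NAME = "/iphone-5s-64gb", "iPhone 5S 64GB"


class ShopHandler(BaseHTTPRequestHandler):
    """Assumed server behaviour: the breadcrumb trail lives in a 'trail' cookie."""

    def do_GET(self):
        jar = SimpleCookie(self.headers.get("Cookie", ""))
        trail = jar["trail"].value.split("|") if "trail" in jar else []
        pages = dict(CATALOGUE)
        self.send_response(200)
        if self.path in pages:
            # Catalogue page: append this category to the trail cookie.
            trail.append(pages[self.path])
            self.send_header("Set-Cookie", "trail=" + "|".join(trail) + "; Path=/")
            body = "<h1>" + pages[self.path] + "</h1>"
        else:
            # Product page: render breadcrumbs only if the trail cookie came back.
            crumbs = " -> ".join(trail + [PRODUCT_NAME]) if trail else ""
            body = '<nav class="breadcrumb">' + crumbs + "</nav>"
        data = body.encode()
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def crawl(base_url, keep_cookies):
    """Follow the catalogue path in order, then fetch the product page."""
    handlers = [urllib.request.ProxyHandler({})]  # bypass any proxy for the local server
    if keep_cookies:
        # Stores Set-Cookie headers and sends them back on later requests.
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    for path, _ in CATALOGUE:
        opener.open(base_url + path).read()
    return opener.open(base_url + PRODUCT_PATH).read().decode()


def main():
    server = ThreadingHTTPServer(("127.0.0.1", 0), ShopHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        return crawl(base_url, keep_cookies=False), crawl(base_url, keep_cookies=True)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    without_cookies, with_cookies = main()
    print("without cookies:", without_cookies)
    print("with cookies:   ", with_cookies)
```

In Scrapy itself this bookkeeping is done by the built-in cookies middleware, which is active as long as `COOKIES_ENABLED` is left at its default of `True`. Setting `COOKIES_DEBUG = True` logs the `Cookie` and `Set-Cookie` headers, which is a quick way to see whether the shop sends such a cookie at all; if several independent sessions are needed, the `cookiejar` key in `Request.meta` keeps them apart.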

WeaselFox