
No prior Python experience, so this could be very basic.

I am trying to record the names, and later prices, of all of the hockey sticks sold by Canadian retailer SportChek.

My code so far looks like this:

# Import libraries
import requests
from bs4 import BeautifulSoup

# Collect the page
page = requests.get('https://www.sportchek.ca/categories/shop-by-sport/hockey/hockey-sticks.html?cid=search-hockey-sticks')

# Create BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')

# Pull all text from product-title-text class
stick_name_list = soup.find_all(class_='product-title-text')

# Pull all text from product-price-text
stick_price_list = soup.find_all(class_='product-price-text')

I believe this code should collect the appropriate data, but I'm not sure how to now display the variables.

Using the variable name (i.e. "stick_name_list") returns "[]" and "print stick_name_list" asks for parentheses, but obviously "print 'stick_name_list'" isn't right.

Any guidance is appreciated.

QHarr
Stn

4 Answers


It looks like that website,

https://www.sportchek.ca/categories/shop-by-sport/hockey/hockey-sticks.html?cid=search-hockey-sticks

loads the product data using JavaScript, so the HTML that requests.get receives contains no products to parse.

If you disable JavaScript in your browser, you will see that there are no HTML tags with class product-title-text or product-price-text.

More information here:

Using python Requests with javascript pages
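
To make the symptom concrete, and to answer the question about displaying the variables, here is a minimal sketch. Both HTML strings are invented for illustration and are not copied from SportChek: the first mimics what requests receives before any JavaScript runs, the second mimics the page after the products have been filled in. The helper name extract_titles is also just an assumption for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for illustration: what requests receives
# before any JavaScript has run (no product tags yet).
raw_html = '<div id="product-grid"></div><script src="app.js"></script>'

# Hypothetical HTML resembling the page after JavaScript has filled it in.
rendered_html = '''
<div id="product-grid">
  <span class="product-title-text">Example Stick A</span>
  <span class="product-title-text">Example Stick B</span>
</div>
'''


def extract_titles(html):
    """Return the text of every tag with class product-title-text."""
    soup = BeautifulSoup(html, 'html.parser')
    return [tag.get_text(strip=True)
            for tag in soup.find_all(class_='product-title-text')]


empty_result = extract_titles(raw_html)
found_titles = extract_titles(rendered_html)

# In Python 3, print is a function: pass the variable in parentheses,
# without quotes around its name.
print(empty_result)
for title in found_titles:
    print(title)
```

An empty list here means the selector matched nothing in the HTML that was actually downloaded. It does not mean the printing went wrong.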

David Robles

I’d suggest looking to see if you can parse the JSON that might be on the webpage. More information here: https://stackoverflow.com/a/47373146/7838574
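
As a rough sketch of that idea, assuming the page embeds its data as JSON inside a script tag: the markup, the id "product-data" and the key names below are all made up for illustration, so inspect the real page source to see whether anything similar exists there.

```python
import json
import re

# Hypothetical page source, invented for illustration. A real page may
# embed its data differently, or not at all.
html = '''
<html><body>
<script type="application/json" id="product-data">
{"products": [{"title": "Example Stick A", "price": 99.99},
              {"title": "Example Stick B", "price": 149.99}]}
</script>
</body></html>
'''

# Grab the text between the opening and closing script tags.
match = re.search(
    r'<script[^>]*id="product-data"[^>]*>(.*?)</script>',
    html,
    re.DOTALL,
)

# Fall back to an empty product list if the tag is missing.
embedded = json.loads(match.group(1)) if match else {'products': []}

names_and_prices = [(p['title'], p['price']) for p in embedded['products']]
print(names_and_prices)
```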

Daniel Butler

You can use the same url the page uses to update content. You can find this in the network tab. It returns json which you can filter on type == product to get the hockey sticks. You can change the count argument in the url querystring to bring back more results.

import requests
import pandas as pd

# Call the same endpoint the page uses to load its product list
data = requests.get('https://www.sportchek.ca/services/sportchek/search-and-promote/products?x1=c.category-level-1&q1=Gear&x2=c.category-level-2&q2=Hockey&x3=c.category-level-3&q3=Hockey+Sticks&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&page=1&count=100').json()

# Keep only real products, as (title, price) pairs
products = [(item['title'], item['price']) for item in data['products'] if item['type'] == 'product']

# Split the pairs into two separate tuples, in case you want them on their own
titles, prices = zip(*products)

df = pd.DataFrame(products, columns=['title', 'price'])
print(df.head())
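
Since the filtering and unpacking are packed into very little code, here is the same logic spelled out step by step on a tiny made-up sample. The sample entries, and the 'banner' type in particular, are assumptions for illustration. Only the 'products', 'type', 'title' and 'price' keys come from the code above.

```python
# Hypothetical sample shaped like the JSON described above.
data = {
    'products': [
        {'type': 'product', 'title': 'Example Stick A', 'price': 99.99},
        {'type': 'banner', 'title': 'Gift Cards'},  # made-up non-product entry
        {'type': 'product', 'title': 'Example Stick B', 'price': 149.99},
    ]
}

# Step 1, long form: walk the list, skip anything that is not a product,
# and store a (title, price) tuple for each item that is kept.
rows = []
for item in data['products']:
    if item['type'] == 'product':
        rows.append((item['title'], item['price']))

# The same thing written as a list comprehension.
rows_short = [(item['title'], item['price'])
              for item in data['products']
              if item['type'] == 'product']

# Step 2: zip(*rows) transposes the list of pairs, so all titles end up
# in one tuple and all prices in another.
titles, prices = zip(*rows)

print(rows)
print(titles)
print(prices)
```

Each tuple in rows becomes one row of the DataFrame, and the columns argument supplies the header names.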


QHarr
  • If you don't mind, could you explain what's going on in the second block of code? I'm not intuitively picking up on the format. I understand it's selecting certain parts of the scraped data and creating a table, but don't understand the formula it's using to do so. – Stn Apr 26 '19 at 19:50
  • Sure thing. Will have to be a little later so please remind me if nothing posted by the end of tomorrow. – QHarr Apr 26 '19 at 21:10

As others stated, you can request the JSON directly (as opposed to parsing the HTML).

import math

import pandas as pd
import requests

url = 'https://www.sportchek.ca/services/sportchek/search-and-promote/products'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

# Number of results requested per page
PAGE_SIZE = 200


def build_payload(page):
    """Query-string parameters for one page of hockey stick results."""
    return {
        'x1': 'c.category-level-1',
        'q1': 'Gear',
        'x2': 'c.category-level-2',
        'q2': 'Hockey',
        'x3': 'c.category-level-3',
        'q3': 'Hockey Sticks',
        'preselectedCategoriesNumber': '3',
        'preselectedBrandsNumber': '0',
        'page': str(page),
        'count': str(PAGE_SIZE)}


# The first page also reports the total number of results
jsonData = requests.get(url, headers=headers, params=build_payload(1)).json()
total_products = jsonData['resultCount']['total']
total_pages = math.ceil(total_products / PAGE_SIZE)

# Fetch the remaining pages and append their products to the first page's list
for page in range(2, total_pages + 1):
    products = requests.get(url, headers=headers, params=build_payload(page)).json()['products']
    jsonData['products'] = jsonData['products'] + products
    print('Processed page: %s' % page)

# pd.json_normalize (pandas 1.0+) replaces the deprecated
# "from pandas.io.json import json_normalize"
df = pd.json_normalize(jsonData['products'])

You can manipulate the table any way you'd like, or just work straight off the JSON. I just converted it to a table.
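
As one example of working straight off the JSON, the sketch below drops entries that have no price and sorts the rest from cheapest to most expensive. The sample list is made up, and it assumes that promotional entries simply lack a usable 'price' value, which would be consistent with the NaN rows in the output further down but is not confirmed here.

```python
# Hypothetical sample shaped like jsonData['products']; the entry without
# a 'price' key stands in for promotional rows such as "Gift Cards".
products = [
    {'title': 'Example Stick A', 'price': 339.99},
    {'title': 'Gift Cards'},
    {'title': 'Example Stick B', 'price': 69.99},
]

# Keep only entries that actually carry a price.
priced = [p for p in products if p.get('price') is not None]

# Sort from cheapest to most expensive.
cheapest_first = sorted(priced, key=lambda p: p['price'])

for p in cheapest_first:
    print(f"{p['title']}: {p['price']:.2f}")
```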

Output:

print (df[['title', 'price']])
                                                 title   price
0    Bauer Supreme 1S Griptac Senior Hockey Stick -...  339.99
1       Warrior Covert QRL SE Grip Senior Hockey Stick  329.99
2    Bauer Vapor X600 Lite Griptac Senior Hockey Stick   69.99
3                                           Gift Cards     NaN
4    Bauer Supreme 1S Clear Senior Hockey Stick - G...  339.99
5      Bauer Vapor 1X Lite Griptac Senior Hockey Stick  339.99
6    Bauer NEXUS 1N Griptac Gen II Senior Hockey Stick  254.97
7                                           Flash Sale     NaN
8                           Sher-Wood Project 9 Sticks     NaN
9    Bauer Supreme 2S Team Griptac Senior Hockey Stick  159.99
10   Bauer Supreme S160 Griptac Junior Hockey Stick...   44.97
11              Bauer Nexus 2N Pro Senior Hockey Stick  319.99
12     Warrior Alpha QX Grip Intermediate Hockey Stick  184.88
13               TRUE XC5 ACF Grip Junior Hockey Stick   79.99
14     Warrior Covert QRE ST2 Grip Senior Hockey Stick   89.99
15                             Mother's Day Gift Guide     NaN
16   Bauer Supreme S190 Griptac Senior Hockey Stick...  156.97
17   Bauer Vapor X700 Lite Griptac Senior Hockey Stick  119.99
18    Bauer Supreme 2S Pro Griptac Senior Hockey Stick  319.99
19              Bauer Nexus 2N Pro Junior Hockey Stick  199.99
20   Bauer Vapor 1X Lite Griptac Intermediate Hocke...  319.99
21   Bauer Supreme 1S Griptac Intermediate Hockey S...  223.97
22        Bauer Nexus 2N Pro Intermediate Hockey Stick  299.99
23            TRUE XC9 ACF Grip Junior 30 Hockey Stick  119.99
24                  TRUE XC9 ACF Youth 20 Hockey Stick   99.99
25                  Bauer Nexus 2N Senior Hockey Stick  224.99
26        Bauer Supreme 1S Youth Hockey Stick - Gen II   69.97
27        TRUE XC9 ACF Grip Gen II Senior Hockey Stick  319.99
28    Bauer Supreme 2S Pro Griptac Junior Hockey Stick  199.99
29   Bauer NEXUS N7000 Griptac Gen II Intermediate ...   89.97
..                                                 ...     ...
408        Warrior Covert QRL Grip Senior Hockey Stick  159.97
409  Bauer Vapor X800 Griptac Gen II Senior Hockey ...  109.97
410  Graf G95 Revolt Grip Senior Hockey Stick - GP0...  109.88
411            CCM Ribcor 47K Grip Senior Hockey Stick   79.97
412         Sher-Wood BPM 060 Grip Senior Hockey Stick   51.97
413        CCM RBZ Revolution Grip Senior Hockey Stick  149.88
414  CCM Premier R1.5 Senior Goalie Stick - Crawfor...   89.97
415       Bauer Vapor 1X Senior Goalie Stick - P31 25"  289.99
416     Sher-Wood GS350 Senior Goalie Stick 24" - PP41   96.97
417     Sher-Wood GS350 Senior Goalie Stick - PP41 27"   96.97
418     Bauer Vapor X900 Senior Goalie Stick - P31 26"  199.99
419          Sher-Wood GS150 Senior Goalie Stick - 24"   74.97
420          Sher-Wood GS150 Senior Goalie Stick - 25"   74.97
421           CCM 1060 Senior Goalie Stick - Price 27"   89.88
422          Sher-Wood GS150 Senior Goalie Stick - 26"   74.97
423          Sher-Wood GS150 Senior Goalie Stick - 27"   74.97
424  CCM Premier R1.9 Senior Goalie Stick - Crawfor...  119.97
425   Sher-Wood BPM 090 Grip Intermediate Hockey Stick   81.97
426  Warrior Covert QRL5 Grip Intermediate Hockey S...   63.97
427  Warrior Covert DT1 LT Grip Intermediate Hockey...  111.88
428  Warrior Covert Super Dolomite Grip Intermediat...  189.88
429  Warrior Dynasty HD1 Intermediate Stick - Grip ...  123.88
430  Easton Stealth CX Grip Intermediate Hockey Sti...  159.88
431  Easton Synergy 20 Intermediate Stick - Grip - ...   34.88
432  Sherwood T120 Intermediate Grip Hockey Stick -...   99.97
433  GRAF G75 Intermediate 70 Flex Hockey Stick - GP22   99.88
434  Bauer Vapor X700 Griptac Gen II Intermediate H...   79.97
435  Easton Synergy HTX Intermediate Stick - Grip -...  115.88
436  Sherwood T120 Intermediate Grip Hockey Stick -...   99.97
437   Sher-Wood BPM 060 Grip Intermediate Hockey Stick   51.97

[438 rows x 2 columns]
chitown88
  • Thanks! Could you walk me through this a little bit? I understand now that it's javascript, not HTML, that loads the product data. I don't understand how you guys were able to find the file containing the javascript. The network tab was mentioned, but there's a lot of stuff on that tab. If it's not too much I'd love a brief walkthrough of your code so I can get a better understanding and apply this to other sites as well. Thank you for your time! – Stn Apr 24 '19 at 18:13