Python Web Scraping With Selenium & Chrome

Question

I'm trying to scrape a webpage, but finding elements by their class name isn't working. I can see the element's class name in the Elements panel of Chrome, but when I search for it as shown below, I get an empty result.

from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://streamelements.com/logna/leaderboard")
usernames = driver.find_elements_by_class_name("md-cell leaderboard-row")
usernames

I'm trying to scrape at least the usernames and their points from this leaderboard page. A further plan is to also record each user's position and enter everything into an Excel spreadsheet, but that's for later and not what I'm having trouble with at the moment.

The output I see from running 'usernames' is '[]', which I know means the list is empty, but I can't understand why, since I can see the element and its class name and it's exactly the same. I must be missing something, or there's something I don't know.

Does this answer your question? [Selenium Compound class names not permitted](https://stackoverflow.com/questions/37771604/selenium-compound-class-names-not-permitted) — phil, May 05 '20 at 02:32

bherbruck · Accepted Answer · 2020-05-05T17:43:19.047

EDIT: go to the bottom to see a WAY better way of getting the data; it doesn't have to be scraped from HTML in this case.

Got it working! Just had to wait 10 seconds and only search for one class name:

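import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


chrome_path = r"C:\webdrivers\chromedriver.exe" # or wherever you have your chrome webdriver installed
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get("https://streamelements.com/logna/leaderboard")

# let the page load (the table is filled in by a web request after the initial load)
time.sleep(10)

# list comprehension to return text of each element with class leaderboard-row;
# a class-name lookup accepts a single class, so "md-cell leaderboard-row" cannot be used here
usernames = [element.text for element in
             driver.find_elements(By.CLASS_NAME, "leaderboard-row")
             if element.text != '']

print(usernames)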

Output:

['underholderen', '42051', 'jimbyj', '39220', 'delynne', '35411', 'rawrnerunya', '30350', 'simmer5k', '25470', 'bloomspeed', '23885', 'jaidav2000', '22386', 'moobot', '18910', 'virgoproz', '18120', 'ottermandela', '18108', 'v_and_k', '17945', 'kalibxi', '17610', 'commanderroot', '17585', 'jujusan', '17575', 'mellowj', '15390', 'itsvodoo', '15080', 'lord_hal', '14945', 'darkk0ala', '14757', 'sirenmatty', '13230', 'myles_27', '12725', 'upsetpoptart', '12204', 'salsichasensuaal', '11535', 'artalartistic', '11519', 'shannonmcbe', '10895', 'winsock', '10850']

If you want to get data from the other columns in the table, that is possible too.
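
The scraped list above alternates usernames and point totals, because every cell with the leaderboard-row class comes back as its own element. Here is a minimal sketch of pairing them up, assuming the list always alternates username, then points. The helper name pair_up is made up for this example, and the sample values are copied from the output above:

```python
def pair_up(cells):
    """Turn ['name', '123', 'other', '45'] into [('name', 123), ('other', 45)]."""
    names = cells[0::2]    # even positions hold the usernames
    points = cells[1::2]   # odd positions hold the point totals as strings
    return [(name, int(value)) for name, value in zip(names, points)]


cells = ['underholderen', '42051', 'jimbyj', '39220', 'delynne', '35411']
leaderboard = pair_up(cells)
print(leaderboard)
```

If the page ever returns an odd number of cells, zip silently drops the unmatched last item, so it may be worth checking the length first.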

EDIT:

Better yet, I was able to find the XHR web request that returns the list of top viewers (this is where the data in the table comes from, and it is in JSON format): https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top

You can query this and get the data much faster without having to scrape. Let me know and I can show how.

EDIT:

Ok, super simple and WAAAAAAY better:

First install requests:

pip install requests

Then:

import requests

url = 'https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top'

# fetch the leaderboard and parse the JSON response into a dictionary
leaderboard = requests.get(url).json()
print(leaderboard)

Output:

{'_total': 19350, 'users': [{'username': 'underholderen', 'points': 42051}, {'username': 'jimbyj', 'points': 39220}, {'username': 'delynne', 'points': 35411}, {'username': 'rawrnerunya', 'points': 30350}, {'username': 'simmer5k', 'points': 25470}, {'username': 'bloomspeed', 'points': 23885}, {'username': 'jaidav2000', 'points': 22386}, {'username': 'moobot', 'points': 18910}, {'username': 'virgoproz', 'points': 18120}, {'username': 'ottermandela', 'points': 18108}, {'username': 'v_and_k', 'points': 17945}, {'username': 'kalibxi', 'points': 17610}, {'username': 'commanderroot', 'points': 17585}, {'username': 'jujusan', 'points': 17575}, {'username': 'mellowj', 'points': 15390}, {'username': 'itsvodoo', 'points': 15080}, {'username': 'lord_hal', 'points': 14945}, {'username': 'darkk0ala', 'points': 14757}, {'username': 'sirenmatty', 'points': 13230}, {'username': 'myles_27', 'points': 12725}, {'username': 'upsetpoptart', 'points': 12204}, {'username': 'salsichasensuaal', 'points': 11535}, {'username': 'artalartistic', 'points': 11519}, {'username': 'shannonmcbe', 'points': 10895}, {'username': 'winsock', 'points': 10850}, {'username': 'macklelotsmore', 'points': 10688}, {'username': 'kikyobooty', 'points': 10650}, {'username': 'jovikingdomkey', 'points': 10385}, {'username': 'dancerhands', 'points': 10186}, {'username': 'mapplerug45', 'points': 10185}, {'username': 'lurxx', 'points': 10175}, {'username': 'jellycat101', 'points': 9965}, {'username': 'dean_', 'points': 9880}, {'username': 'tagou_', 'points': 9550}, {'username': 'arthiphix', 'points': 9505}, {'username': 'beingred', 'points': 9307}, {'username': 'theemrmark', 'points': 9135}, {'username': 'tiptactoe', 'points': 8710}, {'username': 'aten', 'points': 8660}, {'username': 'sweegol', 'points': 8630}, {'username': 'taramichellee', 'points': 8625}, {'username': 'sindar44', 'points': 8590}, {'username': 'nitestalkrr', 'points': 8570}, {'username': 'swoapy', 'points': 8546}, {'username': 'logviewer', 'points': 
8380}, {'username': 'umental', 'points': 8235}, {'username': 'chesterfield250', 'points': 8171}, {'username': 'theedgecution', 'points': 8152}, {'username': 'dreameater_gd', 'points': 8110}, {'username': 'camirios29', 'points': 7960}, {'username': 'dirty_soul', 'points': 7895}, {'username': 'princesschango', 'points': 7780}, {'username': 'tylerhunsicker', 'points': 7729}, {'username': 'toonybit', 'points': 7655}, {'username': 'angeloflight', 'points': 7515}, {'username': 'fentondy', 'points': 7325}, {'username': 'owgrandma', 'points': 7165}, {'username': 'ohitspb', 'points': 7150}, {'username': 'jayy557', 'points': 7140}, {'username': 'nightbot', 'points': 7125}, {'username': 'therealjt', 'points': 7110}, {'username': 'hawqks', 'points': 6970}, {'username': 'oxsaucy', 'points': 6930}, {'username': 'somoonm', 'points': 6910}, {'username': 'skiesti', 'points': 6890}, {'username': 'adeeduhs', 'points': 6695}, {'username': 'elmolovesdorothy', 'points': 6660}, {'username': 'liquigels', 'points': 6640}, {'username': 'shadowed21', 'points': 6630}, {'username': 'fakerwtd', 'points': 6450}, {'username': 'fragglefusion', 'points': 6440}, {'username': 'kickypip', 'points': 6230}, {'username': 'cerem5', 'points': 6230}, {'username': 'nikkigsus', 'points': 6225}, {'username': 'bigj808', 'points': 6135}, {'username': 'anotherttvviewer', 'points': 6070}, {'username': 'taratv', 'points': 6040}, {'username': 'l0nnix', 'points': 5970}, {'username': 'sainttt', 'points': 5965}, {'username': 'princejay__', 'points': 5905}, {'username': 'oniisammma', 'points': 5886}, {'username': 'marshallpawpatrol', 'points': 5839}, {'username': 'rosayallday', 'points': 5720}, {'username': 'garvsehgal98', 'points': 5700}, {'username': 'beethoven6', 'points': 5695}, {'username': 'nynxii', 'points': 5680}, {'username': 'tilly', 'points': 5672}, {'username': 'godgundam1019', 'points': 5615}, {'username': 'monoclekitteh', 'points': 5605}, {'username': 'steviewondaaa', 'points': 5580}, {'username': 
'ianonymoose', 'points': 5545}, {'username': 'aris1535', 'points': 5477}, {'username': 'rimastino', 'points': 5445}, {'username': 'kodexow', 'points': 5395}, {'username': 'ssondara', 'points': 5360}, {'username': 'cyroku', 'points': 5325}, {'username': 'ankoubzh', 'points': 5250}, {'username': 'sajan_ow', 'points': 5205}, {'username': 'plucik7', 'points': 5125}, {'username': 'sutetchi_', 'points': 5108}]}
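
The response is a dictionary with a '_total' count and a 'users' list, so usernames and points can be read directly without any HTML parsing. Here is a small sketch using a shortened copy of the output above as sample data (no request is made here, and the variable names are only illustrative):

```python
# shortened copy of the API output shown above
sample = {
    '_total': 19350,
    'users': [
        {'username': 'underholderen', 'points': 42051},
        {'username': 'jimbyj', 'points': 39220},
        {'username': 'delynne', 'points': 35411},
    ],
}

# the list is already sorted by points, so the position is just the index + 1
for position, user in enumerate(sample['users'], start=1):
    print(position, user['username'], user['points'])

# look up a single user's points by name
points_by_user = {user['username']: user['points'] for user in sample['users']}
print(points_by_user['jimbyj'])
```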

EDIT (again):

Here is how to get it into Excel (code changed slightly from above):

First install openpyxl:

pip install openpyxl

Then run the script:

import requests
import openpyxl as xl


url = 'https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top'

# get a dictionary of the request's json response
response = requests.get(url).json()

# get just the user list
users = response['users']

# add the index + 1 as rank (because index starts at 0)
for user in users:
    user['rank'] = users.index(user) + 1

# create the workbook
wb = xl.Workbook()

# go to the active sheet
ws = wb.active

# write the header row
ws.append(list(users[0].keys()))

# write the values for each row
for user in users:
    ws.append(list(user.values()))

# save the workbook
wb.save('./streamelements-kappa.xlsx')
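
The script above writes rank as the last column, since the key is added after username and points. Here is a sketch of one way to put the position in the first column instead, building the rows with enumerate. build_rows is a hypothetical helper, and the sample data is copied from the API output earlier:

```python
def build_rows(users):
    """Return a header row followed by one [rank, username, points] row per user."""
    rows = [['rank', 'username', 'points']]
    for rank, user in enumerate(users, start=1):
        rows.append([rank, user['username'], user['points']])
    return rows


users = [
    {'username': 'underholderen', 'points': 42051},
    {'username': 'jimbyj', 'points': 39220},
]
rows = build_rows(users)
print(rows)
```

Each row could then be passed to ws.append(row) in place of the header and value loops in the script.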

Great answer - I'd just note that other people may have a different chromedriver path. — n1c9, May 05 '20 at 03:29
I know, I have a different path, just using the same one OP had. ;) Thanks for pointing it out! made an edit for future viewers — bherbruck, May 05 '20 at 03:30
Thanks, it worked perfectly! I want to be able to take the usernames and their corresponding points and paste them into two columns in an Excel spreadsheet, so I'm going to work on that next. Thanks again for the help! — GhostCat, May 05 '20 at 13:43
Thank you! Is there a way to easily amend that script so that the positions (#1, #2, #3 etc.) are in Excel as well, to the left of their corresponding username and points? — GhostCat, May 05 '20 at 16:34
@ParsleyPurr of course! I updated the code with this line: # add the index + 1 as rank (because index starts at 0) for user in users: user['rank'] = users.index(user) + 1 — bherbruck, May 05 '20 at 17:43
That site also has a full API reference, pretty cool stuff: https://docs.streamelements.com/reference/giveaways — bherbruck, May 05 '20 at 17:48

score 0 · Answer 2 · answered May 05 '20 at 02:35

The class name is probably not "md-cell leaderboard-row" but "md-cell". The space separates two class names (md-cell and leaderboard-row), and a class-name lookup accepts only a single one.

Searching for one class should mostly work:

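from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_path = r"D:/PythonLessons/imageTest/chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get("https://streamelements.com/logna/leaderboard")
# look up a single class name
usernames = driver.find_elements(By.CLASS_NAME, "md-cell")
for item in usernames:
    print(item.text)
driver.close()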

That code collects every element with the md-cell class and prints the text of each cell. You can also get whole rows by using "md-row" instead of "md-cell"; each element in that list is then a row containing the number, name and points. Give it a try.

P.S. Once you have the list, you can check whether each element's text is empty.

score 0 · Answer 3 · answered May 05 '20 at 03:26

This happens because the element you are looking for belongs to multiple classes, namely md-cell and leaderboard-row. To fix this, use XPath to find elements that are part of both the md-cell class and the leaderboard-row class:

# requires: from selenium.webdriver.common.by import By
usernames = driver.find_elements(By.XPATH, "//*[contains(@class, 'md-cell') and contains(@class, 'leaderboard-row')]")

Be sure to add a sleep if this line runs before the page has fully loaded.
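
An alternative to XPath is a CSS selector, where several classes are chained with dots, and an explicit wait can stand in for a fixed sleep. Here is a rough sketch, assuming Selenium 4 is installed and that the cells carry both classes as described above. css_for_classes and wait_for_rows are names made up for this example:

```python
def css_for_classes(class_attr):
    """Convert a class attribute such as 'md-cell leaderboard-row' to '.md-cell.leaderboard-row'."""
    return ''.join('.' + name for name in class_attr.split())


def wait_for_rows(driver, class_attr, timeout=10):
    """Wait up to `timeout` seconds for at least one match, then return all matching elements."""
    # Imported here so css_for_classes can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, css_for_classes(class_attr))
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )


print(css_for_classes('md-cell leaderboard-row'))  # .md-cell.leaderboard-row
```

With a driver already on the leaderboard page, wait_for_rows(driver, 'md-cell leaderboard-row') would return the matching elements as soon as they appear, or raise a TimeoutException if none show up within the timeout.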
