TL;DR
The big guns of Selenium can't shoot the Cloudflare sheriff.
Here's the Colab notebook with everything below.
All right, here's a working Selenium setup on Google Colab that proves the point I made in my comment: even if you get it running, you still have to deal with a Cloudflare challenge.
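Do the following:
- Open a new Colab notebook.
- Run the shell cell below, then run the Python code in a second cell (the %%shell magic applies to the entire cell, so the Python part needs its own cell):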
%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap
# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF
# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A
apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg
# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF
# Install chromium and chromium-driver
apt-get update
apt-get install -y chromium chromium-driver
# Install selenium
pip install selenium
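APT preferences files are a sequence of records separated by blank lines, which is why the pin file above needs them. As a rough illustration (parse_stanzas is a hypothetical helper, not APT's actual parser), splitting that pin file on blank lines yields three records; strip the blank lines and everything collapses into a single record with repeated fields.

```python
# Minimal sketch: split an APT preferences file into blank-line-separated
# records. Hypothetical helper for illustration only, NOT APT's real parser.

PIN_FILE = """\
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
"""


def parse_stanzas(text: str) -> list[list[tuple[str, str]]]:
    """Return one list of (field, value) pairs per blank-line-separated record."""
    stanzas: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record; extra blanks are ignored.
            if current:
                stanzas.append(current)
                current = []
            continue
        # Split on the first colon only, so values like 'origin "x"' survive.
        field, _, value = line.partition(":")
        current.append((field.strip(), value.strip()))
    if current:
        stanzas.append(current)
    return stanzas


if __name__ == "__main__":
    flat = "\n".join(line for line in PIN_FILE.splitlines() if line.strip())
    print(len(parse_stanzas(PIN_FILE)))  # 3 records
    print(len(parse_stanzas(flat)))      # 1 record with repeated fields
```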
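Then, in a second cell, run:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")    # Colab has no display
options.add_argument("--no-sandbox")  # Colab runs as root; Chromium refuses to sandbox as root
driver = webdriver.Chrome(options=options)
driver.get("https://clutch.co/it-services/msp")
print(driver.page_source)  # dump whatever the server actually returned
driver.quit()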
You should see something like this (the Ray ID and tokens will differ on every run):
<html lang="en-US" class="lang-en"><head>
<title>Just a moment...</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="robots" content="noindex,nofollow">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="/cdn-cgi/styles/challenges.css" rel="stylesheet">
<script src="/cdn-cgi/challenge-platform/h/b/orchestrate/managed/v1?ray=7daf435aeecd112d"></script><script src="https://challenges.cloudflare.com/turnstile/v0/b/19ad4730/api.js?onload=_cf_chl_turnstile_l&render=explicit" async="" defer="" crossorigin="anonymous"></script></head>
<body class="no-js">
<div class="main-wrapper" role="main">
<div class="main-content">
<h1 class="zone-name-title h1"><img src="/favicon.ico" class="heading-favicon" alt="Icon for clutch.co">clutch.co</h1><h2 id="challenge-running" class="h2">Checking if the site connection is secure</h2><div id="challenge-stage"></div><div id="challenge-spinner" class="spacer loading-spinner" style="display: block; visibility: visible;"><div class="lds-ring"><div></div><div></div><div></div><div></div></div></div><div id="challenge-body-text" class="core-msg spacer">clutch.co needs to review the security of your connection before proceeding.</div><div id="challenge-explainer-expandable" class="hidden expandable body-text spacer" style="display: none;"><div class="expandable-title" id="challenge-explainer-summary"><button class="expandable-summary-btn" id="challenge-explainer-btn" type="button">Why am I seeing this page?<span class="caret-icon-wrapper"> <div class="caret-icon"></div> </span> </button> </div> <div class="expandable-details" id="challenge-explainer-details">Requests from malicious bots can pose as legitimate traffic. Occasionally, you may see this page while the site ensures that the connection is secure.</div></div><div id="challenge-success" style="display: none;"><div class="h2"><span class="icon-wrapper"><img class="heading-icon" alt="Success icon" src=""></span>Connection is secure</div><div class="core-msg spacer">Proceeding...</div></div><noscript>
<div id="challenge-error-title">
<div class="h2">
<span class="icon-wrapper">
<div class="heading-icon warning-icon"></div>
</span>
<span id="challenge-error-text">
Enable JavaScript and cookies to continue
</span>
</div>
</div>
</noscript>
<div id="trk_jschal_js" style="display:none;background-image:url('/cdn-cgi/images/trace/managed/nojs/transparent.gif?ray=7daf435aeecd112d')"></div>
<form id="challenge-form" action="/it-services/msp?__cf_chl_f_tk=fodu0hgxQaVwZPaQ.DlaPwI.O5svWinEWf94LM_MirI-1687382086-0-gaNycGzNCxA" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="md" value="Y531CK3.GDorU7Iwk2DV6cF23mTV48icLTjuAZV6568-1687382086-0-AY7Fiv3qUhkh_i93AsbTcYh_D3SG2ZegyiWzGIVG8NgRvrQkiLAuCZ_x8rfr_A4Wy5QOyAOrBLs-avkoeJD0_1G3AYtVfv9rIc6umkp5J_y75TurwQH5fCwjSC3biYbFJbdTbW_NeKfRDUQgh230Lb1UMApiygfWXkeMlzznEEKUa3EXALaHU6co68L5nf_vY6c9QyyILeTdhcspjfUkXCUIUB7ff-8QQgCKpUkZa3UH9V9Icbndie4LGMCl_QJsy5jPvIzTt7nAS_Kk1-TrPxZltr8ZyHhjdvhEyVTkyrTi46auFGmixnyt9bK5dKnGv-J59nXp3EMF34gnVnbmTQuMDG9KHaN4bR4Ij6IO94sRnDGIJnXX6aiFLHiqFx9_kh1krAg3qOuuXZ9UghjKoITy2uPx9ng7hZ73p6QILb0aW-f-GL4VBdv-f1mdZyXJYRRrlfpnGoQMy-jxy6zsZshYtI-fzuDAL3A7nU_NVEGoN7SRrS4dFdn2mGhwPwVhhzt37SQ04MMjfs-_r8KNkOVbnNBtfHp_TWwyEbrhM4Lgc-YEYVRrI-J5LVYwIv4K7JAgObKJffhs53zwB0RrFQG3pF2Qy9W8Cxq2HvlKko3clzUXmw6meZfYJPZaYIMbJa39rqF0jltNKoqOcgJa5xQSTSXrNShUO1ClAHsjUGuTA11lM8Dk5rlnS9qXVWhDWI51i-4Q7BPIkb1BqaW6K_0ltyCzXBtN8q1EqrJeno7ryMC1FyCZ2y8Hy0IsHAhNg2DAvhYov34mrEeoOc4iG4ZHZghGAPkf9tNXo5NBTVNbrwDzvwxXaMVWJRHYQ8YB6LiFK7VPWa_ZjEU7GsdWzXpa_Tp4ulnnbUGrdEThXQC3chCij4f3T7m-Pc7LZdTvs-qs2f5g6_kBwiAAro2KelOxhCsf66l5HcpHHy9uhERBx7FgItODQDqG7kR2r80QCo3kOzBqFL3CIsvtg_KYNG8HkxYqDc-YMRWsvBj5Mmt6c8RzCOkDxKC_DJwOj58CeC2o9e-6wCfgcjb0EPR8cTK_S8ht28zPLUCDJ_j119ErBnHJ1zpdJHydT1HEdnK-vaSuyYf69kOSCC7Kij4ZRttSlfiA4k9gau8QoREht_pxMwfxXraBRfYUWVXO_ZSyz561B9C4Fa1L0gW31RXgCRuzCdDg-Cgr9AN8ky06s19D3N4CZLhtGOjRfMbidHVBD9Ppe4jlcUnSx-wdkJkVXZ2S8XO4F4ou7jGhrN9l9mDIDZ98OXaL_CvhHXNBWxE1Gn1_i1_Ndb7VKFP5Y6YuPLTXaN9kS-kF3rZcIBuh_dczTVQKOEWq1QYy9_CBj2sIPSxhcuQCXwTt4K81e6UiIrovBNWiZ4VjKvLdetwmUUgnpfNbssOz5S6GieV7ENqMBdaYlIP9YPdzHdJl4WQ_stCiC_Yc0wew2XI2XvOOil8_7F1yHgCg4mPS98Y9BXNDKiLDGGl3lRs9ydBvCdiY8__KztFLuVyDiWqschUvXUOg07KBtyQDnSxOyZUn873i7Kg4dKoqAyUICRT_nhsNtGUe4wzXYk3eevEG-7Ct4tSBpw6rTrjeNqa9Lsu5b6Pv-eJX0gYpg-1pydKSKLfvQYNp9wjwT-Oh5UH8vw8lo7b3uSc6QMmkaP2jQVDnqIyQDN8cDAYu6Vdr83xiZJG1Qqn80xVe0RMwEzMcjFv7yy6QM3O-uv0tJHC8EnINpXc1uMp1zphYyIgw-xSy68x55DEf38OrsY7xbJUqdMdF_qJQPi3FOh5MYHftgyH1WyDUHrxXiVJYuTMv7DtgaLjGoA0ybDW_PcBOXI5LAXnqYYR92WmHTEghxLHKxpWqZt9t_XS4j4rycqHU261_6zPhkTklv2cUFJOOT5lRTkY3OySP7-CEp
0ZgjPrAOu4g-wt1YUprDjQzYrpmlBUXqKXzeJ795UBKn0HZLDoGQkY5_w-deyzcLV4XZXdGrxnAEOQq5Kx330hD2XgH8Q0be4WinLLZ6R8Tsl3c_5UuxLn0YJlxosFgXXLZehemg9WxGzfrOnb_5reyNr_3KU4nYWl9wFy-wsz6HtyPQ_1LnvBBgxVbrCFy-m9Wm8mt1BcaLwTUA2NSpTY0fbSwkuvx0LKTmG865H5C9qqBAgTGw2R99fv6vqq6ZP_HOzv5Q-c5L2C17lCp4cJwOCkvj7NEWQ1iCoi0X6CWZtVYFC-wXeuI4dh2D6BxGtekFuC77-Rt335ib1wPN7bf6_lA-TPb92U2IUCoq8K9frexCE7QzxaCSdKB-wRkE6g5FERuP-waii1Uquiut4aQ8tJVlwvi0nvuOvuP_Rg4P9xa2HlOxzwajrDBmzfnerhwdyEzOYQTXvwDF-ApPg5rPiMpjo29icy0K9arOF9yY_Wf2EXZD-6hjCcDswfhO3lQWFfnf1ANOFvnp0hcCvr-k93ukAnVbm8uorhSIWr2iy1JeNeGH8kM66IDkdlSLnj9igHNf6C0vDnkBoOolfXQECpmfhS6dai7Np01RQjoKsoGQU1S4rQnjsdYxBdOXqdrfYw_wfsBhV87qxHGUND6uD6m3qwU2vKCyQa_GSIGgzPfqWnhpXozyHUbmBOYJDiKI6u0x3u8mZDWhaaQWYttxUa1gQKnOQy1qM5NI8D881kXI_M2cpvX4rW9coG1k9_qE7yC--4u537ojssm9gSzNnQgeOpn-___N978hwxMqftej1jdhMJePK959TjaeJvMu045n-xtFbFGF81FIhiKtMWskbvRy1wIB3I">
<span style="display: none;"><span class="text-gray-600" data-translate="error">error code: 1020</span></span></form>
</div>
</div>
<script>
(function(){
window._cf_chl_opt={
cvId: '2',
cZone: 'clutch.co',
cType: 'managed',
cNounce: '44156',
cRay: '7daf435aeecd112d',
cHash: 'f86e351e5e00345',
cUPMDTk: "\/it-services\/msp?__cf_chl_tk=fodu0hgxQaVwZPaQ.DlaPwI.O5svWinEWf94LM_MirI-1687382086-0-gaNycGzNCxA",
cFPWv: 'b',
cTTimeMs: '1000',
cMTimeMs: '0',
cTplV: 5,
cTplB: 'cf',
cK: "",
cRq: {
ru: 'aHR0cHM6Ly9jbHV0Y2guY28vaXQtc2VydmljZXMvbXNw',
ra: 'TW96aWxsYS81LjAgKFgxMTsgTGludXggeDg2XzY0KSBBcHBsZVdlYktpdC81MzcuMzYgKEtIVE1MLCBsaWtlIEdlY2tvKSBIZWFkbGVzc0Nocm9tZS85MC4wLjQ0MzAuMjEyIFNhZmFyaS81MzcuMzY=',
rm: 'R0VU',
d: 'Ac34gEYVhl8DXbnILOq76p8yhzcHr06ria7SjaltDZ17DDHJrhCowkieLnLjzsxr3IgprB+0nJObDfv3tbOFZfQanW8VrnMBqy2JC8EFTBSXy7ra08EgPGOSUetaRr/bENIZ81mt06Vq52ykJX01fCO0wyHdNMat8fNwgF9RDfp7CFMpUtp0E+lofrj9tut74nR1+yniOo1zFt2zmKVpFFUunX1K1oMy8Fp1ubIQgHIBEG8g8h3CRzHD2WMTRtqYfFvCfD5PhcR+uWWgxf6ybQnii3noC7BLSbJZHZ5abVjNKZTvRGyLtkP8uNLoAQTF8A5ir68vmv+c6weSVw845TjogSfOFzHrXQvj5dnpPWEmReEsQfl2p3nJJuswyd/OUIPTMuLfPOM7EYHQKawKqI1+jp15e4QZjAl4LIhAwQoHqqcXPd9NqvBkzxrb7YhWBsvOHzgUMb5gR3exN42NVnFbUimWWdhX7Ei+tXR43I+68kGLFe4kQccvXzfYtl3G7mudbXvhkFMjAJk24bb9ugax1RyJeT1HMXZAZG7vOzGxEpf2Zgly+6twZ+C1JShkmfbHj9Z8EkYIlkxm99wVFg==',
t: 'MTY4NzM4MjA4Ni44NzAwMDA=',
cT: Math.floor(Date.now() / 1000),
m: 'eRBgvpMHb6ottjHZ8LYOdoe7cvhlOKe5j2vP7BjQYIE=',
i1: 'DrqvOBUgqLvl22W0Yoh8VA==',
i2: 'Co7rIFnUzVj/9LmqAUCUUw==',
zh: 'MYPZaDt93/n+i/zoik8Q5B4rNo75M88ZQHevg31AJek=',
uh: 'U3QjejX60yUnAxm0WjPwFsHXm0FG5VD2yNoc1w8iQek=',
hh: 'w+icDAWoSjxex064a5CZutpetBiSACwcZG4EmfuqjNI=',
}
};
var trkjs = document.createElement('img');
trkjs.setAttribute('src', '/cdn-cgi/images/trace/managed/js/transparent.gif?ray=7daf435aeecd112d');
trkjs.setAttribute('alt', '');
trkjs.setAttribute('style', 'display: none');
document.body.appendChild(trkjs);
var cpo = document.createElement('script');
cpo.src = '/cdn-cgi/challenge-platform/h/b/orchestrate/managed/v1?ray=7daf435aeecd112d';
window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;
if (window.history && window.history.replaceState) {
var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
history.replaceState(null, null, "\/it-services\/msp?__cf_chl_rt_tk=fodu0hgxQaVwZPaQ.DlaPwI.O5svWinEWf94LM_MirI-1687382086-0-gaNycGzNCxA" + window._cf_chl_opt.cOgUHash);
cpo.onload = function() {
history.replaceState(null, null, ogU);
};
}
document.getElementsByTagName('head')[0].appendChild(cpo);
}());
</script><img src="/cdn-cgi/images/trace/managed/js/transparent.gif?ray=7daf435aeecd112d" alt="" style="display: none">
<div class="footer" role="contentinfo"><div class="footer-inner"><div class="clearfix diagnostic-wrapper"><div class="ray-id">Ray ID: <code>7daf435aeecd112d</code></div></div><div class="text-center" id="footer-text">Performance & security by <a rel="noopener noreferrer" href="https://www.cloudflare.com?utm_source=challenge&utm_campaign=m" target="_blank">Cloudflare</a></div></div></div><span id="trk_jschal_js"></span></body></html>
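The challenge page also echoes part of the request back in base64. Decoding a few values from the cRq block above shows the request URL, the HTTP method and the user agent, which literally says HeadlessChrome. The field names are Cloudflare's; what each one holds is my inference from the decoded values, not something Cloudflare documents. A self-announcing headless user agent is one obvious signal a bot check could key on, though certainly not the only one.

```python
import base64

# Values copied verbatim from the cRq block of the challenge page above.
# Field meanings (ru = URL, ra = user agent, rm = method) are inferred
# from the decoded text, not from any Cloudflare documentation.
CRQ = {
    "ru": "aHR0cHM6Ly9jbHV0Y2guY28vaXQtc2VydmljZXMvbXNw",
    "ra": "TW96aWxsYS81LjAgKFgxMTsgTGludXggeDg2XzY0KSBBcHBsZVdlYktpdC81MzcuMzYgKEtIVE1MLCBsaWtlIEdlY2tvKSBIZWFkbGVzc0Nocm9tZS85MC4wLjQ0MzAuMjEyIFNhZmFyaS81MzcuMzY=",
    "rm": "R0VU",
}


def decode_fields(fields: dict[str, str]) -> dict[str, str]:
    """Base64-decode each value and return it as UTF-8 text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in fields.items()}


if __name__ == "__main__":
    for key, text in decode_fields(CRQ).items():
        print(f"{key}: {text}")
```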
As you can see, running Selenium doesn't change much: you still land on Cloudflare's "Just a moment..." challenge page instead of the listing.
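If you want a script to notice this instead of silently printing the wrong page, a cheap heuristic is to look for markers that appear in the challenge page above: the "Just a moment..." title, the challenge-form id and the /cdn-cgi/challenge-platform/ path. looks_like_cloudflare_challenge is a hypothetical helper, and the markers are only what this particular page contained, so Cloudflare may change them at any time.

```python
# Hypothetical helper: flag HTML that looks like the Cloudflare interstitial
# shown above. Markers are copied from that page and may change over time.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    'id="challenge-form"',
    "/cdn-cgi/challenge-platform/",
)


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if any known challenge marker appears in the page source."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


if __name__ == "__main__":
    sample = "<html><head><title>Just a moment...</title></head></html>"
    print(looks_like_cloudflare_challenge(sample))  # True
```

On Colab you would call it as looks_like_cloudflare_challenge(driver.page_source) right after driver.get(...).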
So, my question to you is: why do you want to stick to Colab so badly?
Because if you run slightly modified code locally:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
options = webdriver.EdgeOptions()
options.add_argument("--window-size=1920,1080")  # Chromium expects width,height
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")
browser = webdriver.Edge(options=options)  # visible, non-headless browser window
try:
    browser.get("https://clutch.co/it-services/msp")
    print("Waiting for profile links to appear...")
    # Wait up to 5 s for the listing's .infobar__counter (absent on the challenge page)
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".infobar__counter"))
    )
    css_selector = ".directory-list div.provider-info--header .company_info a"
    profile_links = [
        link.get_attribute("href") for link
        in browser.find_elements(By.CSS_SELECTOR, css_selector)
    ]
    print(profile_links)
finally:
    browser.quit()  # close the browser even if the wait times out
It should open a browser window (Edge, this time) and print:
['https://clutch.co/profile/empist', 'https://clutch.co/profile/sugarshot', 'https://clutch.co/profile/veraqor', 'https://clutch.co/profile/vertical-computers', 'https://clutch.co/profile/andromeda-technology-solutions', 'https://clutch.co/profile/betterworld-technology', 'https://clutch.co/profile/symphony-solutions', 'https://clutch.co/profile/andersen', 'https://clutch.co/profile/blackthorn-vision', 'https://clutch.co/profile/pca-technology-group', 'https://clutch.co/profile/deft', 'https://clutch.co/profile/varsity-technologies', 'https://clutch.co/profile/techprocomp', 'https://clutch.co/profile/vintage-it-services', 'https://clutch.co/profile/imagis', 'https://clutch.co/profile/xiztdevops', 'https://clutch.co/profile/parachute-technology', 'https://clutch.co/profile/blackpoint-it', 'https://clutch.co/profile/exigent-technologies', 'https://clutch.co/profile/xenonstack', 'https://clutch.co/profile/it-outposts', 'https://clutch.co/profile/integris', 'https://clutch.co/profile/techmd', 'https://clutch.co/profile/total-networks', 'https://clutch.co/profile/applied-tech', 'https://clutch.co/profile/alpacked', 'https://clutch.co/profile/bit-bit-computer-consultants', 'https://clutch.co/profile/framework-it', 'https://clutch.co/profile/britenet', 'https://clutch.co/profile/success-computer-consulting', 'https://clutch.co/profile/cyberduo', 'https://clutch.co/profile/bca-it', 'https://clutch.co/profile/britecity', 'https://clutch.co/profile/designdata', 'https://clutch.co/profile/ascendant-technologies-0', 'https://clutch.co/profile/ripple-it', 'https://clutch.co/profile/tpx-communications', 'https://clutch.co/profile/xvand-technology-corp', 'https://clutch.co/profile/sikich', 'https://clutch.co/profile/cloudience', 'https://clutch.co/profile/mis-solutions', 'https://clutch.co/profile/real-it-solutions', 'https://clutch.co/profile/arium', 'https://clutch.co/profile/intetics', 'https://clutch.co/profile/gencare', 'https://clutch.co/profile/innowise-group', 
'https://clutch.co/profile/tech-superpowers-0', 'https://clutch.co/profile/spd-group', 'https://clutch.co/profile/juern-technology', 'https://clutch.co/profile/turrito-networks']
On the other hand, locally you don't even need Selenium if you have cloudscraper.
For example, this:
import cloudscraper
from bs4 import BeautifulSoup
# create_scraper() returns a requests.Session subclass that tries to get past Cloudflare's anti-bot page
scraper = cloudscraper.create_scraper()
source = scraper.get("https://clutch.co/it-services/msp")
css_selector = ".directory-list div.provider-info--header .company_info a"
# The hrefs are site-relative (/profile/...), so prefix the domain
links = [
    f'https://clutch.co{anchor["href"]}' for anchor in
    BeautifulSoup(source.text, "html.parser").select(css_selector)
]
print(links)
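The f-string prefix in the cloudscraper snippet assumes every href is site-relative, like /profile/empist. A slightly more defensive variant, sketched here with the standard library's urllib.parse.urljoin, resolves each href against the page URL, so an href that is already absolute is left alone instead of ending up as "https://clutch.cohttps://...". absolutize and PAGE_URL are hypothetical names for illustration.

```python
from urllib.parse import urljoin

# Hypothetical constant: the listing page the links were scraped from.
PAGE_URL = "https://clutch.co/it-services/msp"


def absolutize(hrefs: list[str], base: str = PAGE_URL) -> list[str]:
    """Resolve each href against the page URL; absolute hrefs pass through unchanged."""
    return [urljoin(base, href) for href in hrefs]


if __name__ == "__main__":
    print(absolutize(["/profile/empist", "https://clutch.co/profile/sugarshot"]))
    # ['https://clutch.co/profile/empist', 'https://clutch.co/profile/sugarshot']
```

In the scraper you would build the list as absolutize([anchor["href"] for anchor in BeautifulSoup(source.text, "html.parser").select(css_selector)]).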
Should return exactly the same 50 profile links as the Edge version above.
PS. The source for the Debian magic on Colab is here.