
I have looked everywhere for a solution (including old Stack Overflow posts about related issues) to get rid of the "JavaScript is not available" output. I get this message for dynamic sites, so I switched from the requests library to Selenium, but I still get the same issue. Does anybody know how to fix this so that it's possible to scrape dynamic sites? I simply want to retrieve the text from them. I've exhausted every approach I could find. My code is below; feel free to suggest a fix or recommend a different approach.

Console output: JavaScript is not available. We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Center. Help Center

Below is my code:

    import time
    from selenium import webdriver
    from bs4 import BeautifulSoup

    browser = webdriver.Chrome('chromedriver')
    options = webdriver.ChromeOptions()
    options.headless = True
    options.add_argument('--enable-javascript')
    options.add_argument("--headless")
    browser.get("https:/www.twitter.com/")
    time.sleep(2)
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    L = soup.getText()
    time.sleep(2)
    print(L)

eyllanesc
jarvis

2 Answers


JavaScript is enabled in all browsers by default unless you have explicitly disabled it. In this use case, it seems the browsing context that ChromeDriver launches under Selenium is getting detected as a bot.

However, I was able to retrieve the page source with a few tweaks, as follows:

  • Code Block:

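    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    # Same effect as options.headless = True, which newer Selenium 4 releases deprecate.
    options.add_argument("--headless")
    options.add_argument("start-maximized")
    # Hide the most obvious automation markers (infobar, automation extension, navigator.webdriver).
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    # Replace the default headless user agent, which advertises "HeadlessChrome".
    options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    driver.get("https://twitter.com/")
    print(driver.page_source)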
    
  • Console Output:

    <html dir="ltr" lang="en-GB" style="overflow-y: scroll; overscroll-behavior-y: none; font-size: 15px;"><head><style>input::placeholder { user-select: none; -webkit-user-select: none; }</style><style>@font-face {
      font-family: TwitterChirpExtendedHeavy;
      src: url(https://abs.twimg.com/fonts/v1/chirp-extended-heavy-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v1/chirp-extended-heavy-web.woff) format('woff');
      font-weight: 800;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-regular-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-regular-web.woff) format('woff');
      font-weight: 400;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-medium-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-medium-web.woff) format('woff');
      font-weight: 500;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-bold-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-bold-web.woff) format('woff');
      font-weight: 700;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-heavy-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-heavy-web.woff) format('woff');
      font-weight: 800;
      font-style: 'normal';
      font-display: 'swap';
    }</style><meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0,viewport-fit=cover"><link rel="preconnect" href="//abs.twimg.com"><link rel="dns-prefetch" href="//abs.twimg.com"><link rel="preconnect" href="//api.twitter.com"><link rel="dns-prefetch" href="//api.twitter.com"><link rel="preconnect" href="//pbs.twimg.com"><link rel="dns-prefetch" href="//pbs.twimg.com"><link rel="preconnect" href="//t.co"><link rel="dns-prefetch" href="//t.co"><link rel="preconnect" href="//video.twimg.com"><link rel="dns-prefetch" href="//video.twimg.com"><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/polyfills.86126f05.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/vendors~main.943109f5.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/i18n/en-GB.e698d8f5.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/main.1ccd30a5.js" nonce=""><meta property="fb:app_id" content="2231777543">
    .
    .
      <noscript>
        <style>
        body {
          -ms-overflow-style: scrollbar;
          overflow-y: scroll;
          overscroll-behavior-y: none;
        }
    
        .errorContainer {
          background-color: #FFF;
          color: #0F1419;
          max-width: 600px;
          margin: 0 auto;
          padding: 10%;
          font-family: Helvetica, sans-serif;
          font-size: 16px;
        }
    
        .errorButton {
          margin: 3em 0;
        }
    
        .errorButton a {
          background: #1DA1F2;
          border-radius: 2.5em;
          color: white;
          padding: 1em 2em;
          text-decoration: none;
        }
    
        .errorButton a:hover,
        .errorButton a:focus {
          background: rgb(26, 145, 218);
        }
    
        .errorFooter {
          color: #657786;
          font-size: 80%;
          line-height: 1.5;
          padding: 1em 0;
        }
    
        .errorFooter a,
        .errorFooter a:visited {
          color: #657786;
          text-decoration: none;
          padding-right: 1em;
        }
    
        .errorFooter a:hover,
        .errorFooter a:active {
          text-decoration: underline;
        }
    
          #placeholder,
          #react-root {
            display: none !important;
          }
          body {
            background-color: #FFF !important;
          }
        </style>
        <div class="errorContainer">
          <img width="46" height="38" srcset="https://abs.twimg.com/errors/logo46x38.png 1x, https://abs.twimg.com/errors/logo46x38@2x.png 2x" src="https://abs.twimg.com/errors/logo46x38.png" alt="Twitter" />
          <h1>JavaScript is not available.</h1>
          <p>We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Centre.</p>
          <p class="errorButton"><a href="https://help.twitter.com/using-twitter/twitter-supported-browsers">Help Center</a></p>
        <p class="errorFooter">
          <a href="https://twitter.com/tos">Terms of Service</a>
          <a href="https://twitter.com/privacy">Privacy Policy</a>
          <a href="https://support.twitter.com/articles/20170514">Cookie Policy</a>
          <a href="https://legal.twitter.com/imprint">Imprint</a>
          <a href="https://business.twitter.com/en/help/troubleshooting/how-twitter-ads-work.html?ref=web-twc-ao-gbl-adsinfo&utm_source=twc&utm_medium=web&utm_campaign=ao&utm_content=adsinfo">Ads info</a>
          © 2022 Twitter, Inc.
        </p>
    
        </div>
      </noscript>
      .
      .
      <script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/polyfills.86126f05.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/vendors~main.943109f5.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/i18n/en-GB.e698d8f5.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/main.1ccd30a5.js"></script><script nonce="">(function () {
      if (!window.__SCRIPTS_LOADED__['main']) {
        document.getElementById('ScriptLoadFailure').style.display = 'block';
        var criticalScripts = ["polyfills","vendors~main","i18n","main"];
        for (var i = 0; i < criticalScripts.length; i++) {
          var criticalScript = criticalScripts[i];
          if (!window.__SCRIPTS_LOADED__[criticalScript]) {
            document.getElementsByName('failedScript')[0].value = criticalScript;
            break;
          }
        }
      }
    })();</script><script nonce="">document.cookie = decodeURIComponent("gt=1502387523636527105; Max-Age=10800; Domain=.twitter.com; Path=/; Secure");</script><script src="https://accounts.google.com/gsi/client" id="googleGSILibrary" async="" defer=""></script><script src="https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js" id="signInWithAppleJsLibrary" async="" defer=""></script></body></html>
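One more thing worth checking, offered as a hedged sketch rather than part of the tweaks above: the page source shown here still contains the <noscript> "JavaScript is not available" block even with JavaScript enabled. Twitter appears to ship that fallback in every response, so calling BeautifulSoup's getText() on page_source, as the question does, will most likely print that message no matter what. Also note that in the question's code, options is created after webdriver.Chrome(...) and is never passed to it, so neither --headless nor --enable-javascript has any effect. The sketch below uses hypothetical helper names (visible_text, rendered_body_text). It waits for the rendered page and reads Selenium's .text, which reports only visible text. It also includes a standard-library helper that drops <noscript>, <script> and <style> contents if you prefer to keep parsing page_source. Waiting for "body has some text" is an assumption; waiting for a specific element you need is more robust.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping anything inside <noscript>, <script> or <style>."""

    SKIP = {"noscript", "script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text of an HTML string, ignoring <noscript> fallbacks, scripts and styles."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def rendered_body_text(url: str, timeout: float = 15.0) -> str:
    """Load url in headless Chrome and return the text Selenium reports as visible."""
    # Imported here so visible_text() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # Options only take effect if they are passed when the driver is created.
    driver = webdriver.Chrome(options=options)  # Selenium 4.6+ can locate chromedriver itself
    try:
        driver.get(url)
        # Assumption: the app has rendered once <body> contains some visible text.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.TAG_NAME, "body").text.strip()
        )
        return driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()
```

Even with this, Twitter may still serve different content to a browser it classifies as automated, so the detection tweaks above can remain necessary.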
    
undetected Selenium

Your URL is malformed: "https:/www.twitter.com/" has only one slash after "https:". It should be https://twitter.com/
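A quick way to catch this kind of typo before handing a URL to the driver, sketched with Python's standard urllib.parse (has_host is a hypothetical helper name). Browsers are generally lenient here, since the WHATWG URL Standard treats a missing slash after "https:" as a recoverable error, so Chrome may still load the page. Stricter tools do not recover, which is a good reason to fix the URL anyway.

```python
from urllib.parse import urlparse


def has_host(url: str) -> bool:
    """True if the URL has both a scheme and a host (network location)."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


# With a single slash after "https:", the host is parsed as part of the path.
bad = urlparse("https:/www.twitter.com/")
print(repr(bad.netloc))  # prints ''
print(bad.path)          # prints /www.twitter.com/

print(has_host("https:/www.twitter.com/"))  # prints False
print(has_host("https://twitter.com/"))     # prints True
```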

Twitter uses bot-detection technology: when you drive the browser with Selenium, the site inspects properties of the browser that can reveal automation.
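Exactly which properties Twitter checks is not public, so this is only an illustration of the idea, with hypothetical helper names. Under WebDriver, navigator.webdriver is true by default, and headless Chrome's default user agent typically contains "HeadlessChrome". Those two signals are what the first answer's --disable-blink-features=AutomationControlled flag and user-agent override target. The sketch reads them through Selenium's execute_script so you can see what a page would see.

```python
def automation_signals(driver) -> dict:
    """Read two browser properties that sites commonly inspect to spot automation.

    driver is assumed to be a Selenium WebDriver instance.
    """
    return driver.execute_script(
        "return {webdriver: navigator.webdriver, userAgent: navigator.userAgent};"
    )


def suspicious(signals: dict) -> list:
    """Return human-readable reasons why these signals look automated."""
    reasons = []
    if signals.get("webdriver"):
        reasons.append("navigator.webdriver is true")
    if "HeadlessChrome" in (signals.get("userAgent") or ""):
        reasons.append("user agent contains HeadlessChrome")
    return reasons


# Usage, assuming `driver` was created as in the first answer:
# print(suspicious(automation_signals(driver)))
```

An empty list only means these two obvious markers are hidden; real detectors look at many more signals.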

Basically, all you need to do is change the cdc_ string inside the chromedriver binary.
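A minimal sketch of that patch, assuming the marker still appears literally as cdc_ in your ChromeDriver build (this has varied across versions, so check first). It writes a patched copy instead of editing the original, and uses a same-length replacement so no byte offsets inside the executable shift. The function name patch_cdc and the xyz_ replacement are arbitrary, hypothetical choices. On macOS, a modified binary may need to be re-signed before it will run.

```python
import shutil
from pathlib import Path

MARKER = b"cdc_"


def patch_cdc(src: str, dst: str, replacement: bytes = b"xyz_") -> int:
    """Copy the chromedriver binary at src to dst with every b"cdc_" replaced.

    The replacement must have the same length as the marker so the binary's
    layout is unchanged. Returns the number of replacements made.
    """
    if len(replacement) != len(MARKER):
        raise ValueError("replacement must be exactly %d bytes" % len(MARKER))
    data = Path(src).read_bytes()
    count = data.count(MARKER)
    Path(dst).write_bytes(data.replace(MARKER, replacement))
    shutil.copymode(src, dst)  # keep the executable bit on Unix-like systems
    return count
```

You would then point Selenium's Service at the patched copy instead of the original chromedriver.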

Here is a link to the same question: link

Bauyrzhan Ospan