As the title indicates, I'm in a bind. I'm trying to build a personal program to use whenever I'm bored, and it involves scraping a specific character's response from a website called Character.ai. I'm still new to web scraping, so it's hard for me to understand how to get that response. I tried some sample code just to see whether the response shows up in the HTML returned by the requests module, but all I get is:

*
<!DOCTYPE html>

<html lang="en-US">
<head>
<title>Just a moment...</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="noindex,nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/challenges.css" rel="stylesheet"/>      
</head>
<body class="no-js">
<div class="main-wrapper" role="main">
<div class="main-content">
<noscript>
<div id="challenge-error-title">
<div class="h2">
<span class="icon-wrapper">
<div class="heading-icon warning-icon"></div>
</span>
<span id="challenge-error-text">
                        Enable JavaScript and cookies to continue
                    </span>
</div>
</div>
</noscript>
<div id="trk_jschal_js" style="display:none;background-image:url('/cdn-cgi/images/trace/managed/nojs/transparent.gif?ray=7aecf89aef5f1085')"></div>
<form action="/chat?char=Q_cHSEbZkD_x5SDCfxHnjh4mIzakJnzsWfqyCvlev7g&amp;__cf_chl_f_tk=on4BO22Yvyltg44Wg29bSJIR.LgHr3LVvuQ2jSMm_qU-1679976078-0-gaNycGzNCqU" enctype="application/x-www-form-urlencoded" id="challenge-form" method="POST">
<input name="md" type="hidden" value="E9SaqNGYM41RIZBGBI9U8gMPWvUL8tEk3c3i5lKO1rM-1679976078-0-AdpckUSwV_-rR9gUb5Fh-8goSxT3xjuEk0M6LX9qpUMJPawyMkFDm5XCONECIrzIWLlwBAusq9k8pcffCIZZvrQxR08IEAyTXcv4ubCK1D0B_JP93iy_KhdfQmFm1DBnLQuHYHKUEX5vBW6VF3Xi0v712USxb3850uFNJQJO3-ltFN-5AWFrALEIcqGKfERdCeLXCZ37FtyyYHcjtTV-3JFOUMrocPD_Ej5hlwK0fznlG55J7tunkypjp9EQk3U4BE0qnIRtNp6gbkMymzsQUzeY1FPDeW8ufmGmEPQq3K_Tn9mmkfEeXZp_yi2Yc9NwTs3uDQxWsbidTMH1sasNF2h1XCR5jV9DDnEbFF5aYFMfzEZakcQS5k2dhTG3W429dbnYHkWcBwGDQD1rVJaY4R4BxfN9NWjDdbMM3wYhqBr7I6OdgmiWFYCGS9joT4BC2uOObK4e2WmvxBxjESkwf9bEhv5ctY2y4qX9_dBgaq-23gXlKhmBQJ7-2eiCD756LPrbjoX_7Ae1tJo0TRlnDn9h-oja5kwcT5n5aumqJnWgJhyLiNo0n_QWzs26UeyyXjzZkzOnvJhQE3vlS1WwkA99vedsqsSqLO9NOq987EnbKDRg4rqES673uqWcqVxZ5xae5oeP6gUH4gMzREBdK4FRhOP6ie33IA24BPz8BF-HU0TqQ69ehWM6Sgg4oUo8NIPzRU8T14qx0k47hL6U7xlDyx5N_QV0FPNuRz1Z_PaINKFR2J_w5X9PQXVAavp-iZ5eHA_r2Vj5A3nZZLo-Qx_stpVfRAQufDWNuhrAm3AEjJV4dclxCFZ6Vw-0Sa6psCRvIxpMPJMXQsyQ_vR9I9S-1Q0LF95FSt2IjfGe2uEQRXxlnpyrtgvKpdoskYm_vTv1EwwF6hJfqDN2N89mWSgbVjlQtb1SMAuoCj2rY62jq4X1t-ML6-vH4RcPCTtWnKV9Ub_U1wdt9aCfVDtvMiIVluxdJUkFAKJ9HIfPSWHAdM2agDVozA88O6tEptqRG20iLVpCEdMvmIwMW8ospq-yH1ED-mHlNWCBVCibSwucZJiKjZjhRw3COtGttrv4r35ZBJ8_4cU1N08Pa6OswWNJpuVEi70-3iZJJK8rdQVoTfZ_ZHGUjfaunS0aDUFbaRsSdMVbwekjgQq97d9IJoxzHjaBngGrkCm2Re3XnD2LRJ-DirRwLUK8TrHbWJisagLjdkc4sEMDg32fj0Mbpg5bgSLgsxwoF9qUaLXH7o9xI-3yhg2ISRzfsl0icCQ_fM_yV3P9cXnwl7w6sHL3lA8eX9sa1HMx_Kna-xi9y8SsuGFYZAr1qGUCW7K8lPjAPe_k1h1ynbZIASPoT11rV6fk1MIabhKMEgLDBv6s96waoPhNEDl6qSUnEWcSAv6lqrjh5-DcIViuyMpspdWaDjhzIogxkoJ4JZsxrnIBFAn_63qs_2rBPYxiJONc58Z_nZucR_2tE2xIRvhwoGZtj7AIKeAF3XEAsnUO79sEPonWTMzLhiVa3VyDusORCOFtUUP86YH-T0uXPliq4YYqQyXqsmYUMSZsLE-GTwpj2q_cwTUFaqRjVaBNJn8qNHs0nua4eFwwz1ivIfmoBmrZiUxkVptHSwbqvqEZ3-Bgv7d0-28RF4U-1oVkcPdlCiCC2tR6mEwIFZF9ACWug2LioenZEKLGGDlu9y_QwoP9EzLi_XxSdPlEsDBOrsv_Bygbu7N_HMeX1DuU6TW6QIoq3Tdwwhbt52D0uhVXsO2Tm7QvR1NMwLKNj6mwuiDUA9WdbPluFw_6XiqQ3PLRlfILz5yps4QCFMZAehOiQBkM3M8Z4aQe3mFVvp4uqsacqes3oRF174WK0qhJ6c2WKqaPRDqwabcUrhWcwipZmR9CwT76m74E6
BZzOrbZYakBivU0e4t4xLEfZYkCxuODp3Z4UOXiGM-6sddEpvCzNQxqbX8ysdno3Rq_G4pONq_RMNTA-I8Cd8l4-5UtIQTjzUYTqb0-cSHqldO3ja9q8csYJKNnjLMZgnOq2PrsnxejL23I3lErktaS_wjcUbPP6cbW4bA5IHqFZmnCKAXt621M9xNVETbNzfuVwGPkjKU1pQgYi743clJZeM3Y8mjhVr86KhkRaq_UnoIC0MhEDuTzPMDJ5CDTlWvHDipPrx7zfS6iKlc4Z-PJ6uPCyH8di99FNQlC7vjX9kj8vZJn91Fz8LI4QOKpznifpvMdVt7xXyAeBQUQ6CwnM2igK7oFqTCPmmJ5gq5ctdkhyXxWbwHKijjEwmaznkFO1lTQ4k8_g3cdavnwrCNkNTkx6tlX1m74FYl-vaN2pX6CvPa040ar0Aks7wRQXOEl0jO8pumkopa6cZaUreaRZLKQuQxeWXhxcFzd6hCqsg3QzlqckMeOgq8khAF_-TGmSpxI005a4GQOmy5GdUW18NinV661-FpouQ4_4CE1p8YO7vzecJcBkoY1u93x9fxFPi-B0KuPsYv6E-I0tZKEK2Bzw9Zzcjk9N-Df_vnLi3IOzE10tX7fGjcZY299vl0PR_P5LlCWz-vVxseHVugAsxTF3ulKagdz2cw1a88BZwatE-Nsq8NfzCaN1tSi6WFcz2TPr70W4QbTvq9okTSGEKJIm-KNMgzpS1euLNbDJPDuFFZpJCGu5Vw4"/>      
</form>
</div>
</div>
<script>
    (function(){
        window._cf_chl_opt={
            cvId: '2',
            cZone: 'beta.character.ai',
            cType: 'managed',
            cNounce: '23595',
            cRay: '7aecf89aef5f1085',
            cHash: '47366ac613ffb5a',
            cUPMDTk: "\/chat?char=Q_cHSEbZkD_x5SDCfxHnjh4mIzakJnzsWfqyCvlev7g&__cf_chl_tk=on4BO22Yvyltg44Wg29bSJIR.LgHr3LVvuQ2jSMm_qU-1679976078-0-gaNycGzNCqU",
            cFPWv: 'b',
            cTTimeMs: '1000',
            cMTimeMs: '0',
            cTplV: 5,
            cTplB: 'cf',
            cK: "",
            cRq: {
                ru: 'aHR0cHM6Ly9iZXRhLmNoYXJhY3Rlci5haS9jaGF0P2NoYXI9UV9jSFNFYlprRF94NVNEQ2Z4SG5qaDRtSXpha0puenNXZnF5Q3ZsZXY3Zw==',
                ra: 'cHl0aG9uLXJlcXVlc3RzLzIuMjguMg==',
                rm: 'R0VU',
                d: 'ifi09TmR2JCiJ4SNN2GnJzMjoOgwucf7gVD0as20uOZGsyF3FNiuawFk32ZHDNJPi6fO9wyNQsgsKGhivedLwhVgtRJInAiKemMPos8Na76wAyG3PM5JuvktpO6U3H1IXo2UW4Eh5bCg/nKHBglsVPJuik9LuF/L7BFKPq7T+quf0e1uGbu9ehO5gMkP6v5qW67ZgTtTn9CLuLBO1MZ4nUgoey/UTyl9Vjfm7O+5lgk09MsfSkENhcw1E3TChqCA7WY1rlwTKRAvgDw9bzmEjWrdoO8ddperajl5c/97qP8gc88gWe5JUpIQsIWxPVFDq1TeCmcagjNIC04jKeWaWvSTl7PzfD2b1wgPzfNkO/PnkdViujgtbQWISin0ToLnmWhYMSmjvHLepR/HGb8o7/PfRJ8d7a33gy3XqfMyAnnFEtJEqybJt4xXzucfXICZSgrb26BDfS+YTIKqggC23+Q7ts+1exkSzU2oqLS+LJj7U3Cnk5gRnAFg5zvaDBFQUpMqLrxyx3jsEki+oY6Cjn/uLC2iIqUTK17roeXvAbXp4Biq+Qzxk07TUp7+XQL/TWZP/aq5Xvw3RjqKgHnGoA==',
                t: 'MTY3OTk3NjA3OC41NDcwMDA=',
                m: 'XzGo8GcEPUjGm0bIBVpDAncON1ZTZImwID/g1KiF7UA=',
                i1: 'mBh5M43XhF+uRgHhRlKJuw==',
                i2: 'Gf0Tv9IY6hO/HH2aYN6dHg==',
                zh: '3MzEEr4KLflmeMqd8UHKRwLX7AlBIHlHXAkcvB3KPTI=',
                uh: 'neMFmDCz0LRc0/Wea+5x0IhJieHbzcQVSkMgFkgnrjc=',
                hh: 'YfIreubNI/6hc4NG9htqteP1V6Jt4WFaxTZZalkN138=',
            }
        };
        var trkjs = document.createElement('img');
        trkjs.setAttribute('src', '/cdn-cgi/images/trace/managed/js/transparent.gif?ray=7aecf89aef5f1085');
        trkjs.setAttribute('alt', '');
        trkjs.setAttribute('style', 'display: none');
        document.body.appendChild(trkjs);
        var cpo = document.createElement('script');
        cpo.src = '/cdn-cgi/challenge-platform/h/b/orchestrate/managed/v1?ray=7aecf89aef5f1085';
        window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
        window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;       
        if (window.history && window.history.replaceState) {
            var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
            history.replaceState(null, null, "\/chat?char=Q_cHSEbZkD_x5SDCfxHnjh4mIzakJnzsWfqyCvlev7g&__cf_chl_rt_tk=on4BO22Yvyltg44Wg29bSJIR.LgHr3LVvuQ2jSMm_qU-1679976078-0-gaNycGzNCqU" + window._cf_chl_opt.cOgUHash);
            cpo.onload = function() {
                history.replaceState(null, null, ogU);
            };
        }
        document.getElementsByTagName('head')[0].appendChild(cpo);
    }());
</script>
</body>
</html>*

from this code:

import requests
import bs4

# Chat page for the character whose reply I want to read
URL = 'https://beta.character.ai/chat?char=Q_cHSEbZkD_x5SDCfxHnjh4mIzakJnzsWfqyCvlev7g'
page = requests.get(URL)

# Parse whatever HTML came back and print it for inspection
soup = bs4.BeautifulSoup(page.content, 'html.parser')
print(soup)

I want it to return the AI's actual response. How should I do that?

Koshi

1 Answer

So, the general problem here is that the page is generated dynamically with JavaScript. When you fetch the page from Python with requests, the JavaScript never runs, so you only see the initial state of the page.
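
As a rough way to confirm that this is what happened, you could check whether the HTML you got back is just the placeholder page. The sketch below uses only the standard library, and the two markers it looks for (the "Just a moment..." title and a form with id "challenge-form") are simply copied from the output in the question. The function name `looks_like_js_placeholder` is made up for illustration, and the markers are an assumption that may stop matching if the site changes its page.

```python
from html.parser import HTMLParser


class _TitleAndFormParser(HTMLParser):
    """Collects the <title> text and the id of every <form> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.form_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "form":
            # attrs is a list of (name, value) pairs
            self.form_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_js_placeholder(html):
    """Return True if the HTML resembles the placeholder shown in the question.

    Assumption: the markers below are taken from that one output only.
    """
    parser = _TitleAndFormParser()
    parser.feed(html)
    return (parser.title.strip() == "Just a moment..."
            or "challenge-form" in parser.form_ids)
```

With the code from the question, this would be called as `looks_like_js_placeholder(page.text)`. If it returns True, parsing the HTML further will not help, because the chat content is not in it.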

One thing you can try is to look at the requests the page makes behind the scenes, using your browser's developer tools (the Network tab). The website may be using a simple API to get the chat content.
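
If the Network tab does show such a request, replaying it from Python might look roughly like the sketch below. Everything specific in it is a placeholder, not the site's real API: the URL, the "message" key in the payload, and the "text" field in the reply are all hypothetical and would need to be replaced with whatever the developer tools actually show. It uses only the standard library so it runs without extra installs; the same URL, headers, and payload could be passed to `requests.post` instead.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL seen in the Network tab.
API_URL = "https://example.com/api/chat"


def build_api_request(url, payload, extra_headers=None):
    """Build (but do not send) a JSON POST request.

    extra_headers is where any headers copied from the browser would go.
    """
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    headers.update(extra_headers or {})
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def extract_reply(body, field="text"):
    """Pull one field out of a JSON response body.

    The field name "text" is an assumption; check the real response.
    """
    return json.loads(body).get(field)


def send(request, timeout=10):
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage would be along the lines of `extract_reply(send(build_api_request(API_URL, {"message": "Hello"})))`, once the placeholders are filled in. Keeping the request building separate from the sending makes it easy to print the request and compare it against what the browser sent.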

A quick search of existing Stack Overflow posts on the topic turned up this answer, which may provide further insight into the problem.

Raymi306