I want to scrape so-called TC strings (Transparency and Consent strings) from sites that implement the IAB Transparency and Consent Framework (TCF) for GDPR compliance.

In the browser console I can enter something like

window.__tcfapi('ping',2,function(data,success){console.log(data);})

The description of the API can be found here: TCF API. The third argument is the callback function, which in this case writes the received data to the console. The desired return value (the example is from focus.de) would be an object that looks like this:

{
"cmpId": 6,
"cmpVersion": 1,
"gdprApplies": true,
"tcfPolicyVersion": 2,
"cmpLoaded": true,
"cmpStatus": "loaded",
"displayStatus": "hidden",
"apiVersion": "2",
"gvlVersion": 82
}

How can I assign this console output to a variable in Python using Selenium?

I could try executing the script with the execute_script method like this:

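from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# "browser" (not "performance") is the log type that carries console.log output
options.set_capability("goog:loggingPrefs", {"browser": "ALL"})
driver = webdriver.Chrome(options=options)  # Selenium 4 style
# A page must be loaded before window.__tcfapi exists (URL assumed from the example site)
driver.get("https://www.focus.de")
driver.execute_script("window.__tcfapi('ping',2,function(data,success){console.log(data);})")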

I could then read the console logs, as discussed in this question: Getting console.log output from Chrome with Selenium Python API bindings

However, this seems very inefficient, and I suspect my problem stems from a misunderstanding of asynchronous JavaScript programming.

I also tried using the execute_async_script method, but that only results in a timeout error for me.

Any hints are greatly appreciated, thank you very much.

1 Answer

I solved it (temporarily) using this workaround:

tcs = driver.execute_script("window.__tcfapi('ping',2,function(data,success){window.tcs=data;}); return window.tcs")

So basically I changed the callback function to set a global variable window.tcs, which can then be returned from the same script that Selenium's execute_script method runs. Note that this relies on the callback having already run by the time the return statement executes, i.e. on the CMP invoking it synchronously.

I am aware that setting global variables is a bad idea. However, you can remove it afterwards with driver.execute_script("delete window.tcs").
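A variant that avoids the global variable is to use execute_async_script. Selenium passes a completion callback as the last entry of arguments, and the script only returns once that callback is called; a timeout typically means it was never invoked. The following is only a sketch under assumptions: call_tcfapi and TCF_ASYNC_SCRIPT are names made up for illustration, and I have not tested it against every CMP.

```python
# JavaScript run inside the page. Selenium appends a completion callback as the
# last element of `arguments`; execute_async_script blocks until it is called.
TCF_ASYNC_SCRIPT = """
var done = arguments[arguments.length - 1];
var command = arguments[0];
var version = arguments[1];
if (typeof window.__tcfapi !== 'function') {
    done(null);  // no TCF API on this page
    return;
}
window.__tcfapi(command, version, function (data, success) {
    done(data);  // hand the callback data back to Python
});
"""


def call_tcfapi(driver, command="ping", version=2, timeout=10):
    """Hypothetical helper: call window.__tcfapi and return the callback data.

    Returns None if the page exposes no __tcfapi function.
    """
    # Upper bound (in seconds) on how long Selenium waits for `done` to be called.
    driver.set_script_timeout(timeout)
    return driver.execute_async_script(TCF_ASYNC_SCRIPT, command, version)
```

With a driver that has already loaded the page, ping = call_tcfapi(driver) should give the same dictionary as above. Passing a different command name, e.g. call_tcfapi(driver, "getTCData"), works the same way as long as the CMP calls the callback with the data as its first argument.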