Error while fetching BLOB url using Selenium

Question

I am trying to read the contents of an in-memory blob using Selenium in Python, via script injection.

Here is the code:

from selenium import webdriver
import base64
from bs4 import BeautifulSoup

def download_blob(driver, uri):
    result = driver.execute_async_script("""
        var uri = arguments[0];
        var callback = arguments[arguments.length-1];
        var toBase64 = function(buffer){for(var r,n=new Uint8Array(buffer),t=n.length,a=new Uint8Array(4*Math.ceil(t/3)),i=new Uint8Array(64),o=0,c=0;64>c;++c)i[c]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".charCodeAt(c);for(c=0;t-t%3>c;c+=3,o+=4)r=n[c]<<16|n[c+1]<<8|n[c+2],a[o]=i[r>>18],a[o+1]=i[r>>12&63],a[o+2]=i[r>>6&63],a[o+3]=i[63&r];return t%3===1?(r=n[t-1],a[o]=i[r>>2],a[o+1]=i[r<<4&63],a[o+2]=61,a[o+3]=61):t%3===2&&(r=(n[t-2]<<8)+n[t-1],a[o]=i[r>>10],a[o+1]=i[r>>4&63],a[o+2]=i[r<<2&63],a[o+3]=61),new TextDecoder("ascii").decode(a)};
        var xhr = new XMLHttpRequest();
        xhr.responseType = 'arraybuffer';
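        // Success: hand the response back as a base64 string.
        xhr.onload = function(){ callback(toBase64(xhr.response)) };
        // Failure: hand back the numeric status (0 when no HTTP response was received).
        xhr.onerror = function(){ callback(xhr.status) };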
        xhr.open('GET', uri);
        xhr.send();
        """, uri)
    print(uri, result)

    if isinstance(result, int):
        raise Exception("Request failed with status %s" % result)

    return base64.b64decode(result)

options = webdriver.ChromeOptions()
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36')
driver = webdriver.Chrome(options=options)
url = 'https://www.youtube.com/watch?v=KBtk5FUeJbk'
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'html5lib')
blob_url = soup.find('video', attrs={'class': 'video-stream html5-main-video'})['src']
byte_stream = download_blob(driver, blob_url)

Output:

blob:https://www.youtube.com/5e3f1fab-3839-45a1-bb62-3582635b9e7d 0

Traceback (most recent call last):
  File "C:\Users\*****\Desktop\blob-download.py", line 32, in <module>
    byte_stream = download_blob(driver, blob_url)
  File "C:\Users\*****\Desktop\blob-download.py", line 20, in download_blob
    raise Exception("Request failed with status %s" % result)
Exception: Request failed with status 0

The result variable is the integer 0, which means the request failed. I do not understand what is going wrong. I expected at least part of the in-memory blob to come back as bytes.
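A status of 0 from XMLHttpRequest generally means the browser never produced an HTTP response at all, so the number alone says little about the cause. For a video element, one possible cause (an assumption, not verified against this page) is that the blob URL refers to a MediaSource rather than a Blob, which XMLHttpRequest cannot read. As a minimal sketch, the check on the script's return value could be moved into a small helper that reports the status 0 case explicitly. The name decode_script_result and the error wording are hypothetical and not part of Selenium.

```python
import base64


def decode_script_result(result):
    """Turn the value returned by the injected script into bytes.

    Assumes the script calls back with a base64 string on success
    and with the integer xhr.status on failure, as in the question.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(result, int) and not isinstance(result, bool):
        if result == 0:
            raise RuntimeError(
                "Request failed with status 0: the browser returned no HTTP "
                "response (network error, blocked request, or unfetchable URL)"
            )
        raise RuntimeError("Request failed with status %s" % result)
    if not isinstance(result, str):
        raise TypeError("Unexpected script result: %r" % (result,))
    return base64.b64decode(result)
```

Inside download_blob, the final lines would then reduce to `return decode_script_result(result)`.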

I based the code above on an answer to How to download an image with Python 3/Selenium if the URL begins with "blob:"?. That answer says the blob URL has to be fetched from the page that created it, so I scrape the blob URL with BeautifulSoup instead of hard-coding it. Hard-coding it, as in this example, is what I am avoiding:

byte_stream = download_blob(driver, 'blob:https://www.youtube.com/5e3f1fab-3839-45a1-bb62-3582635b9e7d') # this would definitely not work
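The requirement that the blob URL come from the page that created it can be partly checked before the script is injected. The sketch below compares the origin embedded in the blob URL with the origin of the page the driver is currently on. The helper names blob_origin and same_origin_as_page are hypothetical. A matching origin is necessary but not sufficient: a blob URL is also tied to the document that created it, so it may already be invalid after a reload or navigation.

```python
from urllib.parse import urlsplit


def blob_origin(uri):
    """Return the origin embedded in a blob: URL, e.g. 'https://host'."""
    prefix = "blob:"
    if not uri.startswith(prefix):
        raise ValueError("Not a blob URL: %r" % uri)
    # Everything after 'blob:' is an ordinary URL whose path is a UUID.
    inner = urlsplit(uri[len(prefix):])
    return "%s://%s" % (inner.scheme, inner.netloc)


def same_origin_as_page(uri, page_url):
    """True if the blob URL was created under the page's origin."""
    page = urlsplit(page_url)
    return blob_origin(uri) == "%s://%s" % (page.scheme, page.netloc)
```

With the script from the question, this could be called as `same_origin_as_page(blob_url, driver.current_url)` just before `download_blob`.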

I also tried other websites, in case YouTube has a strict policy against scraping content, but had no luck: every other site gave the same response.

Any insight into a JavaScript alternative is also welcome.
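One JavaScript variant, offered as a sketch rather than a confirmed fix, replaces XMLHttpRequest and the hand-written base64 encoder with fetch and FileReader, and passes the error text back instead of a bare status code. It does not make an unfetchable URL fetchable, but the failure reason becomes visible on the Python side. The names FETCH_JS and fetch_bytes, and the shape of the object passed to the callback, are assumptions made for this example.

```python
import base64

# JavaScript run in the page: fetch the URL, then report either the
# payload as a data: URL or the error text. Every path calls the
# callback, so execute_async_script does not wait for its timeout.
FETCH_JS = """
    var uri = arguments[0];
    var done = arguments[arguments.length - 1];
    fetch(uri)
        .then(function (resp) { return resp.blob(); })
        .then(function (blob) {
            var reader = new FileReader();
            reader.onload = function () { done({ok: true, data: reader.result}); };
            reader.onerror = function () { done({ok: false, error: 'read failed'}); };
            reader.readAsDataURL(blob);
        })
        .catch(function (err) { done({ok: false, error: String(err)}); });
"""


def fetch_bytes(driver, uri):
    """Fetch `uri` inside the browser and return its content as bytes."""
    result = driver.execute_async_script(FETCH_JS, uri)
    if not result.get("ok"):
        raise RuntimeError("fetch failed for %s: %s" % (uri, result.get("error")))
    # reader.result looks like 'data:<mime>;base64,<payload>'.
    _, _, payload = result["data"].partition(",")
    return base64.b64decode(payload)
```

It would be called in place of download_blob, for example `byte_stream = fetch_bytes(driver, blob_url)`.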

Hello, I am facing the same problem as yours. May I ask is there a solution for this now? — Hui Gordon, Jun 01 '22 at 11:04
