
I'd like to download a video from a website with a Python script; however, the video is served from a blob URL, as shown below.

<video class="jw-video jw-reset" style="object-fit: fill;" jw-loaded="data" src="blob:https://xxxxxxx.com/f717096e-5e1a-42e1-8c3c-3ec777b5d478"></video>
halfer
victor_gu
  • Have you tried: https://stackoverflow.com/questions/39517522/download-file-from-blob-url-with-python? – Gerard Rozsavolgyi Dec 30 '17 at 14:13
  • Hi, thanks for your comment. Actually, my situation is a bit more complex. I first used Selenium to log in to the website and obtain the cookies, then I passed the session cookies to Python:
      import requests
      # Copy the Selenium browser's cookies into a requests session.
      def request(driver):
          s = requests.Session()
          cookies = driver.get_cookies()
          for cookie in cookies:
              s.cookies.set(cookie['name'], cookie['value'])
          return s
      req = request(browser)
    Could you please explain in a bit more detail how to save the video after the above code? Thanks! – victor_gu Dec 30 '17 at 14:29
  • Put all the information in the question. You can also create a minimal working example. – furas Dec 30 '17 at 15:34
  • This question is rather incomplete, and could do with improvement. It may close as "unclear" or "lacking [mcve]". – halfer Dec 30 '17 at 21:15
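Because the login lives in the browser session, every later request for the playlist or its segments has to carry the same cookies. Below is a minimal standard-library sketch, assuming Selenium's `driver.get_cookies()` returns a list of dicts with `name` and `value` keys (the format the comment's code already relies on). It joins them into a single `Cookie` header, which is also the kind of value you could paste into the request headers copied from the Network tab. The helper names are hypothetical, and the sketch ignores cookie domain/path scoping.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium-style cookie dicts into one 'Cookie' header value."""
    return '; '.join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def authenticated_request(url, selenium_cookies, user_agent=None):
    """Build a urllib Request that replays the browser session's cookies."""
    headers = {'Cookie': cookie_header(selenium_cookies)}
    if user_agent:
        # Some sites also check that the User-Agent matches the logged-in browser.
        headers['User-Agent'] = user_agent
    return urllib.request.Request(url, headers=headers)
```

Usage would look like `urllib.request.urlopen(authenticated_request(url, driver.get_cookies())).read()`, where `driver` is the logged-in Selenium WebDriver.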

4 Answers


A blob video can be downloaded with the Python code below (it needs the third-party requests and m3u8 packages plus an ffmpeg binary). First get the master playlist (.m3u8) URL from the Network tab of your browser's developer tools, then paste it into the code where indicated.


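import subprocess
from urllib.parse import urljoin

import m3u8
import requests

master_url = 'master_url_from_inspect_network'
# Paste the request headers from the same Network-tab entry here
# (e.g. 'User-Agent', 'Referer', 'Cookie') if the server requires them.
headers = {}

r = requests.get(master_url, headers=headers)
r.raise_for_status()
m3u8_master = m3u8.loads(r.text)
print(m3u8_master)

# Take the first variant; its URI may be relative to the master playlist URL.
playlist_url = urljoin(master_url, m3u8_master.data['playlists'][0]['uri'])
play_r = requests.get(playlist_url, headers=headers)
play_r.raise_for_status()
m3_data = m3u8.loads(play_r.text).data

# Download every segment in order and append it to a single MPEG-TS file.
# Note: this does not decrypt segments protected by #EXT-X-KEY.
with open('video.ts', 'wb') as fs:
    for segment in m3_data['segments']:
        uri = urljoin(playlist_url, segment['uri'])
        print(uri)
        seg_r = requests.get(uri, headers=headers)
        seg_r.raise_for_status()
        fs.write(seg_r.content)

# Convert the .ts file to .mp4 (requires ffmpeg on PATH).
subprocess.run(['ffmpeg', '-i', 'video.ts', 'video.mp4'], check=True)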
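The code above always takes the first variant (`playlists[0]`), but a master playlist's first variant is not necessarily the highest quality. Here is a standard-library sketch, with no `m3u8` dependency, that instead picks the variant with the largest `BANDWIDTH` attribute and resolves its URI against the master URL. The function names are hypothetical. It assumes a plain master playlist built from `#EXT-X-STREAM-INF` tags and ignores alternate audio/subtitle renditions (`#EXT-X-MEDIA`).

```python
import re
from urllib.parse import urljoin

# KEY=VALUE pairs in an HLS attribute list; quoted values may contain commas.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(attr_list):
    """Parse an attribute list such as 'BANDWIDTH=800000,CODECS="a,b"'."""
    return {key: value.strip('"') for key, value in _ATTR_RE.findall(attr_list)}


def best_variant_url(master_text, master_url):
    """Return the absolute URI of the highest-BANDWIDTH variant."""
    best_bw, best_uri = -1, None
    lines = [line.strip() for line in master_text.splitlines()]
    for i, line in enumerate(lines):
        if not line.startswith('#EXT-X-STREAM-INF:'):
            continue
        attrs = parse_attributes(line.split(':', 1)[1])
        bandwidth = int(attrs.get('BANDWIDTH', 0))
        # The variant URI is the next non-empty line that is not a tag.
        for uri in lines[i + 1:]:
            if uri and not uri.startswith('#'):
                break
        else:
            continue  # tag without a URI line; skip it
        if bandwidth > best_bw:
            best_bw, best_uri = bandwidth, uri
    if best_uri is None:
        raise ValueError('no #EXT-X-STREAM-INF variants found')
    return urljoin(master_url, best_uri)
```

Its return value could stand in for `playlist_url` in the answer's code, e.g. `best_variant_url(r.text, master_url)`.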


You can’t “download it”. A blob URL is a pseudo-URL that represents a buffer in the browser's memory; it does not point to any file on a server. https://developer.mozilla.org/en-US/docs/Web/API/Blob

szatmary
  • But is it really impossible to access the buffer's content? – Gerard Rozsavolgyi Dec 30 '17 at 16:06
  • No, but that’s a different question. “Download” implies the ability to access the content from another process, and that is impossible. Saving a blob is straightforward. Google can answer that faster than I can. But you will need to inject the JS into the running page if you don’t own the server. – szatmary Dec 30 '17 at 16:10
  • @szatmary, I've been trying to solve this problem in a Google extension, and I've googled to the best of my ability, but I guess my search skills have taken a toll. I haven't been able to find a solution to this. Could you point me in a direction? – Arihant Mar 13 '18 at 06:44
  • Sure. First implement a player using MSE (Media Source Extensions) so you understand how it works, then work backwards from there. – szatmary Mar 13 '18 at 06:46
  • "Google can answer that faster than I can" Google brought me here so... – ocket8888 Dec 30 '19 at 10:09

In the cases I came across, the pages in which I saw these blob:https://... URLs were also serving .m3u8 files. These contained the real links to the video, split into many separate pieces, and sometimes also an encryption key.
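To see what such a playlist contains, here is a minimal standard-library sketch that lists a media playlist's segment URLs and reports its `#EXT-X-KEY` tag, if any. `parse_media_playlist` is a hypothetical name. The sketch assumes one key for the whole playlist (it keeps only the last `#EXT-X-KEY` it sees) and performs no decryption; a `METHOD=AES-128` key means the segments must be decrypted with that key before they will play.

```python
import re
from urllib.parse import urljoin

# KEY=VALUE pairs in an HLS attribute list; quoted values may contain commas.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_media_playlist(text, playlist_url):
    """Return (segment_urls, key_info) for an HLS media playlist.

    key_info is None when there is no #EXT-X-KEY tag; otherwise it is a dict
    of the last key tag's attributes (METHOD, URI made absolute, IV, ...).
    """
    segments, key_info = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#EXT-X-KEY:'):
            attrs = {k: v.strip('"')
                     for k, v in _ATTR_RE.findall(line.split(':', 1)[1])}
            if 'URI' in attrs:
                attrs['URI'] = urljoin(playlist_url, attrs['URI'])
            key_info = attrs
        elif not line.startswith('#'):
            # Every non-tag line is a segment URI, relative to the playlist URL.
            segments.append(urljoin(playlist_url, line))
    return segments, key_info
```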

However, the links to these .m3u8 files are sometimes generated by JavaScript and don't exist in the source of the original page. So you may need to open your browser's developer tools and watch the Network tab while reloading the page with the video to see the requests for these .m3u8 URLs.
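Instead of scanning the Network tab by eye, you can export the recorded requests as a HAR file (a JSON document; both Chrome and Firefox DevTools offer this export) after reloading the page with DevTools open. A minimal sketch, assuming such an export saved locally, that pulls out every request URL whose path contains `.m3u8`; `find_m3u8_urls` is a hypothetical helper.

```python
import json


def find_m3u8_urls(har_path):
    """Return unique request URLs containing '.m3u8' from a HAR file, in order."""
    with open(har_path, encoding='utf-8') as f:
        har = json.load(f)
    seen, urls = set(), []
    for entry in har['log']['entries']:
        url = entry['request']['url']
        # Check only the part before '?', since CDNs often append query tokens.
        if '.m3u8' in url.split('?', 1)[0] and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The URLs are returned with their query strings intact, because signed tokens there are often required for the request to succeed.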

In my case, youtube-dl (which is a Python script) was able to download the video when given that .m3u8 URL; it handed the stream to ffmpeg.

So you could try that, and then look at the youtube-dl source to see how it does it in Python.
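A minimal sketch of driving youtube-dl from a Python script by calling its command-line tool. It assumes `youtube-dl` and `ffmpeg` are installed and on `PATH`, and the helper names are hypothetical. youtube-dl can also be imported as a library through its `YoutubeDL` class, but shelling out keeps this example independent of its internals.

```python
import shutil
import subprocess


def youtube_dl_command(m3u8_url, output='video.mp4'):
    """Build a youtube-dl command line that saves the stream to `output`."""
    # -o sets youtube-dl's output filename template.
    return ['youtube-dl', '-o', output, m3u8_url]


def download_with_youtube_dl(m3u8_url, output='video.mp4'):
    """Run youtube-dl on the .m3u8 URL; raises if it is missing or fails."""
    if shutil.which('youtube-dl') is None:
        raise FileNotFoundError('youtube-dl is not installed or not on PATH')
    subprocess.run(youtube_dl_command(m3u8_url, output), check=True)
```

For example, `download_with_youtube_dl('https://example.com/path/master.m3u8')`, where the URL is a placeholder for the one found in the Network tab. If the site requires a login, youtube-dl's `--cookies` option accepts a Netscape-format cookies file.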

mivk

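You can use urllib2 for that (urllib2 is Python 2 only; in Python 3 its functionality lives in urllib.request).

# Python 3 equivalent of urllib2.urlopen
import urllib.request

response = urllib.request.urlopen('http://www.example.com/')
html = response.read()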
siods333333