19

I want to get all video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the following code, but how can I get ALL video links (>500)?

import urllib, json

author = 'Youtube_Username'

# fetch only the newest video (max-results=1) from the channel's feed (Python 2, gdata v2 API)
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()

first = resp['feed']['entry'][0]
print first['title']            # video title
print first['link'][0]['href']  # video URL
Johnny

8 Answers

20

After the YouTube API change, max k.'s answer does not work. As a replacement, the function below provides a list of the YouTube videos in a given channel. Please note that you need an API key for it to work.

import urllib
import json

def get_all_video_in_channel(channel_id):
    api_key = 'YOUR_API_KEY'  # replace with your own API key

    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'

    first_url = base_search_url+'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)

    video_links = []
    url = first_url
    while True:
        inp = urllib.urlopen(url)
        resp = json.load(inp)

        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])

        # the response contains a nextPageToken until the last page is reached
        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            break
    return video_links
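For example, a minimal call might look like this (the channel ID must be passed as a string; the 3Blue1Brown ID from the comment thread below is used purely for illustration):

links = get_all_video_in_channel('UCYO_jab_esuFRV4b17AJtAw')
print(len(links))  # number of video URLs collected
print(links[0])    # the most recent video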
Stian
  • This is a simple and accurate answer; I could not find this in the Python API reference. – Kerem Sep 09 '18 at 23:06
  • @Stian it gives an error HTTPError: HTTP Error 403: Forbidden – Gautam Shahi Aug 09 '20 at 15:27
  • 2
    For Python 3: `import urllib.request`, change `inp = urllib.urlopen(url)` to `inp = urllib.request.urlopen(url,timeout=1)` – smcs Nov 18 '20 at 10:44
  • @smcs it's not working. urllib.error.HTTPError: HTTP Error 403: Forbidden –  Nov 29 '20 at 21:20
  • @rtt0012 What URL are you trying? – smcs Nov 30 '20 at 11:35
  • @smcs I copied your code and added my API key; the rest I didn't change. I wanted to look up this channel: https://www.youtube.com/c/3blue1brown/videos I ran the code by executing get_all_video_in_channel(UCYO_jab_esuFRV4b17AJtAw). The channel ID I found here: https://commentpicker.com/youtube-channel-id.php The error message reads: urllib.error.HTTPError: HTTP Error 403: Forbidden –  Nov 30 '20 at 14:35
  • @rtt0012 It works for me with that site. Are you passing a string to the method, i.e. `get_all_video_in_channel("UCYO_jab_esuFRV4b17AJtAw")`? – smcs Nov 30 '20 at 14:48
  • @smcs I typed in my code correctly. When copying the text I forgot the quote signs. My API key has no restrictions. I still get the same error message. I paste the error message as follows... –  Nov 30 '20 at 15:07
  • @smcs File "C:\Py38\lib\urllib\request.py", line 222, in urlopen return opener.open(url, data, timeout) File "C:\Py38\lib\urllib\request.py", line 531, in open response = meth(req, response) File "C:\Py38\lib\urllib\request.py", line 640, in http_response response = self.parent.error( File "C:\Py38\lib\urllib\request.py", line 569, in error return self._call_chain(*args) File "C:\Py38\lib\urllib\request.py", line 502, in _call_chain result = func(*args) File "C:\Py38\lib\urllib\request.py", line 649, in http_error_default raise HTTPError(req.full_url, code, msg, hdrs, fp) –  Nov 30 '20 at 15:12
  • @smcs The last line of error message reads: urllib.error.HTTPError: HTTP Error 403: Forbidden –  Nov 30 '20 at 15:13
  • 1
    @rtt0012 You should open a question on https://codereview.stackexchange.com/ – smcs Nov 30 '20 at 16:01
20

Short answer:

Here's a library that can help with that.

pip install scrapetube

import scrapetube

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print(video['videoId'])

Long answer:

The module mentioned above was created by me due to a lack of any other solutions. Here's what I tried:

  1. Selenium. It worked but had three big drawbacks: (1) it requires a web browser and driver to be installed, (2) it has big CPU and memory requirements, and (3) it can't handle big channels.
  2. Using youtube-dl. Like this:

import youtube_dl

youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True
}
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')

This also works for small channels, but for bigger ones I would get blocked by YouTube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).
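For reference, youtube-dl also has an extract_flat option that lists a channel's entries without fetching each video's full info, which cuts down the number of requests; a minimal, untested sketch, assuming channel_id is defined as above:

import youtube_dl

youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True,
    'extract_flat': 'in_playlist'  # list entries only, don't fetch each video's full metadata
}
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    info = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos', download=False)
    video_ids = [entry['id'] for entry in info['entries'] if entry]  # ignoreerrors can leave None entries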

So I made the library scrapetube, which uses the web API to get all the videos.
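get_channel also accepts limit, sleep, and sort_by parameters (the signature is visible in the traceback quoted in the comments below); a small sketch using them:

import scrapetube

# newest first, at most 100 videos, 1 second between paginated requests
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA", limit=100, sleep=1, sort_by="newest")
for video in videos:
    print("https://www.youtube.com/watch?v=" + video['videoId'])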

dermasmid
  • 2
    Very good solution. Also, if someone wants to get the video URL instead of the ID, you can use `print("https://www.youtube.com/watch?v="+str(video['videoId']))` in place of `print(video['videoId'])`. – Aekanshu Jun 27 '21 at 18:23
  • Great module! I think it does web scraping on YouTube rather than using the API because, for example, there is no published-date data in the responses. When I use the API, though, there seems to be some limitation issue. – sequence Nov 29 '22 at 18:21
  • ~/anaconda3/lib/python3.9/site-packages/scrapetube/scrapetube.py in get_channel(channel_id, channel_url, limit, sleep, sort_by) 48 api_endpoint = "https://www.youtube.com/youtubei/v1/browse" 49 videos = get_videos(url, api_endpoint, "videoRenderer", limit, sleep) ---> 50 for video in videos: 51 yield video 52 ............... ~/anaconda3/lib/python3.9/json/decoder.py in raw_decode(self, s, idx) --> 355 raise JSONDecodeError("Expecting value", s, err.value) from None JSONDecodeError: Expecting value: line 1 column 1 (char 0) – CS QGB Aug 26 '23 at 19:28
13

Increase max-results from 1 to however many you want, but beware: they don't advise grabbing too many in one call, and they will limit you to 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).

Instead, you could consider grabbing the data in batches of, say, 25 by increasing start-index until no more results come back.

EDIT: Here's the code for how I would do it (Python 2; as noted in another answer, the gdata API used here has since been retired):

import urllib, json
author = 'Youtube_Username'

foundAll = False
ind = 1
videos = []
while not foundAll:
    inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format( ind, author ) )
    try:
        resp = json.load(inp)
        inp.close()
        returnedVideos = resp['feed']['entry']
        for video in returnedVideos:
            videos.append( video ) 

        ind += 50
        print len( videos )
        if ( len( returnedVideos ) < 50 ):
            foundAll = True
    except KeyError:
        # no 'entry' key in the response: the channel's video count is an
        # exact multiple of 50, so the previous batch was the last one
        foundAll = True

for video in videos:
    print video['title'] # video title
    print video['link'][0]['href'] #url
max k.
7

Based on the code found here and at some other places, I've written a small script that does this. My script uses v3 of YouTube's API and does not hit the 500-result limit that Google has set for searches.

The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder

dSebastien
  • 1
    Thanks for this. Combined with [pafy](https://github.com/mps-youtube/pafy) you can fetch all videos on a channel. – Jabba Jul 08 '15 at 18:08
  • 2
    This did not work for the PyCon 2015 channel, or even the example mentioned on the Git; it just says channel not found. Am I doing something wrong? – Arjun Bhandari Nov 05 '15 at 07:34
  • I got quite a lot of errors from using this. Admittedly my channel name appears to have a space in it, which caused trouble on the CLI, and the tool doesn't take the ID instead. It searched back through 5 years and found no videos, and I've got 410 on the channel. – volvox Sep 03 '19 at 21:49
  • FYI I don't have time to maintain that project, but if anyone is interested, don't hesitate to go and fix it, I'll happily merge any improvements ;-) – dSebastien Sep 26 '19 at 09:09
5

An independent way of doing things: no API, no rate limit.

import requests

username = "marquesbrownlee"
# build the URL from the username (the original hard-coded the literal string "username")
url = f"https://www.youtube.com/user/{username}/videos"
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line]  # every video appears twice in this list
print(vids[0])  # the latest video

The code above will scrape only a limited number of video URLs, at most around 60. How can I grab the URLs of all the videos present in the channel? Can you please suggest?

Also, the snippet above returns a list in which every video is listed twice, and it does not cover all the video URLs in the channel.
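As a small sketch for the duplication issue only (it does not lift the ~60-video cap, which exists because YouTube loads the rest of the list with JavaScript), the list can be deduplicated while preserving order:

vids_unique = list(dict.fromkeys(vids))  # drop the duplicate listing of each video, keep order
print(len(vids_unique))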

sangeetha
Gajendra D Ambi
1

Using Selenium Chrome Driver:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

# webdriver_manager downloads a matching chromedriver; passing the path
# positionally is the Selenium 3 style (Selenium 4 expects a Service object)
driverPath = ChromeDriverManager().install()

driver = webdriver.Chrome(driverPath)

url = 'https://www.youtube.com/howitshouldhaveended/videos'

driver.get(url)

# scroll until the page height stops growing, i.e. all videos are loaded
height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1

while previousHeight < height:
    previousHeight = height
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(1)  # give the page time to load the next batch of videos
    height = driver.execute_script("return document.documentElement.scrollHeight")

# every thumbnail element links to its video's watch page
# (find_elements_by_id is the Selenium 3 API; Selenium 4 uses find_elements(By.ID, ...))
vidElements = driver.find_elements_by_id('thumbnail')
vid_urls = []
for v in vidElements:
    vid_urls.append(v.get_attribute('href'))

This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading the extra information. It easily worked for me on a channel with 300+ videos, but it had issues with one that had 7,000+ videos, because the time required to load the new videos in the browser became inconsistent.
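As a rough sketch of that "recognize when the browser is still loading" idea, the scroll loop above could retry a few times before concluding that the list has stopped growing (the retry budget and delay below are arbitrary, untested values):

# alternative scroll loop: tolerate slow loads by retrying before giving up
retries_left = 5
height = driver.execute_script("return document.documentElement.scrollHeight")
while retries_left > 0:
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(2)
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height > height:
        retries_left = 5   # the page grew, reset the retry budget
    else:
        retries_left -= 1  # possibly still loading; try again before stopping
    height = new_height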

1

I modified the script originally posted by dermasmid to fit my needs. This is the result:

import scrapetube
import sys

path = '_list.txt'
sys.stdout = open(path, 'w')

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print("https://www.youtube.com/watch?v="+str(video['videoId']))
#    print(video['videoId'])

Basically it saves all the URLs from the channel into a "_list.txt" file. I am using this "_list.txt" file to download all the videos using yt-dlp.exe. All the downloaded files have the .mp4 extension.

Now I need to create another "_playlist.txt" file that contains all the FILENAMES corresponding to each URL from the "_list.txt".

For example, for "https://www.youtube.com/watch?v=yG1m7oGZC48" the output into "_playlist.txt" should be "Apple M1 Ultra & NUMA - Computerphile.mp4".
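A possible sketch for that last step, assuming the yt-dlp Python package is installed (pip install yt-dlp) and, as stated above, that every download ends up as .mp4 (the ".mp4" suffix here is an assumption, not yt-dlp's computed filename):

import yt_dlp

# read the URLs collected above
with open('_list.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

# query each video's title without downloading anything
with yt_dlp.YoutubeDL({'skip_download': True, 'quiet': True}) as ydl, open('_playlist.txt', 'w') as out:
    for url in urls:
        info = ydl.extract_info(url, download=False)
        out.write(info['title'] + '.mp4\n')  # assumes .mp4, as stated above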

Cody Gray - on strike
Radovici
1

I made some further improvements: the channel URL can be entered in the console, and the result is printed both on screen and into an external file called "_list.txt".

import scrapetube
import sys

path = '_list.txt'

print('**********************\n')
print("The result will be saved in '_list.txt' file.")
print("Enter Channel ID:")

# Prints the output in the console and into the '_list.txt' file.
class Logger:
 
    def __init__(self, filename):
        self.console = sys.stdout
        self.file = open(filename, 'w')
 
    def write(self, message):
        self.console.write(message)
        self.file.write(message)
 
    def flush(self):
        self.console.flush()
        self.file.flush()

sys.stdout = Logger(path)

# Remove the "https://www.youtube.com/channel/" prefix, if present.
# (str.strip() was used originally, but it strips *characters*, not a prefix,
# and would also eat leading letters of the channel ID itself.)
channel_id_input = input()
channel_id = channel_id_input.replace("https://www.youtube.com/channel/", "")

videos = scrapetube.get_channel(channel_id)

for video in videos:
    print("https://www.youtube.com/watch?v="+str(video['videoId']))
#    print(video['videoId'])
Radovici