I'm creating a YouTube playlist calculator that takes the playlist URL as input from the user. The web app is written with the Flask framework. Code:

from flask import Flask, render_template, request
import re
import requests
from bs4 import BeautifulSoup

app = Flask(__name__)

def calculate_playlist_length(playlist_url):
    html = requests.get(playlist_url).text
    soup = BeautifulSoup(html, 'html.parser')
    total_seconds = 0

    for span in soup.select('span.ytd-thumbnail-overlay-time-status-renderer'):
        match = re.search(r'(\d+):(\d+)', span.text.strip())
        if match:
            minutes, seconds = match.groups()
            total_seconds += int(minutes) * 60 + int(seconds)

    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    seconds = total_seconds % 60

    return hours, minutes, seconds

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        playlist_url = request.form['playlist_url']
        hours, minutes, seconds = calculate_playlist_length(playlist_url)
        return render_template('result.html', hours=hours, minutes=minutes,
                               seconds=seconds)

    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)

But when I run the app (it starts successfully) and enter a valid, public playlist, I get this output:

The total length of the playlist is 0:00:00.

How do I fix this? I have tried multiple URLs, but the output is the same. Also, further down the line, I want to host this project on GCP and use the YouTube APIs, so please guide me on that too.

DevOpsnoob
  • Does this answer your question? [How do I get video durations with YouTube API version 3?](https://stackoverflow.com/questions/15596753/how-do-i-get-video-durations-with-youtube-api-version-3) – baduker Apr 03 '23 at 19:08

1 Answer

YouTube playlist pages are dynamic. If you preview the source HTML (which is all you get with requests.get(playlist_url).text) in the network logs (it should be the first request when you refresh the page), you'll see that pretty much everything is rendered with JavaScript.

Because of this, you can expect soup.select('span.ytd-thumbnail-overlay-time-status-renderer') to return an empty ResultSet.
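To see the effect without a network request, here is a minimal sketch that runs the same selector on a hand-written stand-in for the static HTML. The STATIC_HTML string is an assumption made up for illustration (the real page is far larger); it only mimics the situation described above, where the data lives in a script tag and no duration spans exist yet.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for what requests.get(...).text returns:
# an empty container plus an inline script holding the data.
STATIC_HTML = """
<html><body>
<div id="content"></div>
<script>var ytInitialData = {"lengthSeconds": "125"};</script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# The selector from the question: the spans are only created by JavaScript in a
# browser, so nothing matches in the static markup.
spans = soup.select("span.ytd-thumbnail-overlay-time-status-renderer")
print(len(spans))  # 0

# The data itself is still present, just inside a script tag.
print("ytInitialData" in STATIC_HTML)  # True
```

With no matching spans, the loop in the question never runs and total_seconds stays 0, which matches the 0:00:00 output.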

The good news is that the data you want is there in the source HTML, inside a script tag, from which various information can be extracted; so, if you alter calculate_playlist_length to something like

import json

import requests
from bs4 import BeautifulSoup

def calculate_playlist_length(playlist_url):
    r = requests.get(playlist_url)
    rStatus = f'<Response [{r.status_code} {r.reason}]> from {r.url}'
    pSoup = BeautifulSoup(r.content, 'html.parser')

    # The playlist data sits in an inline script: var ytInitialData = {...};
    jScript = pSoup.select_one('script:-soup-contains("var ytInitialData")')
    try:
        # Keep everything after the first '=' and drop the trailing ';'
        jData = json.loads(jScript.string.split('=', 1)[-1].strip().rstrip(';'))

        # Path from the top-level object down to the list of playlist videos
        keysList = [
            'contents', 'twoColumnBrowseResultsRenderer', 'tabs', 0, 'tabRenderer',
            'content', 'sectionListRenderer', 'contents', 0, 'itemSectionRenderer',
            'contents', 0, 'playlistVideoListRenderer', 'contents'
        ]
        for k in keysList:
            jData = jData[k]

        # Videos without a lengthSeconds field count as 0 seconds
        total_seconds = sum(
            int(vid['playlistVideoRenderer'].get('lengthSeconds', 0))
            for vid in jData
        )
    except Exception as e:
        # On any failure, report the response status and error, then return 0
        print(f'{rStatus}\n{e!r}')
        total_seconds = 0
    return total_seconds // 3600, (total_seconds % 3600) // 60, total_seconds % 60

then calculate_playlist_length(url) should return (0, 35, 46) if url has been set to the link of this playlist.
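The key-path walk and the seconds-to-(hours, minutes, seconds) split can also be checked offline. The sketch below is only an illustration: dig and split_hms are hypothetical helper names, and the sample data is invented (three entries of 125 s, 3600 s, and one with no length), wrapped in the same nesting the function above expects. It is not real YouTube output, and the page layout may change at any time.

```python
# Same key path as in calculate_playlist_length above.
KEYS = [
    'contents', 'twoColumnBrowseResultsRenderer', 'tabs', 0, 'tabRenderer',
    'content', 'sectionListRenderer', 'contents', 0, 'itemSectionRenderer',
    'contents', 0, 'playlistVideoListRenderer', 'contents',
]

def dig(data, keys):
    """Follow a list of dict keys / list indexes into nested data (hypothetical helper)."""
    for key in keys:
        data = data[key]
    return data

def split_hms(total_seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours, minutes, seconds

# Invented sample entries; the last one has no lengthSeconds and counts as 0.
videos = [
    {'playlistVideoRenderer': {'lengthSeconds': '125'}},
    {'playlistVideoRenderer': {'lengthSeconds': '3600'}},
    {'playlistVideoRenderer': {}},
]

# Wrap the entries from the inside out so that KEYS leads back to them:
# an integer key becomes a one-element list, a string key becomes a dict.
sample = videos
for key in reversed(KEYS):
    sample = [sample] if isinstance(key, int) else {key: sample}

found = dig(sample, KEYS)
total = sum(int(v['playlistVideoRenderer'].get('lengthSeconds', 0)) for v in found)
print(total, split_hms(total))  # 3725 (1, 2, 5)
```

Keeping the path walk separate from the network call makes it easier to notice when a KeyError comes from a changed page layout rather than from a failed request.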

Driftr95