
I'm writing a script to download MP3 songs from the web. First I scrape YouTube; if the song is found, I download it with youtube-dl and convert it to MP3. If it is not found (checked with os.path.isfile), I scrape beemp3 (for this sample), mp3skulls, etc. The script, reduced to the YouTube download and the file check, is below:

from bs4 import BeautifulSoup
from urllib.request import urlopen,Request,urlretrieve
import re
import youtube_dl
import sys
import os

def ytscrape(searchurl,baseurl):
    """normal scraping"""
    req = Request(searchurl, headers={'User-Agent':'Mozilla/5.0'})
    lst[:] = []
    url = urlopen(req)
    soup = BeautifulSoup(url, 'lxml')
    for i in soup.find_all('div',{'class':['yt-lockup-content','yt-lockup-meta-info']},limit=10):
        for link,views in zip(i.select('h3 > a'),i.select('ul > li')):
            if views is not None and views.next_sibling is not None:
                lst.append([baseurl+link.get('href'),views.next_sibling.text])
    for i in lst:
        i[1] = int(re.sub(r' views|,','',i[1]))
    lst.sort(key = lambda x:x[1])
    url.close()
    return lst[-1][0]

def dl_frm_youtube(yt_lnk,dlpath):
    """passes the youtube url of the song. it extracts audio alone and saves it
    in local.
    yt_lnk : youtube url for the song, prioritised based on channel/views.
    """
    ydl_opts = {'format':'bestaudio/best','outtmpl':dlpath+'\\%(title)s.%(ext)s','postprocessors':[{'key':'FFmpegExtractAudio','preferredcodec':'mp3','preferredquality':'192',}]}
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([yt_lnk])
        if os.path.isfile(dlpath+'\\%(title)s.%(ext)s'):
            print('found')
        else:
            print('not found')

def main():
    song = 'numb' 
    artist = 'linkin park'
    baseurl = 'https://www.youtube.com'
    if sys.platform == 'win32':
        dlpath = os.path.join(os.environ['USERPROFILE'],'Music','spd')
        if not os.path.exists(dlpath):
            os.mkdir(dlpath)
    else:
        dlpath = '~/Music/' + song + '.mp3'
    searchurl = baseurl + '/results?search_query=' + '+' + artist.replace(chr(32),'+') + '+' + song.replace(chr(32),'+')
    dl_frm_youtube(ytscrape(searchurl,baseurl),dlpath)


lst = []
main()

When I ran the file check, it failed even though the song had downloaded and was present in the path. Because the check failed, the script went on to the next function and downloaded the song again, leaving me with two copies in my path.

So, my question is: how do I set up the file check so that it prints 'found' when the file is present in dlpath?

TIA

EDIT: As per phihag's comments, I removed all the useless info, reduced the code to only the problem part, and hardcoded the inputs.

Nj3
  • Please make sure that your example is correct (see the [stackoverflow FAQ](http://stackoverflow.com/help/mcve)). It should be minimal (either use beescrape or youtube-dl, and simplify) and reproducible (make sure to hardcode all input). In addition, the function `ytscrape` seems to be a reimplementation of youtube-dl's `ytsearch` extractor. If you just want to know when postprocessing is finished, there is no need to install a hook, just continue after the last line. As a rule of thumb, a good stackoverflow question should be less than 20 lines. – phihag Mar 31 '17 at 07:27
  • Thanks. I edited as per your comments. As for `ytscrape`, I'm practicing web scraping and this is purely for my own learning, hence the reinventing-the-wheel part. I removed the hook and did a check after the last line. It's still not working as expected. – Nj3 Mar 31 '17 at 13:52

1 Answer


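Finally, I was able to find out why. It seems %(title)s.%(ext)s is only filled in with real values when youtube-dl reads it from the ydl_opts dict; anywhere else it is just a literal string, so os.path.isfile was looking for a file literally named %(title)s.%(ext)s. This answer helped me.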
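To illustrate the point, here is a minimal sketch that does not need youtube-dl at all. The temporary directory and the file name Numb.mp3 are made up for the example, and it assumes (as far as I understand the youtube-dl docs) that the output template follows Python's %-style formatting:

```python
import os
import tempfile

template = '%(title)s.%(ext)s'

with tempfile.TemporaryDirectory() as dlpath:
    # Stand-in for the file youtube-dl would have written (hypothetical name).
    open(os.path.join(dlpath, 'Numb.mp3'), 'w').close()

    # The unexpanded template is just a literal string, so no such file exists.
    literal_found = os.path.isfile(os.path.join(dlpath, template))

    # Filling in the fields by hand gives the real file name.
    expanded = template % {'title': 'Numb', 'ext': 'mp3'}
    expanded_found = os.path.isfile(os.path.join(dlpath, expanded))

print(literal_found, expanded_found)
```

The first check corresponds to what the code in the question was doing; the second corresponds to checking against the actual title.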

I changed my code inside dl_frm_youtube to this:

    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        # replaces ydl.download([yt_lnk]); extract_info also returns the metadata
        info = ydl.extract_info(yt_lnk, download=True)
        songname = info.get('title', None)
        if os.path.isfile(dlpath+'\\'+songname+'.mp3'):
            print('found')

and it works perfectly. I'm answering my own question in case someone finds it useful.
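One caveat I'm not fully sure about: youtube-dl may alter some characters of the title when it builds the file name, so title + '.mp3' might not match for every video. A possible variation, sketched below, asks youtube-dl for the file name via prepare_filename and swaps the extension. mp3_path_for is a hypothetical helper of mine, and the sketch assumes youtube_dl and ffmpeg are installed:

```python
import os


def mp3_path_for(prepared_filename):
    """Swap the downloaded file's extension for .mp3 (hypothetical helper)."""
    root, _ext = os.path.splitext(prepared_filename)
    return root + '.mp3'


def dl_frm_youtube(yt_lnk, dlpath):
    """Download the audio as mp3 and return True if the file is on disk."""
    # Imported here so the helper above can be used without youtube_dl installed.
    import youtube_dl

    ydl_opts = {
        'format': 'bestaudio/best',
        # os.path.join avoids hardcoding the Windows '\\' separator.
        'outtmpl': os.path.join(dlpath, '%(title)s.%(ext)s'),
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(yt_lnk, download=True)
        # prepare_filename expands outtmpl for this video; its extension is
        # the pre-conversion one, so it is replaced with .mp3 here.
        mp3_path = mp3_path_for(ydl.prepare_filename(info))
    return os.path.isfile(mp3_path)
```

The caller can then fall back to the next site only when dl_frm_youtube returns False.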

Nj3