
I'm trying to download audio from YouTube with youtube-dl.exe and ffmpeg.exe (Windows 7), but I'm having trouble with encoding. I have to parse the metadata manually, because when I try to use

--metadata-from-title "%(artist) - %(title)" --extract-audio --audio-format mp3 https://www.youtube.com/watch?v=DaU94Ld3fuM

I get ERROR: Could not interpret title of video as "%(artist) - %(title)"

Anyway, I wrote some code to save metadata with ffmpeg:

import os
import subprocess

def download(url, title_first=False):
    if (0 == subprocess.call('youtube-dl --extract-audio --audio-format mp3 %s' % url)):
        #saves file in current directory in format: VID_TITLE-VID_ID.mp3
        video_id = url[url.find('=')+1:] #video id from URL (after ?v=)
        for f in os.listdir('.'):
            if video_id in f:
                filename = f
                break
        os.rename(filename, video_id+'.mp3') #name without non-ascii chars (for tests)
        video_title = filename[: filename.find(video_id)-1]

        output = video_title + '.mp3'
        title, artist = '', ''
        try: #parsing the title
            x = video_title.find('-')
            artist = video_title[:x].strip()
            title = video_title[x+1:].strip()
            if (title_first): output = '%s - %s.mp3' % (title, artist)
        except:
            pass

        x = 'ffmpeg -i "%s" -metadata title="%s" -metadata artist="%s" -acodec copy -id3v2_version 3 -write_id3v1 1 "%s"' \
                        % (video_id+'.mp3', title, artist, output)
        print x
        subprocess.call(x)

The file is downloaded and then cropped to the given start and duration times (the code above is a simplified version). The filename is fine, but when I open the file in AIMP3, it shows garbage instead of the non-ASCII characters.

I've tried to re-encode the final command with iso-8859-2, utf-8 and mbcs:

x = x.decode('cp1250').encode('iso-8859-2')

But the non-ASCII characters are still unreadable. Passing the command as a unicode string raises UnicodeEncodeError.

Any idea how to solve this problem?

mopsiok

2 Answers


You are missing an s after each template field. It should be --metadata-from-title "%(artist)s - %(title)s". You should also pass --add-metadata so that the metadata is written to the file. The final command will look like this:

youtube-dl --metadata-from-title "%(artist)s - %(title)s" --extract-audio --audio-format mp3 --add-metadata https://www.youtube.com/watch?v=DaU94Ld3fuM
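
Since the question drives youtube-dl from a Python script, here is a minimal sketch of how the same command might be invoked through subprocess. It assumes Python 3 and that youtube-dl is on the PATH; the helper names build_command and download are made up for illustration, and only the flags shown in the command above are used. Passing the arguments as a list avoids hand-quoting the template and manually re-encoding the command string.

```python
import subprocess

# Template and flags are the ones from the command above.
TITLE_TEMPLATE = "%(artist)s - %(title)s"


def build_command(url, template=TITLE_TEMPLATE):
    """Return the youtube-dl argument list for one video URL (hypothetical helper)."""
    return [
        "youtube-dl",
        "--metadata-from-title", template,
        "--extract-audio",
        "--audio-format", "mp3",
        "--add-metadata",
        url,
    ]


def download(url):
    """Run youtube-dl and return its exit code (0 means success)."""
    # Arguments are passed as a list of str, so no shell quoting
    # or manual encode/decode of the command line is needed.
    return subprocess.call(build_command(url))
```

Calling download("https://www.youtube.com/watch?v=DaU94Ld3fuM") would then run the command above and return youtube-dl's exit code.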

dstftw
  • Thanks a lot, that works perfectly! As far as I remember I was trying with "%(artist)s" and "%(title)s", but I never used --add-metadata. All non-ASCII chars are present in the final tags. – mopsiok May 21 '15 at 18:33
  • 2
    I think this is not working for me? After running your command: `[download] 100% of 6.69MiB in 00:10 [fromtitle] parsed artist: Aerosmith [fromtitle] parsed title: Walk This Way (lyrics) [HD] [ffmpeg] Adding metadata to 'Aerosmith - Walk This Way (lyrics) [HD]-xBg2LP223_8.m4a' [ffmpeg] Destination: Aerosmith - Walk This Way (lyrics) [HD]-xBg2LP223_8.mp3 Deleting original file Aerosmith - Walk This Way (lyrics) [HD]-xBg2LP223_8.m4a (pass -k to keep) [ffmpeg] Adding thumbnail to "Aerosmith - Walk This Way (lyrics) [HD]-xBg2LP223_8.mp3" ` but then the mp3 file has no metadata in it? – Redoman Aug 07 '15 at 05:11

Based on this SO article, I think this is the problem you are having:

import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim=u'-'):
    """Generates a slightly worse ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        word = normalize('NFKD', word).encode('ascii', 'ignore')
        if word:
            result.append(word)
    return unicode(delim.join(result))

Usage:

>>> slugify(u'My International Text: åäö')
u'my-international-text-aao'
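
The snippet above is Python 2 (it relies on unicode and on joining byte strings). A rough Python 3 equivalent might look like the sketch below; it is an adaptation for illustration, not code from the linked article. Note that characters without a Unicode decomposition (for example the Polish ł) are simply dropped by the 'ignore' handler.

```python
import re
from unicodedata import normalize

# Same punctuation pattern as in the Python 2 version above.
_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')


def slugify(text, delim='-'):
    """Generate an ASCII-only slug from a str (Python 3 sketch)."""
    words = []
    for word in _punct_re.split(text.lower()):
        # NFKD splits accented letters into base letter + combining mark;
        # encoding to ASCII with 'ignore' then drops the combining marks.
        ascii_word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if ascii_word:
            words.append(ascii_word)
    return delim.join(words)


print(slugify('My International Text: åäö'))  # my-international-text-aao
```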
nadermx
  • Thanks for your answer, but filename itself is not a problem. The problem was encoding metadata inside the file, but dstftw has helped already. – mopsiok May 21 '15 at 18:38