
I want to:

Download audio files from YouTube

I have done this with pytube. However, the result is in mp4 format even though I set only_audio to True.

Then turn the audio files into numpy arrays

There are libraries that work on mp3, for example pydub, but not on mp4. When I tried moviepy, it failed because there is no video and therefore no frame rate. I don't want to download the video because that would take much longer.

Note that I want the audio, not the video.

How can I download audio from YouTube and turn it into numpy arrays?

Thanks for any help :)


EDIT

Thanks to the comments, I've managed to convert the mp4 to mp3 using ffmpeg.

However, when I tried to turn the mp3 into a numpy array using the code from this question, which looks like this:

import numpy as np
import pydub

def read(f, normalized=False):
    """MP3 to numpy array"""
    a = pydub.AudioSegment.from_mp3(f)
    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        y = y.reshape((-1, 2))
    if normalized:
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

it raised this error:

Traceback (most recent call last):
  File "C:\Users\myname\Google Drive\Python\Projects\Music\Downloads\Music Read.py", line 63, in <module>
    print(read(x,True))
  ......
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 1017, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

This is weird because, as demonstrated below, the path itself is valid:

for f in os.listdir(path):
    if (f.endswith(".mp3")):
        print(f)
        x = 'C:/Users/myname/Google Drive/Python/Projects/Music/Downloads/{}'.format(f)
        print(os.path.exists(x))
        print(open(x))
        print(read(x,True))

outputs:

test-Copy.mp3
True
c:/users/myname/google drive/python/projects/music/downloads/test-copy.mp3
<_io.TextIOWrapper name='c:/users/myname/google drive/python/projects/music/downloads/test-copy.mp3' mode='r' encoding='cp1252'>

Also, when I pass a file path that doesn't actually exist, it raises a different error:

......
File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\pydub\utils.py", line 57, in _fd_or_path_or_tempfile
fd = open(fd, mode=mode)
FileNotFoundError: [Errno 2] No such file or directory: 'c:/users/myname/google drive/python/projects/music/downloads/hi'
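
The two tracebacks fail in different places: the first is raised inside `subprocess.py` while launching an external program, the second by `open()` on the audio path itself. Since pydub hands MP3 decoding to an external ffmpeg (or avconv) executable, one plausible reading is that the first error is about that executable rather than the MP3 file. A minimal sketch for checking this, assuming the decoder is expected on `PATH`; `find_decoder` is a hypothetical helper name, not part of pydub:

```python
import shutil


def find_decoder(candidates=("ffmpeg", "avconv")):
    """Return the full path of the first decoder found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    decoder = find_decoder()
    if decoder is None:
        print("No ffmpeg/avconv executable found on PATH")
    else:
        print("Decoder found at", decoder)
```

If nothing is found, adding the ffmpeg `bin` folder to `PATH`, or setting `pydub.AudioSegment.converter` to the full path of `ffmpeg.exe`, may resolve the error.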

How can I use the code from this question to turn the mp3 into numpy arrays? If I can't, how else can I do it?

By the way, I'm running Windows 10 with Python 3.6.

I really hope I have made myself clear enough. Thanks again in advance for any advice :)

Kenivia
  • Possible duplicate of [How to turn a video into numpy array?](https://stackoverflow.com/questions/42163058/how-to-turn-a-video-into-numpy-array) – Zaraki Kenpachi Jun 04 '19 at 10:03
  • That question is on the video, not the audio, as far as I can tell. – Kenivia Jun 04 '19 at 10:07
  • How about converting the file with `ffmpeg -i old.mp4 new.mp3`? – Lukasz Tracewski Jun 04 '19 at 10:46
  • As @LukaszTracewski says, ffmpeg is your friend in the world of audio and video rendering and conversion. ffmpeg is also available as a library, not just as an executable. It's the industry workhorse, which is what many higher-level tools use under the covers. – Scott Stensland Jun 04 '19 at 13:44
  • @LukaszTracewski and Scott, thanks, I will try it and give you an update later. – Kenivia Jun 04 '19 at 23:29

1 Answer


It feels weird to answer my own question, but:

I got around the pydub issue by calling ffmpeg directly with this code:

import numpy as np
from subprocess import Popen, PIPE

def decode(fname):
    # On Windows, use the full path to ffmpeg.exe
    cmd = ["C:/Users/allen/Google Drive/Python/Tools/ffmpeg-20190604-d3f236b-win64-static/bin/ffmpeg.exe", "-i", fname, "-f", "wav", "-"]
    # On Windows, add the argument creationflags=0x8000000 to prevent another console window from popping up
    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    data = p.communicate()[0]
    # Skip the WAV header: the samples start after the b"data" tag and its 4-byte size field
    return np.frombuffer(data[data.find(b"data") + 8:], np.int16)
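
The WAV stream that ffmpeg writes to stdout is a RIFF container: a 12-byte header followed by chunks, with the samples inside the `data` chunk. Instead of slicing the header off by hand, here is a hedged sketch that walks the chunks, reads the channel count and sample rate from the `fmt ` chunk, and only then builds the array. `split_wav` and `wav_to_numpy` are hypothetical names, and the sketch assumes 16-bit PCM output. It also assumes that a WAV written to a pipe may carry a placeholder chunk size, so the declared size is clamped to the bytes actually received.

```python
import struct


def split_wav(data):
    """Walk the RIFF chunks of in-memory WAV bytes.

    Returns (channels, sample_rate, bits_per_sample, pcm_bytes).
    """
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    pos = 12
    fmt = None
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            # Fields: format tag, channels, sample rate, byte rate,
            # block align, bits per sample
            _, channels, rate, _, _, bits = struct.unpack_from("<HHIIHH", data, body)
            fmt = (channels, rate, bits)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appeared before fmt chunk")
            # Clamp in case the declared size is a placeholder (piped output)
            end = min(body + size, len(data))
            return fmt + (data[body:end],)
        # Chunks are padded to an even number of bytes
        pos = body + size + (size & 1)
    raise ValueError("no data chunk found")


def wav_to_numpy(data):
    """Convert 16-bit PCM WAV bytes to (sample_rate, int16 array)."""
    import numpy as np  # imported here so split_wav stays stdlib-only

    channels, rate, bits, pcm = split_wav(data)
    if bits != 16:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Drop any trailing partial frame so the reshape below cannot fail
    frame_bytes = 2 * channels
    pcm = pcm[: len(pcm) - len(pcm) % frame_bytes]
    samples = np.frombuffer(pcm, dtype=np.int16)
    if channels > 1:
        samples = samples.reshape((-1, channels))
    return rate, samples
```

With the function above, `rate, samples = wav_to_numpy(p.communicate()[0])` would replace the final slicing step, and stereo files come back with one column per channel.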
Kenivia