
I am trying to have my server, in Python 3, fetch files from URLs. Specifically, I would like to pass a URL into a function that grabs an audio file (in any of several formats) and saves it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would like to save that too, as a PDF. I haven't done much research on the PDF part yet, but I have been working on the audio piece and wasn't sure whether this is even possible.

I have looked at several questions here, most notably: How do I download a file over HTTP using Python?

It's a little old, but I tried several of the methods there and always ran into some sort of issue. I have tried the requests library, urllib, streamripper, and maybe one other.

Is there a way to do this, and is there a recommended library for it?

For example, most of the ones I have tried do save something, such as the HTML page or, in this case, an empty file called 'file.mp3'.

Streamripper gave an error telling me to try changing user agents.

I am not sure if this is possible, but I am sure there is something I'm not understanding here. Could someone point me in the right direction?

This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.

import requests

url = "http://someurl.com/webcast/something"
r = requests.get(url)

with open('file.mp3', 'wb') as f:
    f.write(r.content)

# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)

Edit:

import requests
import ffmpy
import datetime
import os

## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED

def BordersPythonDownloader(url):
    print('Beginning file download requests')
    r = requests.get(url, stream=True)
    # Drop any parameters (e.g. "; charset=...") before comparing the type
    contype = r.headers.get('content-type', '').split(';')[0].strip().lower()
    if contype == "audio/mpeg":
        print("audio file")
        filename = '[{}].mp3'.format(str(datetime.datetime.now()))
        # Save the download to a temporary file first
        with open('file.mp3', 'wb') as f:
            f.write(r.content)
        # Let ffmpeg write the temporary file out under the final name
        ff = ffmpy.FFmpeg(
            inputs={'file.mp3': None},
            outputs={filename: None}
        )
        ff.run()
        # Clean up the temporary file
        if os.path.exists('file.mp3'):
            os.remove('file.mp3')
    elif contype == "application/pdf":
        print("pdf file")
        filename = '[{}].pdf'.format(str(datetime.datetime.now()))
        with open(filename, 'wb') as f:
            f.write(r.content)
    else:
        print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))


# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL

#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'

#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)

#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)

Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
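
One possible improvement, offered as a sketch rather than a definitive fix: str(datetime.datetime.now()) produces names containing spaces and colons, and colons are not allowed in file names on Windows. The helper below builds a timestamp from digits, hyphens and underscores only. The name timestamped_name and its now parameter are hypothetical and not part of the script above.

```python
from datetime import datetime


def timestamped_name(extension, now=None):
    """Build a file name such as '2019-02-25_16-37-05.mp3' (hypothetical helper)."""
    # Let the caller inject a time so the function is easy to test.
    now = now or datetime.now()
    # Digits, hyphens and underscores only: no spaces or colons.
    return now.strftime("%Y-%m-%d_%H-%M-%S") + extension
```

In the script above, timestamped_name('.mp3') could stand in for the '[{}].mp3'.format(...) expression, assuming second-level resolution is enough to keep names unique.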

  • Please make sure not to ask multiple questions; it's quite difficult to answer a post when you don't know what to answer. – Lloyd Feb 25 '19 at 14:36
  • My question is: is there a way to do this, and with a recommended library? I sincerely apologize for being confused about the general process for accomplishing my task (written at the top) and for asking a question about the process AND the code. I guess learning isn't being advocated? –  Feb 25 '19 at 14:46
  • The problem may not be your code, but your URL. My hunch is that your URL is actually a web page and not an mp3 file. Using your code with a different URL pointing to an actual mp3 file (https://archive.org/download/testmp3testfile/mpthreetest.mp3), I am able to download the mp3 file. I verified it downloaded correctly by playing its audio. – Richard II Feb 25 '19 at 14:54
  • So do I check the type of file first, then save it as that type (e.g. avi, mp4, etc.), then pass it into ffmpy to convert it to MP3? I've checked for the appropriate URL using F12, but you're right, I think this is saving it as an MP3 regardless of the original format, which shouldn't be the case if I'm understanding correctly? So I should check it first? –  Feb 25 '19 at 14:59
  • The r.headers['content-type'] value tells you the type of file it is; in the case of your sample URL it is HTML. You need the URLs to the .mp3 or .pdf file, not the link to the .html page. – Richard II Feb 25 '19 at 15:08
  • Richard, thank you, I got a lot out of that, edited with my working code above. –  Feb 25 '19 at 16:37
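
The point made in the comments, that the content-type header shows whether a URL returns the file itself or just a web page, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it uses the standard library's urllib.request instead of requests so that it has no third-party dependencies, and the names EXTENSIONS, extension_for and download are hypothetical.

```python
import shutil
import urllib.request

# Hypothetical mapping from media type to file extension; extend as needed.
EXTENSIONS = {"audio/mpeg": ".mp3", "application/pdf": ".pdf"}


def extension_for(content_type):
    """Return the extension for a Content-Type header value, or None."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";")[0].strip().lower()
    return EXTENSIONS.get(media_type)


def download(url, stem):
    """Save url as stem + extension; return the path, or None if skipped."""
    with urllib.request.urlopen(url) as response:
        ext = extension_for(response.headers.get("Content-Type", ""))
        if ext is None:
            # For example text/html: a web page, not the file itself.
            return None
        path = stem + ext
        with open(path, "wb") as f:
            # Copy the body to disk in chunks instead of loading it all at once.
            shutil.copyfileobj(response, f)
        return path
```

Calling download(url, 'webcast') with a URL that returns an HTML page would then give None instead of writing a misnamed 'file.mp3'.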
