
I have a list of file URLs which are download links. I have written Python code to download the files to my computer. Here's the problem: there are about 500 files in the list, and Chrome becomes unresponsive after downloading about 50 of them. My goal is to upload all the downloaded files to a bucket in S3. Is there a way to make the files go to S3 directly? Here is what I have written so far:

import requests
from itertools import chain
import webbrowser

url = "<my_url>"
username = "<my_username>"
password = "<my_password>"
headers = {"Content-Type": "application/xml", "Accept": "*/*"}

response = requests.get(url, auth=(username, password), headers=headers)
if response.status_code != 200:
    print('Status:', response.status_code, 'Headers:', response.headers, 'Error Response:', response.json())
    exit()

data = response.json()
values = list(chain.from_iterable(data.values()))
links = [lis['download_link'] for lis in values]
for item in links:
    webbrowser.open(item)
alapalak
  • There is no way to send the files directly to S3. You might look into using something like wget, rather than your browser, to retrieve the files. You can pass in a list of files for it to download, so you won't even need the loop. – Angrysheep May 19 '17 at 00:58
  • @Angrysheep Newbie here. Where exactly will the files be downloaded when wget is used? Also, what if each link is password protected? – alapalak May 19 '17 at 17:34
  • http://stackoverflow.com/questions/23761579/using-wget-to-download-a-file-from-a-password-protected-link#answer-25314303 has a suggestion about the password. As for the download location, wget will use any path you specify. You'll want to check out the docs, before you try it. – Angrysheep May 19 '17 at 18:16

2 Answers


It's quite simple using Python 3 and boto3 (the AWS SDK), e.g.:

import boto3

s3 = boto3.client('s3')

# Upload a local file to the given bucket under the given object key
with open('filename.txt', 'rb') as data:
    s3.upload_fileobj(data, 'bucketname', 'filenameintos3.txt')

For more information you can read the boto3 documentation here: http://boto3.readthedocs.io/en/latest/guide/s3-example-creating-buckets.html

Enjoy

Paulo Victor

If you have the AWS CLI installed on your system, you can make use of the subprocess library. For example:

import subprocess

def copy_file_to_s3(source: str, target: str, bucket: str):
    subprocess.run(["aws", "s3", "cp", source, f"s3://{bucket}/{target}"])

Similarly, you can use the same logic for all sorts of AWS client operations, such as downloading or listing files. This way there is no need to import boto3. I guess it's not intended to be used this way, but in practice I find it quite convenient. You also get the status of the upload displayed in your console, for example:

Completed 3.5 GiB/3.5 GiB (242.8 MiB/s) with 1 file(s) remaining

To adapt the method to your needs, I recommend having a look at the subprocess reference as well as the AWS CLI reference.
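One caveat worth noting: by default `subprocess.run` does not raise if the `aws` command fails (e.g. missing credentials or a nonexistent bucket), so a silent failure is easy to miss. A sketch of the same function with error checking, using the standard `check=True` flag:

```python
import subprocess

def copy_file_to_s3(source: str, target: str, bucket: str) -> None:
    # check=True makes subprocess.run raise CalledProcessError
    # whenever the aws command exits with a non-zero status.
    subprocess.run(
        ["aws", "s3", "cp", source, f"s3://{bucket}/{target}"],
        check=True,
    )
```

Alternatively, inspect the `returncode` attribute of the `CompletedProcess` object that `subprocess.run` returns.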

Jojo