
Suppose I have this file in an S3 bucket:

s3://qq-dag/production/special-db/xx.gz

I want somebody to be able to send me this path, and I would then automatically use it to send the file itself in a POST request I make.

The solution I thought of is first downloading the file locally and then somehow using my local file path in the POST request. Could it work? Can anyone post an example of how to do it?

I tried something like this, but I keep getting errors for every variation of this code:

import boto3
import requests

file = r'C:\Users\jhon.king\files\xx.gz'

bucket = 'qq-dag/production/special-db'
key = 'xx.gz'

s3client = boto3.client('s3')
s3client.download_file(bucket, key, file)

# now the POST request part
files = [
    ('file', ('xx.gz', open(r'C:\Users\jhon.king\files\xx.gz', 'rb'), 'text/gz'))
]
headers = {
    'X-Atlassian-Token': 'nocheck',
    'Authorization': 'Basic Ymyb3dpdHo6kkczEyMzQ=',
}
response = requests.request("POST", url, headers=headers, data=payload, files=files)

I am sure I am somehow getting the bucket or file name parts wrong. Can anyone help me?

I tried this for the download, based on the answer below, and I keep getting an error:

import os
import boto3

prefix = 'user/mynew-research'
file = "mymyjira.gz"
#s3.Bucket(bucket).download_file(f'{prefix}/{file}', file)

AWS_KEY_ID = "kUvRWN8xabRdg++lUP84A3g"
AWS_ACCESS_KEY = 'AKIAJPQ'
bucket = 'my-research'

s3 = boto3.resource(
    service_name='s3',
    region_name='us-east-1',
    aws_access_key_id=AWS_KEY_ID,
    aws_secret_access_key=AWS_ACCESS_KEY
)
s3.Bucket(bucket).download_file(f'{prefix}/{file}', file)

ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request

Boyuis

1 Answer


download_file takes the bucket name separately, and then the full key, i.e. the path and file name joined together:

>>> bucket = "s3_bucket"
>>> prefix = "path/to"
>>> file = "file.gz"
>>> s3.Bucket(bucket).download_file(f'{prefix}/{file}', file)
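For the path in the question, the bucket is only the first segment of the s3:// URI and everything after it is the key. A small sketch of that split (the helper name is mine, not part of boto3):

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key) for boto3 calls."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_uri("s3://qq-dag/production/special-db/xx.gz")
print(bucket)  # qq-dag
print(key)     # production/special-db/xx.gz
```

With that split, the download becomes `s3.Bucket(bucket).download_file(key, 'xx.gz')`.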

I see you are downloading the file with s3client.download_file(bucket, key, file), but in the POST you are trying to upload straight from the S3 path with requests, and that is not going to work.

requests only sends a URL (in your example), not the file content. You are mixing requests with S3 locations, but you need to use them separately: boto3 grants you access to download your files or otherwise work with S3, while requests can only send data you already have locally, whether in memory or as files.

So I would download the file (or get the file from S3) and then open the file to be sent.

# Your download file code goes here
# And now the file "xx.gz" exists in the file system

response = requests.post(
    url=url,
    headers=headers,
    files={'file_or_name_needed': open('xx.gz', 'rb')},
    data=data,
)
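If you want to check the multipart request without actually sending anything, requests can prepare it offline. This sketch creates a small gzip file first so it is self-contained; the field name, endpoint, and content type are placeholders:

```python
import gzip
import requests

# Create a small gzip file so the sketch is self-contained
with gzip.open("xx.gz", "wb") as fh:
    fh.write(b"example payload")

# Opening the file in a with-block guarantees the handle is closed
with open("xx.gz", "rb") as fh:
    req = requests.Request(
        "POST",
        "https://httpbin.org/post",  # placeholder endpoint
        files={"file": ("xx.gz", fh, "application/gzip")},
    ).prepare()

# The prepared request already carries the multipart body
print(req.headers["Content-Type"].split(";")[0])  # multipart/form-data
```

Sending it is then just `requests.Session().send(req)`, or use `requests.post(...)` directly as in the answer.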

Here's an answer for uploading files.

When instantiating the S3 resource, pass the region and credentials explicitly:

AWS_KEY_ID = os.environ['AWS_KEY_ID']
AWS_ACCESS_KEY = os.environ['AWS_ACCESS_KEY']

boto3.resource(
    service_name='s3',
    region_name='xx-yyyy-i',
    aws_access_key_id=AWS_KEY_ID,
    aws_secret_access_key=AWS_ACCESS_KEY,
)

Edit:

You don't need to add the file name to the prefix, because the f-string will join them for you. Also, don't include the bucket name in the prefix; the bucket is passed separately:

prefix = 'user/mynew-research'
file = "mymyjira.gz"

The f-string then resolves to user/mynew-research/mymyjira.gz, looked up inside the my-research bucket.

I've tested this myself and it's working fine:

import boto3
import os
import requests

config = dict(
    service_name="s3",
    region_name="eu-west-1",
    aws_access_key_id=os.environ["AWS_KEY"],
    aws_secret_access_key=os.environ["AWS_SECRET"],
)
s3_ = boto3.resource(**config)

bucket = "bucket"
prefix = "path/to"
file = "file"

s3_.Bucket(bucket).download_file(f"{prefix}/{file}", file)

response = requests.post(
    url="https://httpbin.org/post",
    files={
        "myfile": (open(file, "rb")),
    },
)

# or, if you don't want to download the file, read it straight from S3;
# the resource exposes the low-level client via .meta.client
s3_response = s3_.meta.client.get_object(Bucket=bucket, Key=f"{prefix}/{file}")

response = requests.post(
    url="https://httpbin.org/post",
    files={
        "myfile": s3_response["Body"].read(),
    },
)

Pay special attention to the bucket versus the path and file name.
Say the bucket is called test-bucket and the file is at test_data/regression/evidence.gz.
The code should then be s3.Bucket("test-bucket").download_file("test_data/regression/evidence.gz", "evidence.gz")

user15757337
  • But even the download itself doesn't download the file – Boyuis Nov 16 '21 at 23:57
  • I had a mistake there, thx. I edited the question – Boyuis Nov 16 '21 at 23:59
  • But is the script raising an error? boto3's download_file will download the file into the same folder you are running the script from. To download a file: `s3.Bucket(bucket).download_file(f'{prefix}/{file}', file)` – user15757337 Nov 17 '21 at 08:50
  • If you want to download the file in another directory change the 2nd argument. – user15757337 Nov 17 '21 at 08:57
  • I get this error when I try downloading as you say: ClientError: An error occurred (404) when calling the HeadObject operation: Not Found – Boyuis Nov 17 '21 at 09:29
  • Well, you should be adding `service_name`, `region_name`, `aws_access_key_id`, `aws_secret_access_key` when you create the client – user15757337 Nov 17 '21 at 12:18
  • @Boyuis you should also change `boto3.client` to `boto3.resource`. – user15757337 Nov 17 '21 at 12:28
  • thx, but I tried your new revised code and still get ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request. I posted your code at the end of my edited question. The `os.environ` part didn't work at all and gave a KeyError, so I changed it the way you see in my question, but I still get the error – Boyuis Nov 17 '21 at 19:07
  • I hope your keys are not real, if it's a real key please change it. – user15757337 Nov 17 '21 at 20:43
  • @Boyuis your prefix is wrong, you need to delete the file from it: `prefix = 'my-research/user/mynew-research/'`. Otherwise the file part is going to resolve to `my-research/user/mynew-research/mymyjira.gz/mymyjira.gz` – user15757337 Nov 17 '21 at 21:49
  • thx. I certainly did try both ways. I also just tried exactly what you said and edited the code in my question accordingly; I still get ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request – Boyuis Nov 18 '21 at 11:38
  • the keys are fake of course – Boyuis Nov 18 '21 at 11:40
  • ha, ok, now I deleted the name of the bucket from the prefix and I don't get an error, but I still don't see the file on my local machine. The code runs but no file is downloaded – Boyuis Nov 18 '21 at 12:19
  • Strange, the file should be downloaded into the same directory as the executed script, i.e. wherever you run `python3 yourfile.py` from. Can you try debugging the code to check the response? – user15757337 Nov 18 '21 at 13:23
  • What if you try to debug by getting the object with `s3_client.get_object(Bucket=bucket, Key=f"{prefix}/{file}")` instead and reading it with `s3_response["Body"].read()`? – user15757337 Nov 18 '21 at 13:24
  • I run it from Databricks, maybe that's why it doesn't go to local even though I give the complete path. Strange, I don't even find it in the DBFS of Databricks, yet no error occurs – Boyuis Nov 18 '21 at 15:40
  • When you run the script, are you listing the files afterwards? Is the script in the same directory? If not, check both directories. – user15757337 Nov 18 '21 at 18:10
  • I didn't understand you, sorry. I just specified the file name. I searched my whole computer and can't find it, nor could I find it in the DBFS of Databricks. Strange – Boyuis Nov 18 '21 at 19:23
  • For the file, why not change `r'C:\Users\jhon.king\files\xx.gz'` to only the file name `xx.gz`? Are you running the script in a web console or over ssh? – user15757337 Nov 18 '21 at 20:31
  • I did use only the file name, look at the last code in my question. That is the only version that runs, but the file isn't on my local machine. I run the Python code on Databricks web – Boyuis Nov 19 '21 at 21:15
  • Look at the last code in my question. The file is indeed only mymyjira.gz but I don't see it downloaded – Boyuis Nov 20 '21 at 08:55
  • How are you running your Python code on Databricks web? Is it a web console? If so, I would suspect that the console is using a tmp directory. Is it Windows based? Why not try adding `subprocess.run(['ls', '-lt'])` or `dir` to print the directory that you are executing the script in? – user15757337 Nov 21 '21 at 09:03
  • I just added the script at `/Users/userA/Scripts/run.py`, ran it from that exact location, and the file was downloaded into the same directory the execution happened in. Meaning if you run the script under, say, `/Users/userB`, then there will be no file under `/Users/userA/Scripts`, because it was executed under `/Users/userB` – user15757337 Nov 21 '21 at 09:07
  • It runs in the cloud, not on my computer or anything. I suspect it wouldn't be safe if they allowed access to local – Boyuis Nov 21 '21 at 14:47
  • But it doesn't matter whether it's cloud or local, there's a folder structure anywhere. Otherwise you need to use the `.read()` method instead of downloading: you just send the content from `s3_response = s3_client.get_object(Bucket=bucket, Key=f"{prefix}/{file}")` – user15757337 Nov 21 '21 at 14:49
  • Really? And then how do I post it in the request? – Boyuis Nov 22 '21 at 01:52
  • As for local vs cloud, what can I do? I don't see it anywhere, not locally and not in the data storage of Databricks – Boyuis Nov 22 '21 at 01:52
  • To send the data in a request: when you use `s3_response = s3_client.get_object(Bucket=bucket, Key="key/to/file.tgz")`, s3_response will contain the response, and then you send the file with `s3_response["Body"].read()`. Check the last part of the answer, it shows this example. – user15757337 Nov 22 '21 at 20:58
  • As for local vs cloud, the file should be downloaded into the same directory you are running the script from. Are you using an IDE? If you use something like PyCharm, check the working directory config. – user15757337 Nov 22 '21 at 20:59
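The working-directory check discussed in the comments above can be sketched like this (pure standard library, no AWS calls needed):

```python
import os

# download_file(key, "file.gz") writes relative paths into the current
# working directory, so print it to see where downloads actually land
print("working directory:", os.getcwd())
print("contents:", os.listdir("."))
```

Run this from the same environment (Databricks cell, console, or IDE) that executes the download to find out which directory boto3 is writing into.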