I am able to create signed URLs and just need to know what to do with them after they are created.

There are several examples that use JavaScript to upload via a signed URL, but I cannot find any in Python. I am trying to use signed URLs as a workaround for the 32 MB request limit imposed by Google App Engine for my Flask application.

Here is my Python app.py script (not the full functionality of my app, just an attempt to upload to my bucket successfully):

from flask import Flask, request, render_template
from google.cloud import storage
import pandas as pd
import os
import gcsfs

bucket_name = "my-bucket"

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/file.json'

app = Flask(__name__)

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_file(source_file_name)

    print("success")

@app.route('/')
def homepage():
    return render_template('home.html')

@app.route('/', methods = ['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        file1 = request.files['file1'] 
        file2 = request.files['file2']
        upload_blob(bucket_name, file1, 'file-1')
        upload_blob(bucket_name, file2, 'file-2')
        df = pd.read_csv('gs://' + bucket_name + '/' + 'file-1')
        print(df.shape)
        return "done"


if __name__ == "__main__":
  app.run(debug=True)

Here is the function I am using to create the signed URL:

import datetime

def generate_upload_signed_url_v4(bucket_name, blob_name):

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    url = blob.generate_signed_url(
        version="v4",
        # This URL is valid for 15 minutes
        expiration=datetime.timedelta(minutes=15),
        # Allow PUT requests using this URL.
        method="PUT",
        content_type="application/octet-stream",
    )
    print(url)
    return url

generate_upload_signed_url_v4(bucket_name, 'file.csv')

And below is my home.html:

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>test upload</title>
</head>
<body>
    <h3> test upload </h3>

    <form method="POST" action="/" enctype="multipart/form-data">
        <p>Upload file1 below</p>
        <input type="file" name="file1"> 
        <br>
        <br>
        <p>Upload file2 below</p>
        <input type="file" name="file2">
        <br>
        <br>
        <input type="submit" value="upload">
    </form>


</body>
</html>

Based on what I researched, here is my CORS configuration for the bucket I am trying to upload to:


[
{"maxAgeSeconds": 3600, 
"method": ["GET", "PUT", "POST"], 
"origin": ["https://my-app.uc.r.appspot.com", "http://local.machine.XXXX/"], 
"responseHeader": ["Content-Type"]}
]

Does the signed URL that is generated go in the html form? Does it need to go into my upload_file function?

Finally, when I paste the signed URL into my browser it shows this error:


<Error>
<Code>MalformedSecurityHeader</Code>
<Message>Your request has a malformed header.</Message>
<ParameterName>content-type</ParameterName>
<Details>Header was included in signedheaders, but not in the request.</Details>
</Error>

This is my first SO question so I apologize if it is poorly constructed. I am super lost and new to GCP. I have searched SO for a while now, and not found a use-case with Python/Flask where I can see how the signed URL is incorporated into the file upload process.

Again, I am building a web app on Google App Engine flex, and need signed URLs to work around the 32 MB file upload restriction.

UPDATE

I got the signed URL component figured out after realizing I needed to simply make a request to the signed URL.
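A minimal sketch of such a request, using only the standard library (the function names and the example URL are hypothetical). The `Content-Type` header has to match the content type the URL was signed with. Pasting the URL into a browser address bar sends a GET without that header, which is what the `MalformedSecurityHeader` response above complains about.

```python
import urllib.request


def build_signed_put(signed_url, body, content_type):
    """Build (but do not send) a PUT request for a signed URL."""
    return urllib.request.Request(
        signed_url,
        data=body,
        method="PUT",
        # Must equal the content_type passed to generate_signed_url().
        headers={"Content-Type": content_type},
    )


def upload(signed_url, path, content_type):
    """PUT the file at `path` to the signed URL and return the HTTP status."""
    with open(path, "rb") as fh:
        # Reads the whole file into memory, which is fine for a sketch.
        request = build_signed_put(signed_url, fh.read(), content_type)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Running this on the server only proves the signed URL works. The file still has to reach the server first, so it does not by itself avoid the App Engine request size limit.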

Below is my new script that is loaded in App Engine (the imports and the `if __name__ == "__main__":` block are removed from the snippet below).


os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/file.json'

EXPIRATION = datetime.timedelta(minutes=15)
FILE_TYPE = 'text/csv'
BUCKET = 'my-bucket'

def upload_via_signed(bucket_name, blob_name, filename, expiration, file_type):
    bucket = storage.Client().get_bucket(bucket_name)

    blob = bucket.blob(blob_name)

    signed_url = blob.generate_signed_url(method='PUT', expiration=expiration, content_type=file_type)

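    # `filename` is the FileStorage object from request.files. Rewind it
    # (save() has already read the stream) and send it as the PUT body.
    filename.stream.seek(0)
    requests.put(signed_url, data=filename.stream, headers={'Content-Type': file_type})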

app = Flask(__name__)

app.config['UPLOAD_FOLDER'] = '/tmp'

@app.route('/')
def homepage():
    return render_template('home.html')

@app.route('/', methods = ['GET', 'POST'])
def upload_file():
    if request.method == 'POST':

        diag = request.files['file']
        filename_1 = secure_filename(diag.filename)
        filepath_1 = os.path.join(app.config['UPLOAD_FOLDER'], filename_1)
        diag.save(filepath_1)

        person = request.files['person']
        filename_2 = secure_filename(person.filename)
        filepath_2 = os.path.join(app.config['UPLOAD_FOLDER'], filename_2)
        person.save(filepath_2)

        upload_via_signed(BUCKET, 'diag.csv', diag, EXPIRATION, FILE_TYPE)

        upload_via_signed(BUCKET, 'person.csv', person, EXPIRATION, FILE_TYPE)

        df_diag = pd.read_csv('gs://' + BUCKET + '/' + 'diag.csv')
        print(df_diag.shape)
        return "done"

The code above still throws the 413 (Request Entity Too Large) error. I think it's because the POST still goes through App Engine even though I am creating signed URLs. What am I doing wrong, and how should the code be structured so that the user uploads directly to Google Cloud Storage via the signed URLs without triggering the 413 error?

gndumbri
  • Can you try two things? 1. Use a path as the blob name when you generate your signed URL: `generate_upload_signed_url_v4(bucket_name, '/uploaded/')`. 2. Change the signed URL method from PUT to POST. Let me know if one or both solve your issue. – guillaume blaquiere Jan 27 '21 at 21:27
  • I did get this to work after finding out that I just had to make a request to the signed URL that I created. However, it only worked on my local machine and still got the 413 error on App Engine because the file I tested with was larger than 32 MB. Based on what I've read, signed URLs are supposed to upload directly to Google Cloud Storage and then after the upload they trigger the app logic through App Engine. However, I am not finding how this is actually done. When I try, I am still making a POST request through App Engine thus triggering the 413 error. What am I doing wrong? – gndumbri Jan 28 '21 at 12:15
  • You don't have to pass through App Engine. Your frontend (the user's browser) asks for a signed URL; your backend generates it and sends it to the frontend. The browser then uses the signed URL to upload the files to Cloud Storage. At the end of the upload, you can also have the frontend call the backend to report the uploaded files (name and location) if you need to keep track of them in your backend. – guillaume blaquiere Jan 28 '21 at 13:03
  • That makes sense. How would I implement having the browser ask for the signed URL to upload a file from a user who accesses my app? Would it be triggered by the user selecting a file to upload and clicking "submit" through the app? Does the code to make that happen need to be in "home.html" or is it in "app.py"? I updated my question above with the new code I have used to deploy the app through App Engine so you can see where I am off/what I am doing wrong. Thank you so much for your help with this, I think I am close to figuring it out. – gndumbri Jan 28 '21 at 16:09
  • No, your code requires sending the file to App Engine, so the signed URL is useless here. The browser needs to send the file directly to Storage, without App Engine as an intermediary. I will try to prototype something to show you, but I'm very bad at frontend/JavaScript! – guillaume blaquiere Jan 28 '21 at 19:24
  • Can you try [this](https://docs.min.io/docs/upload-files-from-browser-using-pre-signed-urls.html)? – guillaume blaquiere Jan 29 '21 at 20:50
  • @guillaumeblaquiere, I am working on a similar Django application. I can generate the signed URL and send it to the frontend, and I use that URL from Postman to upload a CSV file to Google Cloud Storage. However, the uploaded file contains extra header information that I sent from Postman. Can you guide me on how to upload a file via a signed URL without adding that extra information? – ratnesh Jul 22 '21 at 04:07
  • Is this extra information inside the file? The signed URL upload doesn't change the file content. I didn't understand your case. – guillaume blaquiere Jul 22 '21 at 11:10
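The last step described in the comments, where the frontend tells the backend which files it uploaded, could be sketched as follows. This is only an illustration: the function name, the `files` key, and the payload shape are assumptions, not something the thread specifies. The helper turns the reported blob names into `gs://` URIs of the kind the question already passes to `pd.read_csv`.

```python
def uploaded_gcs_uris(bucket_name, payload):
    """Map the blob names reported by the browser to gs:// URIs.

    `payload` is the parsed JSON body of the hypothetical notification
    request, e.g. {"files": ["diag.csv", "person.csv"]}.
    """
    names = payload.get("files", [])
    if not all(isinstance(name, str) and name for name in names):
        raise ValueError("every entry in 'files' must be a non-empty string")
    return [f"gs://{bucket_name}/{name}" for name in names]
```

A Flask route receiving that small JSON body stays far below the request size limit, because the file bytes themselves went straight to Cloud Storage.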

1 Answer

Once you have generated the signed URL on the server, you just need to send it back to the client and use it to upload your files. For example, you can send the file data with a plain fetch PUT request or, as I prefer, with axios:

await axios.put(url, file);

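Here `url` is the signed URL and `file` is the file the user selected. Send the file itself as the request body rather than wrapping it in `FormData`: a PUT to a signed URL stores the body as-is, so a multipart wrapper would end up inside the uploaded object. If the URL was signed with a content type, send the same `Content-Type` header.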
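On the server side, a rough Python sketch of what gets sent back to the client might look like this. It is not part of the original answer: the function name and the shape of the returned dict are assumptions. The `bucket.blob(...)` and `blob.generate_signed_url(...)` calls use the same arguments as the question's own code.

```python
import datetime


def signed_upload_payload(bucket, blob_name, content_type="text/csv", minutes=15):
    """Return what the browser needs to PUT one file straight to the bucket.

    `bucket` is a google.cloud.storage Bucket, for example
    storage.Client().bucket("my-bucket").
    """
    blob = bucket.blob(blob_name)
    url = blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=minutes),
        method="PUT",
        content_type=content_type,
    )
    # The client must send exactly this Content-Type header with its PUT.
    return {"url": url, "method": "PUT", "headers": {"Content-Type": content_type}}
```

A Flask route (say a hypothetical `/signed-url`) could return this dict with `jsonify`. The page then uploads each file to `url` from the browser, so the file bytes never pass through App Engine.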

Methkal Khalawi
  • Hi, thank you for your help, but I am not familiar with async functions. I did some research after getting a "not awaited" error. I was then unable to install axios, and am still troubleshooting that. In the meantime I was able to use the URL with a simple requests.put, similar to the axios.put method, but I am still getting the 413 error through Google App Engine. – gndumbri Feb 02 '21 at 03:35