
I'm building a tool where users in a particular account/tenant can upload images/videos (CREATE/DELETE) and also create/delete folders to organize those images. These images/videos can later be dragged and dropped onto a page, and that page is accessible to everyone in the account. I have thought of two architecture flows, but both seem to have trade-offs.

  1. I thought I could generate a signed URL for each resource in the document management system and for each resource used on the page. This works when a page uses only a few images, but if a page has 30-40 images the client has to request a signed URL for each of those resources every time a user loads the page, which increases the latency of rendering the page on the client side (see the sketch after this list).

  2. Another architecture is to put all of the uploaded resources in a public bucket (explicitly informing users that everything they upload will be public). The obvious trade-off is security.
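
For flow 1, a minimal sketch of the per-object approach (assuming the google-cloud-storage client; the project, bucket, and object names below are placeholders) would look something like this, with the server handing the client one short-lived V4 signed URL per asset:

import datetime
from google.cloud import storage

# Sketch of flow 1: one signed URL per object.
# Project, bucket, and object names are placeholders.
client = storage.Client(project="yourProjectName")
bucket = client.bucket("yourProjectName.appspot.com")

def signed_url_for(object_name):
    # Short-lived, read-only V4 signed URL for a single object
    blob = bucket.blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=15),
        method="GET",
    )

# A page with 30-40 assets needs one URL (and typically one server call) per asset:
urls = [signed_url_for(name) for name in ["folder1/folder2/blob.png"]]

This is what makes the latency cost scale with the number of assets on the page.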

Is there a way I can securely give users access to numerous resources? Something like: instead of generating a signed URL for the blob itself, would it be possible to generate a signed URL for a path? For example, instead of generating a signed URL for /folder1/folder2/blob.png, could I generate a signed URL for /folder1/folder2 so that the client can request all the blobs within folder2 without making multiple requests to the server?

What I want to achieve is minimal latency without compromising security.

Vaibhav
  • With Google Cloud, there are no signed URLs for path prefixes, just objects. – Doug Stevenson Feb 08 '23 at 13:48
  • AWS signed URLs are likewise tied to a single object. Alternatively, AWS's CloudFront CDN product (typically configured to sit in front of S3 content) has [Signed Cookies](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-cookies.html), which can protect multiple files by path with wildcard patterns. – fedonev Feb 08 '23 at 14:02
  • @fedonev, Thank you. I checked with signed cookies but boto3 currently doesn't support signed cookie generation out of the box. Is there a way to generate the cookie with python? – Vaibhav Feb 13 '23 at 06:57
  • Glad to help. If existing questions like [Creating Signed Cookies for Amazon CloudFront](https://stackoverflow.com/questions/29383373/creating-signed-cookies-for-amazon-cloudfront) aren't what you're looking for, consider asking a new, narrowly-scoped question. – fedonev Feb 13 '23 at 07:08
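
Following up on the signed-cookie suggestion in the comments: below is a rough sketch of generating CloudFront signed cookies with a custom wildcard policy in plain Python using the cryptography library, since boto3 only ships a signer for signed URLs. The distribution domain, key pair ID, and key file path are placeholders.

import base64
import json
import time

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# Placeholders: your CloudFront distribution, key pair ID, and private key file
RESOURCE = "https://dxxxxxxxxxxxx.cloudfront.net/folder1/folder2/*"
KEY_PAIR_ID = "KXXXXXXXXXXXXX"
PRIVATE_KEY_PATH = "cloudfront_private_key.pem"

def _cf_b64(data):
    # CloudFront's URL-safe base64 variant: '+' -> '-', '=' -> '_', '/' -> '~'
    return base64.b64encode(data).decode().replace("+", "-").replace("=", "_").replace("/", "~")

def signed_cookies(expires_in=3600):
    # Custom policy covering every object under the wildcard path
    policy = json.dumps({
        "Statement": [{
            "Resource": RESOURCE,
            "Condition": {"DateLessThan": {"AWS:EpochTime": int(time.time()) + expires_in}},
        }]
    }, separators=(",", ":")).encode()

    with open(PRIVATE_KEY_PATH, "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)

    # CloudFront expects an RSA-SHA1 signature of the policy document
    signature = key.sign(policy, padding.PKCS1v15(), hashes.SHA1())

    return {
        "CloudFront-Policy": _cf_b64(policy),
        "CloudFront-Signature": _cf_b64(signature),
        "CloudFront-Key-Pair-Id": KEY_PAIR_ID,
    }

The server sets these three cookies once, and the browser then sends them with every request for assets under the wildcard path, so all 30-40 images on a page can be authorized without generating a URL per object.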

1 Answer


For Google Cloud Storage you can use Python with Flask as your web framework and upload documents with code like this:

Your index.html template should look like this:

<!doctype html>
<html>
  <head>
    <title>File Upload</title>
  </head>
  <body>
    <h1>File Upload</h1>
    <form method="POST" action="" enctype="multipart/form-data">
      <p><input type="file" name="file"></p>
      <p><input type="submit" value="Submit"></p>
    </form>
  </body>
</html>

Then your Python code for uploading documents would look like this:

from google.cloud import storage
from google.cloud.storage import Blob
from flask import Flask, render_template, request, redirect, url_for

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/', methods=['POST'])
def upload_file():
    uploaded_file = request.files['file']
    if uploaded_file.filename != '':
        # Use a unique object name per upload; project and bucket names are placeholders
        filename = "someUniqueFileName"
        client = storage.Client(project="yourProjectName")
        bucket = client.get_bucket("yourProjectName.appspot.com")
        blob = Blob(filename, bucket)
        blob.upload_from_file(uploaded_file)
    return redirect(url_for('index'))

Then, to download a file later, add this code wherever you need it:

# These must match the bucket and object name used for the upload
bucket_name = "yourProjectName.appspot.com"
source_blob_name = "someUniqueFileName"
destination_file_name = "local/path/to/file"

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_blob_name)
blob.download_to_filename(destination_file_name)

If you have any more questions about this just let me know.