
I have a DICOM viewer application that allows users to upload DICOM studies (500 MB to 3 GB in size), where each study can contain 200-2000 individual DICOM files. Users upload these studies directly to a Google Cloud Storage bucket that is publicly writable. After a study is fully uploaded to the bucket, the frontend application sends a request to a Cloud Function to process the uploaded files.

There are four steps to processing the files:

  1. Validate that all uploaded files are actually DICOM files (see the sketch after this list)
  2. Move the validated DICOM files from the publicly writable bucket to a private bucket
  3. Run some analysis and machine learning algorithms on only a subset of the files
  4. Import all the validated files into the Google Cloud Healthcare API
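
For context on what #1 and #2 involve: a DICOM Part 10 file starts with a 128-byte preamble followed by the magic bytes "DICM", so validation can be done by reading only the first few bytes of each object. Below is a minimal sketch in Python using the google-cloud-storage client; the bucket names are placeholders and this illustrates the idea rather than production code.

```python
# Minimal sketch of steps #1 and #2: check the DICOM magic bytes, then move
# the object into a private bucket. Bucket names are placeholders.
from google.cloud import storage

client = storage.Client()

def validate_and_move(object_name: str) -> bool:
    """Return True if the object is a DICOM file and was moved."""
    src_bucket = client.bucket("public-upload-bucket")   # placeholder name
    dst_bucket = client.bucket("private-dicom-bucket")   # placeholder name
    blob = src_bucket.blob(object_name)

    # A DICOM Part 10 file has a 128-byte preamble followed by b"DICM".
    # Ranged download: only the first 132 bytes are fetched.
    header = blob.download_as_bytes(start=0, end=131)
    if header[128:132] != b"DICM":
        blob.delete()  # discard non-DICOM uploads
        return False

    # Server-side copy into the private bucket, then remove the original.
    src_bucket.copy_blob(blob, dst_bucket, object_name)
    blob.delete()
    return True
```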

The issue I am having is that it takes too long to run all of these processing steps after the full study is uploaded. One solution I was thinking of is to invoke a Cloud Function for each individual DICOM file as it is uploaded to the bucket, run #1 and #2 on it, and then wait for the study to fully upload before running #3 and #4. The concern I have with this approach is that, since my bucket is publicly writable, a malicious user could upload a very large number of files and invoke many Cloud Functions, which would result in unnecessary charges.
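
If I do go the per-file route, the wiring would look roughly like the sketch below: a Cloud Function (2nd gen) triggered by the upload bucket's object-finalized event, running #1 and #2 per object via the validate_and_move() helper sketched above. The function name is hypothetical.

```python
# Rough sketch of the per-file variant: a Cloud Function (2nd gen) wired to
# the upload bucket's "object finalized" event, so #1 and #2 run per object.
import functions_framework

@functions_framework.cloud_event
def on_upload(cloud_event):
    # For GCS events, cloud_event.data carries the object metadata.
    object_name = cloud_event.data["name"]

    # validate_and_move() is the helper from the earlier sketch. #3 and #4
    # still run later, once the frontend signals the whole study is uploaded.
    validate_and_move(object_name)
```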

Another approach is to only allow authenticated users to upload files to a private GCS bucket, but that would require me to generate a signed URL for each DICOM file. So if there are 2000 DICOM files, the front-end app would need to request 2000 signed URLs from the backend.
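
One thing worth noting about this option: generating a V4 signed URL is a local signing operation when the backend holds a service account private key, so it does not cost an API call per URL, and the frontend can fetch all of them in a single backend request. A rough sketch, with a placeholder bucket name and a hypothetical handler:

```python
# Sketch of a backend handler that returns one V4 signed PUT URL per file in
# a single request, so the frontend does not make 2000 round trips.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()  # needs service-account credentials for signing

def create_upload_urls(study_id: str, file_names: list[str]) -> dict[str, str]:
    bucket = client.bucket("private-dicom-bucket")  # placeholder name
    urls = {}
    for name in file_names:
        blob = bucket.blob(f"{study_id}/{name}")
        # Signing happens locally with the service account key, so generating
        # a few thousand URLs per study is fast; expiration here is an example.
        urls[name] = blob.generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=30),
            method="PUT",
        )
    return urls
```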

I am not sure how to approach this issue. Any advice on the design or implementation would be helpful.

asked by rahulg510
  • The first step is to disable public writes to the bucket. There is a real risk of hosting content that could get you into jeopardy. – John Hanley Oct 14 '22 at 22:12
  • @JohnHanley What kind of risk are you referring to? The public bucket only has write access, no delete or read. Are there still risks of that data being accessible? Or are you talking about how any entity is able to write very large amounts of data? – rahulg510 Oct 14 '22 at 23:00
  • What permissions did you configure that only allow write without allowing read? Someone could upload pornography, copyrighted content, software distributions, etc. If the content violates Google TOS, you would have your account suspended. – John Hanley Oct 15 '22 at 00:17
  • @JohnHanley I gave the 'Storage Object Creator' permission to allUsers. I have also implemented a lifecycle rule so that all files uploaded to this bucket are deleted after 1 day. I could also have files deleted if they are not DICOM files. Thanks for the heads up on users uploading explicit content. Would you instead suggest creating signed URLs for authenticated users for each DICOM file and keeping the bucket private? – rahulg510 Oct 15 '22 at 00:36
  • Yes, of course. – John Hanley Oct 15 '22 at 01:37

0 Answers