11

I am creating a Google App Engine web app to "transform" files of 10KB to 50MB.

Scenario:

  1. User opens http://fixdeck.appspot.com in web browser
  2. User clicks on "Browse", select file, submits
  3. Servlet loads file as an InputStream
  4. Servlet transforms file
  5. Servlet saves file as an OutputStream
  6. The user's browser receives the transformed file and asks where to save it, directly as a response to the request in step 2

(For now I did not implement step 4, the servlet sends the file back without transforming it.)

Problem: It works for 15MB files but not for a 40MB file, saying: "Error: Request Entity Too Large. Your client issued a request that was too large."

Is there any workaround against this?

Source code: https://github.com/nicolas-raoul/transdeck
Rationale: http://code.google.com/p/ankidroid/issues/detail?id=697

Kai
Nicolas Raoul

4 Answers

13

GAE has a hard limit of 32MB for HTTP requests and HTTP responses. That limits the size of uploads/downloads directly to/from a GAE app.

Revised Answer (Using Blobstore API.)

Google provides the Blobstore API for handling larger files in GAE (up to 2GB). The overview documentation provides complete sample code. Your web form uploads the file to the blobstore. The Blobstore API then rewrites the POST back to your servlet, where you can do your transformation and save the transformed data back into the blobstore (as a new blob).

Original Answer (Didn't Consider Blobstore as an option.)

For downloading, I think the only GAE workaround would be to break the file up into multiple parts on the server, and then reassemble them after downloading. That's probably not doable using a straight browser implementation though.
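The split-and-reassemble bookkeeping is simple in principle. Here is a hypothetical Python sketch (the `split`/`reassemble` names and the safety margin are mine, not any GAE API) of what each end would need to do:

```python
LIMIT = 32 * 1024 * 1024  # GAE's hard request/response size limit

def split(data, chunk_size=LIMIT - 1024):
    """Split the payload into parts that each stay safely under the limit."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def reassemble(chunks):
    """Concatenate downloaded parts, in order, back into the original payload."""
    return b"".join(chunks)

payload = b"x" * (40 * 1024 * 1024)  # a 40MB file, over the 32MB limit
parts = split(payload)
assert all(len(p) < LIMIT for p in parts)   # every part fits in one response
assert reassemble(parts) == payload          # client recovers the original
```

The hard part is not the splitting but driving multiple requests and joining the parts on the client, which a plain browser download can't do.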

(As an alternative design, perhaps you could send the transformed file from GAE to an external download location (such as S3) where it could be downloaded by the browser without the GAE limit restrictions. I don't believe GAE-initiated connections have the same request/response size limitations, but I'm not positive. Regardless, you would still be restricted by the 30-second maximum request time. To get around that, you'd have to look into GAE Backend instances and come up with some sort of asynchronous download strategy.)

For uploading larger files, I've read about the possibility of using the HTML5 File APIs to slice the file into multiple chunks for uploading, and then reconstructing it on the server. Example: http://www.html5rocks.com/en/tutorials/file/dndfiles/#toc-slicing-files . However, I don't know how practical a solution that really is, due to changing specifications and browser capabilities.

kaliatech
  • This completely ignores the blobstore, which is exactly suited to this situation. – Nick Johnson Aug 02 '11 at 03:04
  • I agree that no mention of blobstore in my answer was a major oversight. I initially didn't think it was an option, but I've realized since that it is (especially with the experimental capability to programmatically create a blob). I'm considering rewriting my answer even though it's already been accepted. – kaliatech Aug 02 '11 at 11:15
  • Since it was already accepted, I edited my answer to include the Blobstore API. Drew Sears' answer deserves as much credit though. – kaliatech Aug 02 '11 at 11:34
  • Now that it is 2019, I wonder what the answer is for Python3? Blobstore doesn't seem to be an option. – Mark May 29 '19 at 04:29
  • Per the Blobstore API docs, the answer is now Google Cloud Storage. https://cloud.google.com/storage/docs/ – kaliatech May 29 '19 at 13:33
  • But how do we process videos for illegal content before we are liable for storing them? – Oliver Dixon Apr 09 '21 at 10:39
9

You can use the blobstore to upload files as large as 2 gigabytes.

Drew Sears
  • 1
    although to read the file you would need multiple API calls, as it will read a maximum of 32MB per blobstore API call. – Tom van Enckevort Aug 01 '11 at 14:49
  • Actually, the limit is 1MB per API call, but this shouldn't be relevant - it exposes a file-like interface, and I can think of very few things that would require really large reads. – Nick Johnson Aug 02 '11 at 03:03
  • 1
    @Nick Johnson - The per API call limit is 32MB for blobstore. (It's 1MB for datastore). See: http://code.google.com/appengine/docs/java/blobstore/overview.html#Quotas_and_Limits – kaliatech Aug 02 '11 at 11:17
  • @kaliatech Oops, you're quite right - that change completely missed me, somehow. – Nick Johnson Aug 02 '11 at 12:02
  • 1
    Note that on python the limit is only 10MB if you're moving data from app engine. It's pretty sad you have to jump through such hoops to upload/download large files. I'm transferring data from server to server and opted for breaking it up in chunks rather than dealing with this blobstore url mess. – speedplane Jan 20 '15 at 21:44
  • What if we want to upload more than 2gb? – Oliver Dixon Apr 09 '21 at 11:13
1

When uploading larger files, you can split the file into chunks and send them as a series of smaller requests, each below the 32MB limit that Google App Engine currently supports.
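Stripped of framework details, the server side of such a chunked upload just stores each chunk by index and assembles the file once the last chunk arrives. A minimal Python sketch of that idea (the `ChunkReceiver` class is illustrative only, not the laravel-chunk-upload API):

```python
class ChunkReceiver:
    """Collects upload chunks; each request carries a chunk index and its bytes."""

    def __init__(self, total_chunks):
        self.total = total_chunks
        self.chunks = {}

    def receive(self, index, data):
        """Store one chunk; return the assembled file once all chunks arrived."""
        self.chunks[index] = data
        if len(self.chunks) == self.total:
            return b"".join(self.chunks[i] for i in range(self.total))
        return None  # still in chunk mode: more parts expected

receiver = ChunkReceiver(total_chunks=3)
assert receiver.receive(0, b"foo") is None        # first chunk, keep waiting
assert receiver.receive(2, b"baz") is None        # chunks may arrive out of order
assert receiver.receive(1, b"bar") == b"foobarbaz"  # last chunk completes the file
```

The package below implements this pattern (plus resumability and cleanup) for Laravel.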

Check this package with examples - https://github.com/pionl/laravel-chunk-upload

Following is a working code which uses the above package.

View

<div id="resumable-drop" style="display: none">
    <p>
        <button id="resumable-browse" class="btn btn-outline-primary"
                data-url="{{route('AddAttachments', Crypt::encrypt($rpt->DRAFT_ID))}}"
                style="width: 100%; height: 91px;">Browse Report File..</button>
    </p>
</div>

Javascript

 <script>
var $fileUpload = $('#resumable-browse');
var $fileUploadDrop = $('#resumable-drop');
var $uploadList = $("#file-upload-list");

if ($fileUpload.length > 0 && $fileUploadDrop.length > 0) {
    var resumable = new Resumable({
        // Use a chunk size that is smaller than your maximum limit, due to a resumable.js issue:
        // https://github.com/23/resumable.js/issues/51
        chunkSize: 1 * 1024 * 1024, // 1MB
        simultaneousUploads: 3,
        testChunks: false,
        throttleProgressCallbacks: 1,
        // Get the url from data-url tag
        target: $fileUpload.data('url'),
        // Append token to the request - required for web routes
        query:{_token : $('input[name=_token]').val()}
    });

// Resumable.js isn't supported, fall back on a different method
    if (!resumable.support) {
        $('#resumable-error').show();
    } else {
        // Show a place for dropping/selecting files
        $fileUploadDrop.show();
        resumable.assignDrop($fileUpload[0]);
        resumable.assignBrowse($fileUploadDrop[0]);

        // Handle file add event
        resumable.on('fileAdded', function (file) {
            $("#resumable-browse").hide();
            // Show progress bar
            $uploadList.show();
            // Show pause, hide resume
            $('.resumable-progress .progress-resume-link').hide();
            $('.resumable-progress .progress-pause-link').show();
            // Add the file to the list
            $uploadList.append('<li class="resumable-file-' + file.uniqueIdentifier + '">Uploading <span class="resumable-file-name"></span> <span class="resumable-file-progress"></span></li>');
            $('.resumable-file-' + file.uniqueIdentifier + ' .resumable-file-name').html(file.fileName);
            // Actually start the upload
            resumable.upload();
        });
        resumable.on('fileSuccess', function (file, message) {
            // Reflect that the file upload has completed
            location.reload();
        });
        resumable.on('fileError', function (file, message) {
             $("#resumable-browse").show();
            // Reflect that the file upload has resulted in error
            $('.resumable-file-' + file.uniqueIdentifier + ' .resumable-file-progress').html('(file could not be uploaded: ' + message + ')');
        });
        resumable.on('fileProgress', function (file) {
            // Handle progress for both the file and the overall upload
            $('.resumable-file-' + file.uniqueIdentifier + ' .resumable-file-progress').html(Math.floor(file.progress() * 100) + '%');
            $('.progress-bar').css({width: Math.floor(resumable.progress() * 100) + '%'});
        });
    }

}
</script>

Controller

 public function uploadAttachmentAsChunck(Request $request, $id) {
    // create the file receiver
    $receiver = new FileReceiver("file", $request, HandlerFactory::classFromRequest($request));

    // check if the upload is success, throw exception or return response you need
    if ($receiver->isUploaded() === false) {
        throw new UploadMissingFileException();
    }

    // receive the file
    $save = $receiver->receive();

    // check if the upload has finished (in chunk mode it will send smaller files)
    if ($save->isFinished()) {
        // save the file and return any response you need, current example uses `move` function. If you are
        // not using move, you need to manually delete the file by unlink($save->getFile()->getPathname())
        $file = $save->getFile();

        $fileName = $this->createFilename($file);
        // Group files by mime type
        $mime = str_replace('/', '-', $file->getMimeType());
        // Group files by the date (week)
        $dateFolder = date("Y-m-W");

        $disk = Storage::disk('gcs');
        $gurl = $disk->put($fileName, $file);

        $draft = DB::table('draft')->where('DRAFT_ID','=', Crypt::decrypt($id))->get()->first();

        $prvAttachments = DB::table('attachments')->where('ATTACHMENT_ID','=', $draft->ATT_ID)->get();
        $seqId = sizeof($prvAttachments) + 1;

        //Save Submission Info
        DB::table('attachments')->insert(
            [   'ATTACHMENT_ID' => $draft->ATT_ID,
                'SEQ_ID' => $seqId,
                'ATT_TITLE' => $fileName,
                'ATT_DESCRIPTION' => $fileName,
                'ATT_FILE' => $gurl
            ]
        );

         return response()->json([
            'path' => 'gc',
            'name' => $fileName,
            'mime_type' => $mime,
            'ff' =>  $gurl
        ]);
    }

    // we are in chunk mode, let's send the current progress
    /** @var AbstractHandler $handler */
    $handler = $save->handler();

    return response()->json([
        "done" => $handler->getPercentageDone(),
    ]);
}

    /**
     * Create unique filename for uploaded file
     * @param UploadedFile $file
     * @return string
     */
    protected function createFilename(UploadedFile $file)
    {
        $extension = $file->getClientOriginalExtension();
        $filename = str_replace(".".$extension, "", $file->getClientOriginalName()); // Filename without extension

        // Add timestamp hash to name of the file
        $filename .= "_" . md5(time()) . "." . $extension;

        return $filename;
    }
Kusal Dissanayake
  • 1
    Hi could you provide more details about this. I have a laravel application and I am using Superbalist/laravel-google-cloud-storage package which does not work. An example of how to use this package would be great. – Coola Nov 26 '20 at 21:32
  • 1
    @Coola , Updated the answer with sample working code that I'm already using. – Kusal Dissanayake Nov 29 '20 at 15:01
  • Thanks. This really helps. Will try it out. – Coola Nov 29 '20 at 18:28
0

You can also use the Blobstore API to upload directly to Cloud Storage. Below is the link:

https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage

upload_url = blobstore.create_upload_url(
    '/upload_handler',
    gs_bucket_name=YOUR.BUCKET_NAME)

template_values = {'upload_url': upload_url}
_jinjaEnvironment = jinjaEnvironment.JinjaClass.getJinjaEnvironemtVariable()

if _jinjaEnvironment:
    template = _jinjaEnvironment.get_template('import.html')
    self.response.write(template.render(template_values))  # render the form (assuming a webapp2-style handler)

Then in index.html:

<form action="{{ upload_url }}"
      method="POST"
      enctype="multipart/form-data">
  Upload File:
  <input type="file" name="file">
  <input type="submit" value="Upload">
</form>
Fred Truter
Karthikkumar
  • Could you please give more details? Would the web UI talk to Cloud storage directly, bypassing GAE? – Nicolas Raoul Mar 25 '16 at 01:58
  • Appengine -upload_url=blobstore.create_upload_url('/upload_handler',gs_bucket_name = YOUR.BUCKET_NAME) template_values={ 'upload_url': upload_url } _jinjaEnvironment = jinjaEnvironment.JinjaClass.getJinjaEnvironemtVariable() if(_jinjaEnvironment): template = _jinjaEnvironment.get_template("import.html") – Karthikkumar Mar 25 '16 at 06:04
  • index.html - Upload File: – Karthikkumar Mar 25 '16 at 06:07
  • Yes, the UI can upload directly to cloud storage; let me know if the above code helps. – Karthikkumar Mar 25 '16 at 06:09
  • Would you mind putting the code into your answer? Comments tend to get deleted. Thanks! :-) – Nicolas Raoul Mar 25 '16 at 06:15