What I'm Trying to Do
Upload a PDF file from a browser client without exposing any credentials or anything unsavory. Based on this, I thought it could be done, but it doesn't seem to work for me.
The premise is:
you request a pre-signed URL from an S3 Bucket based on a set of parameters supplied to a function that is part of the JavaScript AWS SDK
you supply this URL to the frontend, which can use it to place a file in the S3 Bucket without needing to use any credentials or authentication on the frontend.
GET a Pre-Signed URL From S3
This part is simple and it works for me. I just request a URL from S3 with this little JS nugget:
const s3Params = {
Bucket: uploadBucket,
Key: `${fileId}.pdf`,
ContentType: 'application/pdf',
Expires: 60,
ACL: 'public-read',
}
let uploadUrl = s3.getSignedUrl('putObject', s3Params);
Use the Pre-Signed URL to Upload a File to S3
This is the part that doesn't work, and I can't figure out why. This little chunk of code basically sends a blob of data to the S3 bucket pre-signed URL using a PUT request.
const result = await fetch(response.data.uploadURL, {
method: 'put',
body: blobData,
});
PUT or POST?
I've found that using any POST requests results in 400 Bad Request
, so PUT it is.
What I've Looked At
Content-Type (in my case, it'd be application/pdf, so blobData.type
) -- they match between the backend and frontend.
Similar use case. Looking at this one, it appears that no headers need to be supplied in the PUT request and the signed URL itself is all that is necessary for the file upload.
Something weird that I don't understand. It looks like I may need to pass the length and type of the file to the getSignedUrl
call to S3.
Exposing my Bucket to the public (no bueno)
Frontend (fileUploader.js, using Vue):
...
uploadFile: async function(e) {
/* receives file from simple input element -> this.file */
// get signed URL
const response = await axios({
method: 'get',
url: API_GATEWAY_URL
});
console.log('upload file response:', response);
let binary = atob(this.file.split(',')[1]);
let array = [];
for (let i = 0; i < binary.length; i++) {
array.push(binary.charCodeAt(i));
}
let blobData = new Blob([new Uint8Array(array)], {type: 'application/pdf'});
console.log('uploading to:', response.data.uploadURL);
console.log('blob type sanity check:', blobData.type);
const result = await fetch(response.data.uploadURL, {
method: 'put',
headers: {
'Access-Control-Allow-Methods': '*',
'Access-Control-Allow-Origin': '*',
'x-amz-acl': 'public-read',
'Content-Type': blobData.type
},
body: blobData,
});
console.log('PUT result:', result);
this.uploadUrl = response.data.uploadURL.split('?')[0];
}
Backend (fileReceiver.js):
'use strict';
const uuidv4 = require('uuid/v4');
const aws = require('aws-sdk');
const s3 = new aws.S3();
const uploadBucket = 'the-chumiest-bucket';
const fileKeyPrefix = 'path/to/where/the/file/should/live/';
const getUploadUrl = async () => {
const fileId = uuidv4();
const s3Params = {
Bucket: uploadBucket,
Key: `${fileId}.pdf`,
ContentType: 'application/pdf',
Expires: 60,
ACL: 'public-read',
}
return new Promise((resolve, reject) => {
let uploadUrl = s3.getSignedUrl('putObject', s3Params);
resolve({
'statusCode': 200,
'isBase64Encoded': false,
'headers': {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Headers': '*',
'Access-Control-Allow-Credentials': true,
},
'body': JSON.stringify({
'uploadURL': uploadUrl,
'filename': `${fileId}.pdf`
})
});
});
};
exports.handler = async (event, context) => {
console.log('event:', event);
const result = await getUploadUrl();
console.log('result:', result);
return result;
}
Serverless config (serverless.yml):
service: ocr-space-service
provider:
name: aws
region: ca-central-1
stage: ${opt:stage, 'dev'}
timeout: 20
plugins:
- serverless-plugin-existing-s3
- serverless-step-functions
- serverless-pseudo-parameters
- serverless-plugin-include-dependencies
layers:
spaceOcrLayer:
package:
artifact: spaceOcrLayer.zip
allowedAccounts:
- "*"
functions:
fileReceiver:
handler: src/node/fileReceiver.handler
events:
- http:
path: /doc-parser/get-url
method: get
cors: true
startStateMachine:
handler: src/start_state_machine.lambda_handler
role:
runtime: python3.7
layers:
- {Ref: SpaceOcrLayerLambdaLayer}
events:
- existingS3:
bucket: ingenio-documents
events:
- s3:ObjectCreated:*
rules:
- prefix:
- suffix: .pdf
startOcrSpaceProcess:
handler: src/start_ocr_space.lambda_handler
role:
runtime: python3.7
layers:
- {Ref: SpaceOcrLayerLambdaLayer}
parseOcrSpaceOutput:
handler: src/parse_ocr_space_output.lambda_handler
role:
runtime: python3.7
layers:
- {Ref: SpaceOcrLayerLambdaLayer}
renamePdf:
handler: src/rename_pdf.lambda_handler
role:
runtime: python3.7
layers:
- {Ref: SpaceOcrLayerLambdaLayer}
parseCorpSearchOutput:
handler: src/node/pdfParser.handler
role:
runtime: nodejs10.x
saveFileToProcessed:
handler: src/node/saveFileToProcessed.handler
role:
runtime: nodejs10.x
stepFunctions:
stateMachines:
ocrSpaceStepFunc:
name: ocrSpaceStepFunc
definition:
StartAt: StartOcrSpaceProcess
States:
StartOcrSpaceProcess:
Type: Task
Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:#{AWS::StackName}-startOcrSpaceProcess"
Next: IsDocCorpSearchChoice
Catch:
- ErrorEquals: ["HandledError"]
Next: HandledErrorFallback
IsDocCorpSearchChoice:
Type: Choice
Choices:
- Variable: $.docIsCorpSearch
NumericEquals: 1
Next: ParseCorpSearchOutput
- Variable: $.docIsCorpSearch
NumericEquals: 0
Next: ParseOcrSpaceOutput
ParseCorpSearchOutput:
Type: Task
Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:#{AWS::StackName}-parseCorpSearchOutput"
Next: SaveFileToProcessed
Catch:
- ErrorEquals: ["SqsMessageError"]
Next: CorpSearchSqsErrorFallback
- ErrorEquals: ["DownloadFileError"]
Next: CorpSearchDownloadFileErrorFallback
- ErrorEquals: ["HandledError"]
Next: HandledNodeErrorFallback
SaveFileToProcessed:
Type: Task
Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:#{AWS::StackName}-saveFileToProcessed"
End: true
ParseOcrSpaceOutput:
Type: Task
Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:#{AWS::StackName}-parseOcrSpaceOutput"
Next: RenamePdf
Catch:
- ErrorEquals: ["HandledError"]
Next: HandledErrorFallback
RenamePdf:
Type: Task
Resource: "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:#{AWS::StackName}-renamePdf"
End: true
Catch:
- ErrorEquals: ["HandledError"]
Next: HandledErrorFallback
- ErrorEquals: ["AccessDeniedException"]
Next: AccessDeniedFallback
AccessDeniedFallback:
Type: Fail
Cause: "Access was denied for copying an S3 object"
HandledErrorFallback:
Type: Fail
Cause: "HandledError occurred"
CorpSearchSqsErrorFallback:
Type: Fail
Cause: "SQS Message send action resulted in error"
CorpSearchDownloadFileErrorFallback:
Type: Fail
Cause: "Downloading file from S3 resulted in error"
HandledNodeErrorFallback:
Type: Fail
Cause: "HandledError occurred"
Error:
403 Forbidden
PUT Response
Response {type: "cors", url: "https://{bucket-name}.s3.{region-id}.amazonaw…nedHeaders=host%3Bx-amz-acl&x-amz-acl=public-read", redirected: false, status: 403, ok: false, …} body: (...) bodyUsed: false headers: Headers {} ok: false redirected: false status: 403 statusText: "Forbidden" type: "cors" url: "https://{bucket-name}.s3.{region-id}.amazonaws.com/actionID.pdf?Content-Type=application%2Fpdf&X-Amz-Algorithm=SHA256&X-Amz-Credential=CREDZ-&X-Amz-Date=20190621T192558Z&X-Amz-Expires=900&X-Amz-Security-Token={token}&X-Amz-SignedHeaders=host%3Bx-amz-acl&x-amz-acl=public-read" proto: Response
What I'm Thinking
I'm thinking the parameters supplied to the getSignedUrl
call using the AWS S3 SDK aren't correct, though they follow the structure suggested by AWS' docs (explained here). Aside from that, I'm really lost as to why my request is rejected. I've even tried exposing my Bucket to the public fully and it still didn't work.
Edit
#1:
After reading this, I tried to structure my PUT request like this:
let authFromGet = response.config.headers.Authorization;
const putHeaders = {
'Authorization': authFromGet,
'Content-Type': blobData,
'Expect': '100-continue',
};
...
const result = await fetch(response.data.uploadURL, {
method: 'put',
headers: putHeaders,
body: blobData,
});
this resulted in a 400 Bad Request
instead of a 403; different, but still wrong. It's apparent that putting any headers on the request is wrong.