
Context

I think a lot of context is needed to understand the full scope of the problem, so apologies in advance if this runs long or includes too much information. I want to head off as many follow-up questions and clarification requests as I can.

I've inherited a project as part of a tech handoff. It used to be in production under a different owner in a different AWS account. I'm trying to re-launch it in an AWS account I control, and one of its packages is causing problems.

It uses Serverless to provision a couple of S3 buckets and their access policies, a couple of IAM roles, and a bunch of API Gateway methods. The package relies on nested stacks to get around the 200-resource limit, as described here.
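One quick way to confirm the nesting is doing its job is to count the resources in each generated template. A minimal sketch, assuming the templates are available locally as JSON files (for example in the `./.serverless` directory passed to `serverless deploy --package`); `resource_count` and `check_templates` are hypothetical helper names, and the 200 limit is the figure quoted above:

```python
from __future__ import annotations

import json
from pathlib import Path

# The 200-resource limit quoted in the question; kept as a parameter because
# AWS service quotas can change over time.
RESOURCE_LIMIT = 200


def resource_count(template: dict) -> int:
    """Count the top-level entries in a template's Resources section."""
    return len(template.get("Resources", {}))


def check_templates(directory: Path, limit: int = RESOURCE_LIMIT) -> dict[str, int]:
    """Return {file name: resource count} for every JSON template over the limit.

    Non-template JSON files simply have no Resources section, count as 0,
    and are therefore never reported.
    """
    over = {}
    for path in sorted(directory.glob("*.json")):
        template = json.loads(path.read_text())
        count = resource_count(template)
        if count > limit:
            over[path.name] = count
    return over
```

An empty result means every template in the directory fits under the limit.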

Finally, the IAM user that CircleCI connects as has the AdministratorAccess policy attached.

Problem

I keep getting failures from CircleCI during this step of the build:

node_modules/.bin/serverless deploy --verbose --stage develop --region us-east-1 --package ./.serverless

The exact nature of the failure seems to be inconsistent, i.e., it doesn't always fail at the same spot. At some point a resource simply fails to create and the whole process rolls back. Here are a couple of example run failures from the log (±5 lines), each followed by the actual error reported by Serverless:

Run 1

CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod001VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod002VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod003VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod004VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod006Options
CloudFormation - CREATE_FAILED - AWS::CloudFormation::Stack - FuncAbcNestedStack
CloudFormation - CREATE_FAILED - AWS::CloudFormation::Stack - FuncDefNestedStack
CloudFormation - CREATE_FAILED - AWS::CloudFormation::Stack - FuncGhiNestedStack
CloudFormation - UPDATE_ROLLBACK_IN_PROGRESS - AWS::CloudFormation::Stack - org-package-develop
CloudFormation - UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS - AWS::CloudFormation::Stack - org-package-develop
CloudFormation - DELETE_IN_PROGRESS - AWS::ApiGateway::Method - ApiGatewayMethod006Options
  Serverless Error ---------------------------------------

  An error occurred: FuncAbcNestedStack - Embedded stack arn:aws:cloudformation:us-east-1:ACCOUNT_ID:stack/org-package-develop-FuncAbcNestedStack/RESOURCE-ID-001 was not successfully created: The following resource(s) failed to create: [AbcLambdaFunction]. .

Run 2

CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod001VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod002VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod005VarOptions
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod006Options
CloudFormation - CREATE_COMPLETE - AWS::ApiGateway::Method - ApiGatewayMethod004VarOptions
CloudFormation - CREATE_FAILED - AWS::CloudFormation::Stack - FuncDefNestedStack
CloudFormation - CREATE_FAILED - AWS::CloudFormation::Stack - FuncGhiNestedStack
CloudFormation - CREATE_FAILED - AWS::CloudFormation::Stack - FuncAbcNestedStack
CloudFormation - UPDATE_ROLLBACK_IN_PROGRESS - AWS::CloudFormation::Stack - org-package-develop
CloudFormation - UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS - AWS::CloudFormation::Stack - org-package-develop
CloudFormation - DELETE_IN_PROGRESS - AWS::ApiGateway::Method - ApiGatewayMethod001VarOptions
  Serverless Error ---------------------------------------

  An error occurred: FuncDefNestedStack - Embedded stack arn:aws:cloudformation:us-east-1:ACCOUNT_ID:stack/org-package-develop-FuncDefNestedStack/RESOURCE-ID-002 was not successfully created: The following resource(s) failed to create: [DefLambdaFunction]. .

Note: All the unique identifiers in the above logs have been replaced/obfuscated with new identifiers that are unique across both logs, not per log; e.g., FuncAbcNestedStack appears in both logs because it is the exact same resource in the configuration.

Question

Given all the above, my question at this point is: how do I debug this? As far as I can tell, this is all the detail available to me; I can't dig any deeper to find out why a resource failed to create. I've read a bit about troubleshooting errors, but nothing there has been terribly helpful since I'm not actually using EC2 directly.

Apr 4 Update

I've done a good amount of work trying to debug the templates. Mind you, I'm generally not working with the templates themselves; Serverless generates them and uploads them to an S3 bucket before they're applied.

Here are some steps I've taken

  1. Updated to most recent version of Serverless (1.67.0, from 1.30.3)
  2. Nuked existing stacks
  3. Nuked related S3 bucket
  4. Updated node runtime (12.16.1, from 8.10.0)
  5. Downloaded and linted the CFN template that contains the failing lambda - no issues reported
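Beyond linting, the downloaded template can be inspected to see which deployment artifact each Lambda function points at. In CloudFormation, an `AWS::Lambda::Function` deployed from S3 names its ZIP via `Properties.Code.S3Key`. The sketch below (a hypothetical helper, not part of Serverless) groups function logical IDs by that key, so several functions sharing one ZIP stands out. Nested stacks have their own templates, so it would need to be run over each one:

```python
from __future__ import annotations

import json
from collections import defaultdict


def functions_by_artifact(template: dict) -> dict[str, list[str]]:
    """Map each Lambda Code.S3Key to the logical IDs of the functions using it.

    S3Key may be a plain string or an intrinsic function such as
    {"Fn::Join": [...]}, so non-string keys are serialized to JSON to get a
    stable, hashable label.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::Lambda::Function":
            continue
        key = resource.get("Properties", {}).get("Code", {}).get("S3Key")
        if key is None:
            continue  # e.g. inline ZipFile code: no S3 artifact to group by
        label = key if isinstance(key, str) else json.dumps(key, sort_keys=True)
        groups[label].append(logical_id)
    return dict(groups)
```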

I'm still getting the same results. When I re-run the build and check the CloudFormation event logs, I do see that a stack fails to create because a Lambda function within it fails to create. There's nothing special about this function (other Lambdas create successfully earlier in the run) other than the fact that it's the authorizer for every other function in the API, which may or may not be significant. I still can't find further detail as to why the lambda fails to create.

Apr 6 Update

OK, now that I understand how the CloudFormation console works, here is what I think is the underlying error message:

Unzipped size must be smaller than 262144000 bytes (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: 0507722d-46e7-4340-bc68-fdba1ed469d6)
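The limit in that message, 262144000 bytes, is exactly 250 × 1024 × 1024 (250 MiB). A minimal sketch for checking a package against it locally before deploying, assuming the ZIP is available on disk. Summing each entry's uncompressed size approximates, but may not exactly match, how Lambda measures it, and the helper names are hypothetical:

```python
import zipfile

# Lambda's unzipped-size limit from the error message:
# 262144000 bytes == 250 * 1024 * 1024 (250 MiB).
UNZIPPED_LIMIT_BYTES = 262_144_000


def unzipped_size(zip_path: str) -> int:
    """Sum of the uncompressed sizes of every entry in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def headroom(zip_path: str, limit: int = UNZIPPED_LIMIT_BYTES) -> int:
    """Bytes left under the limit; a negative value means the package is too big."""
    return limit - unzipped_size(zip_path)
```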

Looking at the CFN template for this nested stack, I now see what is going on. Every single Lambda gets its own stack, but the code for every stack across the entire package is bundled into a single ZIP file, whose size ends up being about 270MB, roughly 20MB larger than the limit above. From this point, it seems like I have two possible paths forward:

  1. Figure out how to split the functions across multiple ZIPs
  2. Change the webpack configuration so the compiled files are less bloated (I seriously don't know what's going on here: a 1KB TypeScript file is coming out as 6.5MB after webpack)
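For either path, it helps to know what is actually inside the ZIP. The sketch below (hypothetical helpers, Python standard library only) lists the biggest files, which should show whether the 6.5MB output is dominated by bundled dependencies, and totals the uncompressed bytes per top-level directory. If the bundle happens to use one directory per function (an assumption about this project's webpack output, not something confirmed above), those totals roughly estimate each function's share if they were packaged separately, e.g. via Serverless's `package: individually: true` setting:

```python
from __future__ import annotations

import heapq
import zipfile
from collections import defaultdict


def largest_entries(zip_path: str, top: int = 10) -> list[tuple[int, str]]:
    """The `top` biggest files in the archive as (uncompressed_bytes, name)."""
    with zipfile.ZipFile(zip_path) as archive:
        sizes = [(i.file_size, i.filename) for i in archive.infolist() if not i.is_dir()]
    return heapq.nlargest(top, sizes)


def size_by_top_level_dir(zip_path: str) -> dict[str, int]:
    """Total uncompressed bytes per top-level path component of the archive."""
    totals: dict[str, int] = defaultdict(int)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            totals[info.filename.split("/", 1)[0]] += info.file_size
    return dict(totals)
```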
Peter Bailey
  • Why don't you log in to the AWS console, head over to the CloudFormation section, select your stack, and click on Events in the bottom pane? That should tell you the error you're facing. – Anshul Verma Mar 30 '20 at 20:47
  • That, unfortunately, provides no additional detail that is not captured in the logs I included above. – Peter Bailey Mar 30 '20 at 20:53
  • I agree with the answer by Pat Myron here. @Peter Bailey Did you try to look into the events of the nested stack? – Martin Löper Apr 03 '20 at 00:25
  • Is it possible to include any templates, nested stack events, or CloudWatch logs as well? – Pat Myron Apr 04 '20 at 22:09
  • @PatMyron Yeah, I can't just now but later today/tonight. I'll post another comment when that's ready – Peter Bailey Apr 04 '20 at 22:23
  • @PatMyron Here's what I have available so far in terms of extra logging. I'm keeping it separate from the question because I don't want to include the un-obfuscated data as part of the question's history. https://gist.github.com/baileyp/f4df7058e0b8982ba6bdba7061a6db7f – Peter Bailey Apr 05 '20 at 00:46
  • That stack event is from the parent stack. The stack events for the nested stack itself will be more detailed. Even if the nested stack was recently deleted, its events can still be viewed at https://console.aws.amazon.com/cloudformation/home#/stacks/events?stackId=$ARN for some period of time – Pat Myron Apr 05 '20 at 02:21
  • Yes! @PeterBailey please click on the AuthorizerFuncNestedStack in CloudFormation and paste the first failed event from there. It contains the reason why that specific nested stack failed. – Martin Löper Apr 06 '20 at 14:55
  • @MartinLöper I think I'm missing something here. When I go to view CloudFormation > Stacks, there are only two listed (the parent-most stack for this package, and the parent-most stack for another package). Even with the "View nested" toggle switch on, that's all I see. I'm beginning to suspect that the stack splitting/nesting is not actually working. (Disclosure: prior to this effort, I had zero experience with CFN) https://www.dropbox.com/s/6zdlwm8jrye34ew/stack-log.png?dl=0 – Peter Bailey Apr 06 '20 at 16:54
  • I think the stack could be already deleted. Did you try to switch the dropdown which says "Active" on your screenshot? You should search under "Deleted" and/or "Failed". see: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-view-deleted-stacks.html – Martin Löper Apr 06 '20 at 17:03
  • @MartinLöper Yes! That's it! Thank you. I have posted an update in the original question. Now excuse me while I go research WebPack – Peter Bailey Apr 06 '20 at 17:37
  • @PeterBailey try something like this: https://stackoverflow.com/a/58884131/4122849 – Pat Myron Apr 07 '20 at 06:46
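The deleted nested stacks can also be found programmatically. In the CloudFormation ListStacks response, each entry in `StackSummaries` for a nested stack carries a `ParentId` holding the parent stack's ARN. A minimal sketch that filters those summaries (the helper name is hypothetical, and it works on plain dicts so it can be tried without AWS credentials):

```python
from __future__ import annotations


def deleted_nested_stacks(summaries: list[dict], parent_stack_id: str) -> list[dict]:
    """Deleted stacks whose direct parent is `parent_stack_id`, newest deletion first.

    `summaries` has the shape of the "StackSummaries" entries returned by the
    CloudFormation ListStacks API; nested stacks carry a "ParentId" field
    holding the parent stack's ARN, and deleted stacks carry "DeletionTime".
    """
    matches = [
        s for s in summaries
        if s.get("StackStatus") == "DELETE_COMPLETE"
        and s.get("ParentId") == parent_stack_id
    ]
    return sorted(matches, key=lambda s: s["DeletionTime"], reverse=True)
```

With boto3, the input could come from something like `[s for page in boto3.client("cloudformation").get_paginator("list_stacks").paginate(StackStatusFilter=["DELETE_COMPLETE"]) for s in page["StackSummaries"]]`. The `StackId` of each match is the ARN to plug into the events URL from the comment above.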

1 Answer


You'll need to look at the nested stacks themselves. The AbcLambdaFunction and DefLambdaFunction resources should have more detailed failure events in the nested stacks than in the parent stack. You'll likely need to fix AbcLambdaFunction and DefLambdaFunction in the nested stack templates; the inconsistency is likely just due to whichever resource happened to fail first and start the rollback.
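Once the nested stack's events are in hand, the root cause is usually the earliest failure; later failures are often knock-on effects of the rollback. A minimal sketch that picks it out, assuming `events` has the shape of the `StackEvents` list from the CloudFormation DescribeStackEvents API (`ResourceStatus`, `ResourceStatusReason`, `Timestamp` fields); `first_failure` is a hypothetical helper:

```python
from __future__ import annotations

FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def first_failure(events: list[dict]) -> dict | None:
    """Return the earliest *_FAILED event, which usually names the root cause.

    DescribeStackEvents returns events newest first, so the earliest failure is
    selected by Timestamp rather than by list position.
    """
    failures = [e for e in events if e.get("ResourceStatus") in FAILED_STATUSES]
    if not failures:
        return None
    return min(failures, key=lambda e: e["Timestamp"])
```

For example, with boto3: `first_failure(boto3.client("cloudformation").describe_stack_events(StackName=nested_stack_arn)["StackEvents"])`, where `nested_stack_arn` is a placeholder. For a deleted stack, pass the full stack ID (ARN) rather than its name. This reads only the first page of events; long histories are paginated.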

If it's been a while since those templates were last deployed, it's likely that some Lambda runtimes have been deprecated. The CloudFormation Linter should be able to check your templates for this and more.
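cfn-lint (`cfn-lint template.json`) is the tool to rely on here. Purely to illustrate the kind of check involved, a minimal sketch that flags Lambda functions whose `Runtime` is in a given set. The set below is an illustrative, non-exhaustive list of Node.js runtime identifiers that AWS has since deprecated, not an authoritative one, and the helper name is hypothetical:

```python
from __future__ import annotations

# Illustrative only: check the AWS Lambda runtime documentation (or cfn-lint)
# for the current deprecation list.
DEPRECATED_RUNTIMES = frozenset({"nodejs4.3", "nodejs6.10", "nodejs8.10"})


def deprecated_runtime_functions(
    template: dict, deprecated: frozenset[str] = DEPRECATED_RUNTIMES
) -> dict[str, str]:
    """Map logical ID -> runtime for each Lambda function using a listed runtime."""
    found = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::Lambda::Function":
            continue
        runtime = resource.get("Properties", {}).get("Runtime")
        # Skip intrinsic functions such as {"Ref": ...}; only literal strings are checked.
        if isinstance(runtime, str) and runtime in deprecated:
            found[logical_id] = runtime
    return found
```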

AWS Lambda limits are a likely cause as well; I'd recommend trying things like this.

Check to see if there are any CloudWatch logs as well.

Pat Myron
  • Thank you for this. I've made a few updates but still seem to be banging my head against the same problem. Any further insight would be very much appreciated! – Peter Bailey Apr 04 '20 at 21:52
  • Oh, forgot to mention, my findings have been documented above in the original question – Peter Bailey Apr 04 '20 at 21:58