
I am new to SageMaker and hope to use it in a VPC with a private subnet, so that data accessed from S3 is not exposed to the public internet.

I have created a VPC with a private subnet (no internet gateway or NAT gateway) and attached a VPC S3 gateway endpoint. With this setup, can I apply the subnet's default security group to the SageMaker notebook instances, or is additional configuration required?

I also want to keep internet access for the SageMaker notebook instance so I can still download Python packages. Mainly, I want to make sure that reading data from S3 through the private subnet works with its default security group.

Thank you

John Rotenstein
Marley
  • You have to provide your SG settings. What is your current SG? What are the route tables of your subnets? – Marcin Nov 29 '21 at 02:29
  • Thanks John, much appreciated. I've associated the private subnet with the route table, which therefore has two routes. The first (default) route has the VPC CIDR as destination and `local` as target. The second route has destination `com.amazonaws.us-east-1.s3` and the VPC S3 gateway endpoint ID as target (this route was added to the route table when I created the VPC S3 endpoint). The SG is just the default VPC SG (with `all traffic` for inbound/outbound rules). Please excuse me if I'm way off, as I'm quite new to this, but should it matter that the subnet is private with no internet or NAT gateway attached? – Marley Nov 29 '21 at 12:36



From the setup you've described, you're on the right path. Your private subnet has no direct access to the internet, which is what you want, and the S3 gateway endpoint ensures that traffic from your SageMaker instances to S3 does not go over the public internet.
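If you want to double-check the routing described in the comments, a small sketch like the following can inspect the private subnet's route table. It assumes you pass it one entry of `RouteTables` from boto3's `ec2.describe_route_tables()`; the function name and the sample IDs are hypothetical. It can only tell that *some* gateway endpoint route exists, because S3 and DynamoDB gateway endpoints both appear as prefix-list routes.

```python
from __future__ import annotations

from typing import Any


def summarize_route_table(route_table: dict[str, Any]) -> dict[str, Any]:
    """Summarize one entry of ec2.describe_route_tables()["RouteTables"]."""
    has_gateway_endpoint_route = False
    default_route_target = None
    for route in route_table.get("Routes", []):
        gateway_id = route.get("GatewayId", "")
        # Gateway endpoint routes (S3 or DynamoDB) point a prefix list at a vpce-... ID.
        if "DestinationPrefixListId" in route and gateway_id.startswith("vpce-"):
            has_gateway_endpoint_route = True
        # The default route decides whether (and how) the subnet reaches the internet.
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            default_route_target = gateway_id or route.get("NatGatewayId")
    return {
        "has_gateway_endpoint_route": has_gateway_endpoint_route,
        "default_route_target": default_route_target,
        # A subnet is "public" when its default route targets an internet gateway.
        "is_public": bool(default_route_target and default_route_target.startswith("igw-")),
    }
```

For a table like the one in the comments (a `local` route plus the endpoint's prefix-list route), this reports a gateway-endpoint route and no default route, i.e. a private subnet with no internet access.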

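As for security groups, the default security group, which allows all outbound traffic, should work fine for your use case: it lets your SageMaker instances reach S3. Gateway endpoints are not associated with security groups themselves; traffic reaches them through the route table, so only your instance's outbound rules matter here.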
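To apply that security group when creating the notebook instance, something like this sketch could work. It assumes boto3's SageMaker client (`boto3.client("sagemaker")`) and its `create_notebook_instance` call; the helper names, the default instance type, and the choice to disable direct internet access are illustrative assumptions. With `DirectInternetAccess="Disabled"` (which requires `SubnetId`), the instance reaches the network only through your VPC, so package downloads depend on the NAT gateway setup described in this answer.

```python
from __future__ import annotations

from typing import Any


def notebook_instance_kwargs(
    name: str,
    role_arn: str,
    subnet_id: str,
    security_group_ids: list[str],
    instance_type: str = "ml.t3.medium",  # assumption: any notebook instance type works
    direct_internet_access: bool = False,
) -> dict[str, Any]:
    """Build keyword arguments for SageMaker's CreateNotebookInstance call."""
    if not security_group_ids:
        # This sketch insists on an explicit security group (e.g. the VPC default SG).
        raise ValueError("pass at least one security group ID")
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
        "DirectInternetAccess": "Enabled" if direct_internet_access else "Disabled",
    }


def launch_notebook_in_subnet(sagemaker_client: Any, **kwargs: Any) -> str:
    """Create the notebook instance and return its ARN."""
    response = sagemaker_client.create_notebook_instance(**notebook_instance_kwargs(**kwargs))
    return response["NotebookInstanceArn"]
```

Usage might look like `launch_notebook_in_subnet(boto3.client("sagemaker"), name="my-notebook", role_arn=..., subnet_id=..., security_group_ids=[...])`, with your own role ARN, private subnet ID, and security group ID filled in.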

Downloading Python packages, however, requires internet access, and your private subnet has no route to the internet. For that you need a NAT gateway (or a NAT instance) placed in a public subnet, which by definition has a route to an internet gateway.
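A minimal sketch of creating the NAT gateway with boto3, assuming an existing public subnet whose route table already sends 0.0.0.0/0 to an internet gateway. The function name is hypothetical; it takes an EC2 client (e.g. `boto3.client("ec2")`) as a parameter so it can be exercised without AWS.

```python
from __future__ import annotations

from typing import Any


def create_public_nat_gateway(ec2_client: Any, public_subnet_id: str, wait: bool = True) -> str:
    """Create a NAT gateway in a public subnet and return its ID."""
    # A public NAT gateway needs an Elastic IP address.
    allocation = ec2_client.allocate_address(Domain="vpc")
    response = ec2_client.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=allocation["AllocationId"],
    )
    nat_gateway_id = response["NatGateway"]["NatGatewayId"]
    if wait:
        # The gateway starts out "pending"; block until it is available.
        ec2_client.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_gateway_id])
    return nat_gateway_id
```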

You would then add a route to the route table associated with your private subnet (the main route table, or whichever one you associated) that sends outbound traffic (0.0.0.0/0) to the NAT gateway. A NAT gateway lets instances in a private subnet initiate connections to the internet (or other AWS services) but prevents the internet from initiating connections to those instances.
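The route itself might be added like this (again a sketch; the function name and IDs are placeholders). Because the S3 gateway endpoint's prefix-list route is more specific than 0.0.0.0/0, S3 traffic should keep using the endpoint, while everything else (such as `pip install`) goes through the NAT gateway.

```python
from __future__ import annotations

from typing import Any


def route_private_subnet_through_nat(
    ec2_client: Any, private_route_table_id: str, nat_gateway_id: str
) -> bool:
    """Send the private subnet's internet-bound traffic to the NAT gateway."""
    # Only the default route is added; the existing local and S3 prefix-list
    # routes are left untouched, so S3 traffic still matches the endpoint route.
    response = ec2_client.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_gateway_id,
    )
    return bool(response.get("Return", False))
```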

Note that while this setup increases security, it also adds complexity: you have to manage the NAT gateway (or patch and maintain a NAT instance) and make sure your security group rules allow the necessary traffic.
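One way to check that a security group allows the traffic needed here (HTTPS to S3 and to package indexes such as PyPI) is to inspect its outbound rules. This sketch assumes one entry of `SecurityGroups` from boto3's `ec2.describe_security_groups()`; the function name is hypothetical, and it deliberately ignores IPv6 ranges and rules that target other security groups.

```python
from __future__ import annotations

from typing import Any


def egress_allows_https(security_group: dict[str, Any]) -> bool:
    """True if some outbound rule allows TCP 443 to 0.0.0.0/0 or to a prefix list."""
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol in ("tcp", "6"):
            port_ok = rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        if not port_ok:
            continue
        open_to_anywhere = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
        # A prefix-list destination covers rules scoped to the S3 prefix list.
        to_prefix_list = bool(rule.get("PrefixListIds"))
        if open_to_anywhere or to_prefix_list:
            return True
    return False
```

The default VPC security group, whose single outbound rule allows all protocols to 0.0.0.0/0, passes this check.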

Also account for the extra cost of a NAT gateway, which is billed per hour and per gigabyte of data it processes.

Finally, anyone reading this in 2023 or later should consider SageMaker Studio notebooks instead of notebook instances. SageMaker Studio provides a fully integrated development environment with significantly more features than traditional notebook instances, such as real-time collaboration, system and model metrics visualization, and automated machine learning experiments.

Yann Stoneman