20

I'm trying to create an S3 bucket through CloudFormation. I tried using the regex ^([0-9a-z.-]){3,63}$, but it also accepts patterns such as "..." and "---", which are invalid according to the new S3 naming conventions. (Ref: https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html) Please help?

  • 1
    The rules look complex and messy to me. Why do you need to validate S3 bucket names? Are you allowing your users to create buckets directly? – Tim Biegeleisen May 23 '18 at 06:10
  • what names are allowed? 3-63 characters [0-9a-z.-] except just two names, ... and --- ? Are names like .., ...., --, ---, ..- allowed? – user31264 May 23 '18 at 06:20
  • @user31264 Names should start and end with a lowercase letter or a number. You can use hyphens in between –  May 23 '18 at 06:40
  • @TimBiegeleisen yes, Users are creating buckets using cloudformation. –  May 23 '18 at 06:41
  • Edit your question and include the regex rules you want, _directly in your question_, not via links. – Tim Biegeleisen May 23 '18 at 06:57
  • 1
    @FellowBeginner note that even though bucket names are allowed to contain dots, I would strongly advise against it. There are a number of "gotchas" involving dots in bucket names, including the inability to enable S3 Transfer Acceleration on the bucket, and HTTPS certificate issues that are easily avoided if you simply don't use dots. If you are letting other users make up bucket names when launching stacks, they may be unaware of those quirks. – Michael - sqlbot May 23 '18 at 20:24
  • I think one of the constraints on bucket names, is so that they can be used to host static websites, hence the naming must be able to appear in a subdomain prefix, therefore must conform to the relevant RFC. – MikeW Jul 24 '20 at 07:24

10 Answers

32

Answer

The simplest and safest regex is:

(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$

It ensures that names work for all cases - including when you are using S3 Transfer Acceleration. Also, as it doesn't include any backslashes, it's easier to use in string contexts.
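
To illustrate the point about string contexts, here is a minimal sketch that serializes a pattern into JSON. The parameter name `BucketNameParam` and the helper `to_json_literal` are made up for the example, and the second pattern is only there to show what happens once backslashes are involved.

```python
import json

# Pattern from this answer: no backslashes, so JSON needs no extra escaping.
SIMPLE_PATTERN = "(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$"

# Illustrative pattern with backslashes (not the one from this answer).
DIGITS_AND_DOTS = r"^(\d+\.)+\d+$"


def to_json_literal(pattern: str) -> str:
    """Return the pattern exactly as it has to appear inside a JSON document."""
    return json.dumps(pattern)


# Hypothetical CloudFormation parameter; "BucketNameParam" is a made-up name.
parameters = {
    "BucketNameParam": {
        "Type": "String",
        "AllowedPattern": SIMPLE_PATTERN,
    }
}

print(to_json_literal(SIMPLE_PATTERN))    # printed unchanged, just quoted
print(to_json_literal(DIGITS_AND_DOTS))   # every backslash is doubled
print(json.dumps({"Parameters": parameters}, indent=2))
```

Letting a JSON library do the escaping, rather than doubling backslashes by hand, should avoid most of the escaping mistakes that come up when a regex is pasted into a template.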

Alternative

If you need S3 bucket names that include dots (and you don't use S3 Transfer Acceleration), you can use this instead:

(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$

Explanation

The Amazon S3 bucket naming rules as of 2022-05-14 are:

  1. Bucket names must be between 3 (min) and 63 (max) characters long.
  2. Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
  3. Bucket names must begin and end with a letter or number.
  4. Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
  5. Bucket names must not start with the prefix xn--.
  6. Bucket names must not end with the suffix -s3alias.
  7. Buckets used with Amazon S3 Transfer Acceleration can't have dots (.) in their names.

This regex matches all the rules (including rule 7):

(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$

The first group (?!(^xn--|.+-s3alias$)) is a negative lookahead that ensures that the name doesn't start with xn-- or end with -s3alias (satisfying rules 5 and 6).

The rest of the expression ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ensures that:

  • the name starts with a lowercase letter or number (^[a-z0-9]) and ends with a lowercase letter or number ([a-z0-9]$) (rule 3).
  • the rest of the name consists of 1 to 61 lowercase letters, numbers or hyphens ([a-z0-9-]{1,61}) (rule 2).
  • the entire expression matches names from 3 to 63 characters in length (rule 1).

Lastly, we don't need to worry about rule 4 (which forbids names that look like IP addresses) because rule 7 implicitly covers this by forbidding dots in names.
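
A quick way to sanity-check the expression is to run it against a few names. This is only a sketch in Python; the function name `is_valid_bucket_name` and the sample names are my own, and `fullmatch` is used so that a trailing newline can't slip past `$`.

```python
import re

# Pattern copied from the answer above (dots are not allowed).
BUCKET_NAME_RE = re.compile(
    r"(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$"
)


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name matches the dot-free pattern."""
    return BUCKET_NAME_RE.fullmatch(name) is not None


for candidate in ["my-bucket", "ab", "...", "---", "xn--example",
                  "logs-s3alias", "my.bucket", "-abc"]:
    print(candidate, is_valid_bucket_name(candidate))
```

Only `my-bucket` should come back as valid here; the others each break one of rules 1, 3, 5, 6 or 7.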

If you do not use Amazon S3 Transfer Acceleration and want to permit more complex bucket names, then you can use this more complicated regular expression:

(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$

The main change is the added expression that matches IPv4 addresses. The spec simply says that bucket names must not be formatted as IP addresses, but since IPv6 addresses contain colons, they are already forbidden by rule 2.
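
The longer expression can be checked the same way. Again this is just a sketch: the function name `is_valid_dotted_bucket_name` is made up, and the pattern is split across three string literals purely for readability.

```python
import re

# Pattern copied from the answer above (dots allowed, IPv4-like names rejected).
DOTTED_BUCKET_NAME_RE = re.compile(
    r"(?!(^((2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})\.){3}"
    r"(2(5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})$|^xn--|.+-s3alias$))"
    r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$"
)


def is_valid_dotted_bucket_name(name: str) -> bool:
    """Return True if the name matches the pattern that permits dots."""
    return DOTTED_BUCKET_NAME_RE.fullmatch(name) is not None


for candidate in ["a.b.c", "1000", "my.bucket.name", "192.168.5.4",
                  "xn--a.b", "my.bucket-s3alias"]:
    print(candidate, is_valid_dotted_bucket_name(candidate))
```

The first three names should be accepted and the last three rejected. Note that the character class in the middle accepts any mix of dots and hyphens, so the pattern does not look at how dots are placed relative to each other.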

Zak
  • Hey, thanks for the response. A slight problem here is that when I try to escape the backslashes (I am using JSON), it doesn't accept patterns like a.b.c. I used the following pattern: (?=^.{3,63}$)(?!^(\\d+\\.?)+$)(^(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])$) –  May 28 '18 at 09:07
  • Also, it doesn't match patterns like 1000 (basically any name consisting of just digits) –  May 28 '18 at 09:55
  • No worries. The regex works when tested here: https://regex101.com/r/iPX9o6/1 - so, I would guess that it's an escaping issue. Perhaps try if `^\\.$` matches a single `.` and doesn't match anything else? – Zak May 28 '18 at 10:02
  • @FellowBeginner I've updated the negative lookahead `(?!^(\d+\.)+\d+$)` to allow for plain sequences of digits. – Zak May 29 '18 at 07:10
  • 3
    It doesn't seem to work for rule 6. See https://regex101.com/r/BYxCPV/1. – Dust break Aug 19 '22 at 12:46
5

The following regex meets the AWS specification, provided you don't want to allow . in the bucket name (which is the recommendation; otherwise Transfer Acceleration can't be enabled):

^((?!xn--)(?!.*-s3alias$)[a-z0-9][a-z0-9-]{1,61}[a-z0-9])$

This one is convenient because it can be incorporated into more complex checks simply by replacing ^ and $ with other strings, which allows for ARN checks and so on.
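
As a rough sketch of that idea in Python: the core of the pattern is kept in one string and wrapped in different prefixes. The names `BUCKET_CORE`, `BUCKET_RE` and `BUCKET_ARN_RE` are made up, and the ARN prefix assumes the standard `aws` partition only.

```python
import re

# Core of the pattern above, without the outer ^( ... )$ wrapper.
BUCKET_CORE = r"(?!xn--)(?!.*-s3alias$)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]"

# Plain bucket name: same as the regex in this answer.
BUCKET_RE = re.compile(rf"^({BUCKET_CORE})$")

# Bucket ARN: ^ is replaced by an ARN prefix (assumes the "aws" partition).
BUCKET_ARN_RE = re.compile(rf"^arn:aws:s3:::({BUCKET_CORE})$")

match = BUCKET_ARN_RE.fullmatch("arn:aws:s3:::my-bucket")
print(match.group(1) if match else None)                          # my-bucket
print(BUCKET_ARN_RE.fullmatch("arn:aws:s3:::logs-s3alias"))       # None
print(BUCKET_RE.fullmatch("my-bucket") is not None)               # True
```

The two lookaheads sit right where the bucket name starts, so they keep working after the prefix is swapped.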

EDIT: added -s3alias exclusion as per the comment by @ryanjdillon

drAlberT
  • This regex does not allow for dots ., and it does allow for a suffix of -s3alias. An updated version: `^(?P(?!xn--)(?!.*-s3alias)[a-z0-9][a-z0-9.-]{1,62}[a-z0-9])$`. Per the [AWS bucket naming rules](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html). – ryanjdillon Sep 14 '21 at 15:05
  • 1
    In fact I wrote that its INTENT is not allowing dots ... as they are DEPRECATED. +1 for -s3alias (which I guess has been introduced recently) – drAlberT Oct 20 '21 at 08:44
4

I've adapted Zak's answer a little bit. I found it was a little too complicated and threw out valid domain names. Here's the new regex (available with tests on regex101.com**):

(?!^(\d{1,3}\.){3}\d{1,3}$)(^[a-z0-9]([a-z0-9-]*(\.[a-z0-9])?)*$)

The first part is the negative lookahead (?!^(\d{1,3}\.){3}\d{1,3}$), which rejects names formatted like IP addresses. Basically, it tries to match 1-3 digits followed by a period, three times ((\d{1,3}\.){3}), followed by 1-3 digits (\d{1,3}).

The second part says that the name must start with a lowercase letter or a number (^[a-z0-9]) followed by lowercase letters, numbers, or hyphens repeated 0 to many times ([a-z0-9-]*). If there is a period, it must be followed by a lowercase letter or number ((\.[a-z0-9])?). These last 2 patterns are repeated 0 to many times (([a-z0-9-]*(\.[a-z0-9])?)*).

The regex does not attempt to enforce the size restrictions set forth by AWS (3-63 characters). That can be handled either by another regex (^.{3,63}$) or by checking the length of the string.


** At that link, one of the tests I added is failing, but if you switch to the test area and type in the same pattern, it passes. It also works if you copy/paste it into the terminal, so I assume that's a bug on the regex101.com side.

c1moore
  • 2
    I modified @c1moore's answer a bit to include the condition: The bucket name cannot contain underscores, **end with a dash**, have consecutive periods, or use dashes adjacent to periods. https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-s3-bucket-naming-requirements.html ```(?!^(\d{1,3}\.){3}\d{1,3}$)(^[a-z0-9]([a-z0-9-]*(\.[a-z0-9])?)*$(?<!\-))``` – Jophine Dec 12 '19 at 10:26
  • this one seems to allow the dash in the end of the name – hellmean Jul 01 '20 at 08:41
  • 1
    You're right, nice catch. I'm not sure if that was a requirement when this regex was created. It looks like @jophine has a solution for that, though. – c1moore Jul 01 '20 at 12:21
3

Regular expression for S3 Bucket Name:

String S3_REPORT_NAME_PATTERN = "[0-9A-Za-z!\\-_.*\'()]+";

String S3_PREFIX_PATTERN   = "[0-9A-Za-z!\\-_.*\\'()/]*";

String S3_BUCKET_PATTERN = "(?=^.{3,63}$)(?!^(\\d+\\.)+\\d+$)(^(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])$)";
ManojP
1

I used @Zak's regex, but it isn't 100% correct. Instead, I check each AWS bucket naming rule separately, step by step, like this:

  • Bucket names must be at least 3 and no more than 63 characters long -> ^.{3,63}$
  • Bucket names must not contain uppercase characters or underscores -> reject the name if [A-Z_] matches
  • Bucket names must start with a lowercase letter or number -> ^[a-z0-9]
  • Bucket names must not be formatted as an IP address (for example, 192.168.5.4) -> reject the name if ^(\d+\.)+\d+$ matches. That is more restrictive than AWS.
  • Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.) -> in Python, reject the name if ".." in bucket_name
  • Each label must end with a lowercase letter or a number -> ^(.*[a-z0-9]\.)*.*[a-z0-9]$
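
Put together, the steps above could look roughly like this in Python. It's a sketch under a few assumptions: the function name and error messages are made up, it mirrors only the checks listed above (so other characters such as `!` are not rejected), and the per-label check splits on periods instead of using the last regex, because `.` in that regex also matches periods and, as far as I can tell, it only constrains the final character.

```python
import re
from typing import List


def validate_bucket_name(name: str) -> List[str]:
    """Return the list of failed checks; an empty list means all checks passed."""
    errors = []
    if not re.fullmatch(r".{3,63}", name):
        errors.append("length must be 3-63 characters")
    if re.search(r"[A-Z_]", name):
        errors.append("uppercase characters and underscores are not allowed")
    if not re.match(r"[a-z0-9]", name):
        errors.append("must start with a lowercase letter or number")
    if re.fullmatch(r"(\d+\.)+\d+", name):
        errors.append("must not look like an IP address")
    if ".." in name:
        errors.append("labels must be separated by a single period")
    # Swapped-in technique: check each label separately after splitting on ".".
    for label in name.split("."):
        if not re.fullmatch(r".*[a-z0-9]", label):
            errors.append("each label must end with a lowercase letter or number")
            break
    return errors


print(validate_bucket_name("my-bucket"))     # []
print(validate_bucket_name("192.168.5.4"))
print(validate_bucket_name("a-.b"))
```

Returning the failed checks rather than a single boolean makes it easier to tell the user which rule was broken.
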
Sociopath
FilipShark
1
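// Example input; replace with the name you want to validate.
var bucketName = 'my-example-bucket';

// Length must be between 3 and 63 characters.
var lengthRegex = /^.{3,63}$/;
// Dot-separated runs of digits (IP-like names); a valid name must NOT match.
// A bare negative lookahead such as /(?!^(\d+\.)+\d+$)/ always passes test(),
// because the engine retries at index 1, where ^ can no longer match.
var ipLikeRegex = /^(\d+\.)+\d+$/;
// Dot-separated labels that each start and end with a lowercase letter or digit.
var labelsRegex = /^(([a-z0-9]|[a-z0-9][a-z0-9-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9-]*[a-z0-9])$/;

var validLength = lengthRegex.test(bucketName);
var notIpLike = !ipLikeRegex.test(bucketName);
var validLabels = labelsRegex.test(bucketName);
console.log('bucketName ' + bucketName + ' validLength ' + validLength);
console.log('bucketName ' + bucketName + ' notIpLike ' + notIpLike);
console.log('bucketName ' + bucketName + ' validLabels ' + validLabels);

if (validLength && notIpLike && validLabels) {
  // valid bucket name
} else {
  // not a valid bucket name
}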
Sunil Jakhar
1

AWS issued new guidelines in which '.' is not recommended and bucket names starting with 'xn--' are now prohibited (https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html). If you disallow '.', the regex becomes much more readable:

(?=^.{3,63}$)(?!xn--)([a-z0-9](?:[a-z0-9-]*)[a-z0-9])$

hellmean
1

I tried passing a wrong bucket name to the S3 API itself to see how it validates. Looks like the following are valid regex patterns as returned in the API response.

Bucket name must match the regex

^[a-zA-Z0-9.\-_]{1,255}$

or be an ARN matching the regex

^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$
Soundararajan
0

Edit: Modified the regexp to allow required size (3-63) and add some other options.

The names must be DNS-compliant, so you could try with:

^[A-Za-z0-9][A-Za-z0-9\-]{1,61}[A-Za-z0-9]$

See: https://regexr.com/3psne

Use this if you need to use periods:

^[A-Za-z0-9][A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$

See: https://regexr.com/3psnb

Finally, if you want to disallow two consecutive 'non-word' characters, you can use:

^[A-Za-z0-9](?!.*[.-]{2})[A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$

See: https://regexr.com/3psn8
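
To see what the extra lookahead in the last pattern does, here is a small Python sketch. The function name `is_valid_name` and the sample names are made up; the pattern is the one above, unchanged.

```python
import re

# Last pattern from this answer: (?!.*[.-]{2}) rejects two adjacent
# separator characters anywhere after the first character.
NO_ADJACENT_SEPARATORS_RE = re.compile(
    r"^[A-Za-z0-9](?!.*[.-]{2})[A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$"
)


def is_valid_name(name: str) -> bool:
    """Return True if the name matches the pattern with the lookahead."""
    return NO_ADJACENT_SEPARATORS_RE.fullmatch(name) is not None


for candidate in ["foo.bar", "foo-bar", "foo..bar", "foo.-bar", "foo--bar"]:
    print(candidate, is_valid_name(candidate))
```

The first two names pass and the last three are rejected. `foo--bar` is rejected as well, since `[.-]{2}` also matches two hyphens in a row.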

Based on: Regexp for subdomain

Julio
  • This is pretty far off from the AWS documentation. For example, this does not even allow period separators. – Tim Biegeleisen May 23 '18 at 06:47
  • We also need the length constraint between 3 to 64 characters. how can that be achieved as well –  May 23 '18 at 06:52
  • @TimBiegeleisen if we can ignore period seperators for now, can we have a basic regex for the other rules –  May 23 '18 at 06:54
  • @FellowBeginner I modified the regexp for allowing size of 3-63. If you also need periods you could use: "^[A-Za-z0-9][A-Za-z0-9\-.]{1,61}[A-Za-z0-9]$" However, that would allow things like 'foo..bar'. I don't know if that should be allowed or not. – Julio May 23 '18 at 07:25
  • @FellowBeginner I also added some other options – Julio May 23 '18 at 07:41
0

If you use Transfer Acceleration, you can simply remove the period option. This regex accounts for all the rules, including

  • no double dots
  • no dash/dot combination in a row
  • no dot/dash combination

It also accepts an optional trailing slash after the bucket name, in case you want your users to be able to include one.

(?!^|xn--)[a-z0-9]{1}[a-z0-9-.]{1,61}[a-z0-9]{1}(?<!-s3alias|\.\..{1,63}|.{1,63}\.\.|\.\-.{1,63}|.{1,63}\.\-|\-\..{1,63}|.{1,63}\-\.)(?=$|\/)