Difference between \A \z and ^ $ in Ruby regular expressions

Question

In the documentation I read:

Use \A and \z to match the start and end of the string, ^ and $ match the start/end of a line.

I am going to apply a regular expression to check a username (or an e-mail, it's the same) submitted by a user. Which expression should I use with validates_format_of in the model? I can't understand the difference: I've always used ^ and $ ...

http://guides.rubyonrails.org/security.html#regular-expressions — Ivan Chau, Aug 28 '16 at 02:48

score 268 · Accepted Answer · edited Oct 04 '13 at 19:05

If you're depending on the regular expression for validation, you always want to use \A and \z. ^ and $ only anchor to the start and end of a line, so a match can stop at a newline character. That means someone could submit an email like me@example.com\n<script>dangerous_stuff();</script> and still have it validate, since the regex only needs everything before the \n to match.
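A minimal sketch of that bypass, assuming a deliberately simplified email pattern (the two regexes and their constant names are made up for illustration and are not a complete email validator):

```ruby
# Hypothetical patterns for illustration only; not a complete email validator.
LINE_ANCHORED   = /^[\w.+-]+@[\w-]+(\.[\w-]+)+$/
STRING_ANCHORED = /\A[\w.+-]+@[\w-]+(\.[\w-]+)+\z/

PAYLOAD = "me@example.com\n<script>dangerous_stuff();</script>"

# ^ and $ are satisfied by the first line alone, so the payload passes.
puts LINE_ANCHORED.match?(PAYLOAD)            # prints true
# \A and \z force the pattern to cover the whole string, so it fails.
puts STRING_ANCHORED.match?(PAYLOAD)          # prints false
# A plain address still passes the stricter pattern.
puts STRING_ANCHORED.match?("me@example.com") # prints true
```

The only difference between the two patterns is the pair of anchors, which is what lets the second line of the payload slip past the first one.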

My recommendation would be to strip newlines from a username or email beforehand, since there's pretty much no legitimate reason for one. Once the string cannot contain a newline, \A \z and ^ $ behave the same, so you can safely use either.
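One way to sketch the stripping approach, assuming a hypothetical helper named strip_newlines and the same kind of simplified, illustrative email pattern:

```ruby
# Hypothetical helper for illustration: remove every CR and LF character.
def strip_newlines(input)
  input.delete("\r\n")
end

# Illustrative pattern only; not a complete email validator.
EMAIL_BY_LINE = /^[\w.+-]+@[\w-]+(\.[\w-]+)+$/

attack  = strip_newlines("me@example.com\n<script>dangerous_stuff();</script>")
address = strip_newlines("me@example.com\n")

# With the newline gone, the script sits on the same line and the match fails.
puts EMAIL_BY_LINE.match?(attack)   # prints false
# A legitimate address with a stray trailing newline is cleaned up and accepted.
puts EMAIL_BY_LINE.match?(address)  # prints true
```

Note that the stripped value, not the raw input, is what would need to be stored for this to help.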

answered Feb 23 '09 at 13:43 by Luke · edited Oct 04 '13 at 19:05 by akhanubis

@Ragmaanir is right, it should be with small letter `\z` instead of `\Z`! – Petr Aug 22 '12 at 10:35

+1 Thanks! Although I would have to disagree with your recommendation: A) Don't add unnecessary work/processing if there's an appropriate catch-all, and B) especially not if it allows you to remain lazy about distinguishing between the two. You may not always be in a position to string manipulate, only to Regex, so commit the right one to memory and know the difference! – dooleyo Mar 25 '14 at 18:01

I didn't understand the example with dangerous stuff, because in either case one could include dangerous stuff in the string. With or without newlines, it would be an exploit that should be fixed with HTML sanitizing and validation. – Jayr Motta Dec 02 '14 at 18:24

@JayrMotta what the demonstration shows is that the dangerous stuff would *completely bypass your entire regex check*. So even if you were checking for dangerous stuff in your regex, it would get bypassed if you used `$` to check for "end of string" instead of `\z`. – Doctor Blue Sep 09 '16 at 11:38

score 203 · Answer 2 · edited Dec 27 '14 at 07:44

According to Pickaxe:

^ Matches the beginning of a line.

$ Matches the end of a line.

\A Matches the beginning of the string.

\z Matches the end of the string.

\Z Matches the end of the string unless the string ends with a "\n", in which case it matches just before the "\n".

So, use \A and lowercase \z. If you use \Z, someone could sneak in a trailing newline character. I don't think this is dangerous, but it might break code that assumes there's no whitespace in the string. Depending on your regex and string-length constraints, someone could even use an invisible name consisting of just a newline character.
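A short sketch of the \Z versus \z difference (the constant names are made up for this example):

```ruby
UPPER_Z = /\Afoo\Z/  # end of string, or just before one trailing "\n"
LOWER_Z = /\Afoo\z/  # end of string only

puts UPPER_Z.match?("foo\n")  # prints true: the trailing newline is tolerated
puts LOWER_Z.match?("foo\n")  # prints false
puts UPPER_Z.match?("foo")    # prints true
puts LOWER_Z.match?("foo")    # prints true

# The "invisible name" case: a string that is nothing but a newline.
puts(/\A\w*\Z/.match?("\n"))  # prints true
puts(/\A\w*\z/.match?("\n"))  # prints false
```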

JavaScript's regex implementation does not support \A as an anchor; it treats \A as a literal 'A'. So watch yourself out there and test.

score 24 · Answer 3 · edited Feb 02 '15 at 06:32

Difference By Example

/^foo$/ matches each of the following strings; /\Afoo\z/ matches none of them:

whatever1
foo
whatever2

foo
whatever2

whatever1
foo

/^foo$/ and /\Afoo\z/ both match the following:

foo
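The same comparison can be checked in irb. This is only a sketch; the constant name is made up, and the strings are the examples above written with explicit "\n" characters:

```ruby
MULTI_LINE_INPUTS = [
  "whatever1\nfoo\nwhatever2",
  "foo\nwhatever2",
  "whatever1\nfoo"
].freeze

MULTI_LINE_INPUTS.each do |input|
  # Each line prints "true false": the line anchors match, the string anchors do not.
  puts "#{input.match?(/^foo$/)} #{input.match?(/\Afoo\z/)}"
end

# A string that is exactly "foo" satisfies both patterns.
puts "#{'foo'.match?(/^foo$/)} #{'foo'.match?(/\Afoo\z/)}"  # prints "true true"
```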

answered Aug 20 '13 at 20:20 by Chun Yang · edited Feb 02 '15 at 06:32 by shivam

score 20 · Answer 4 · answered Feb 23 '09 at 13:44

The start and end of a string may not necessarily be the same thing as the start and end of a line. Imagine if you used the following as your test string:

my
name
is
Andrew

Notice that the string has many lines in it. The ^ and $ anchors let you match the beginning and end of each of those lines (basically treating the \n character as a delimiter), while \A and \z let you match the beginning and end of the entire string. (\Z also matches at the end of the string, but it tolerates one trailing \n.)
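As a rough illustration with that test string (the constant name TEXT is made up for the example):

```ruby
TEXT = "my\nname\nis\nAndrew"

# ^ and $ anchor at every line, so each line is a separate match.
p TEXT.scan(/^\w+$/)    # prints ["my", "name", "is", "Andrew"]

# \A and \z anchor only at the ends of the whole string, and \w never matches "\n".
p TEXT.scan(/\A\w+\z/)  # prints []

# Used separately, \A picks out the first word and \z the last one.
p TEXT[/\A\w+/]         # prints "my"
p TEXT[/\w+\z/]         # prints "Andrew"
```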
