230

In the documentation I read:

Use \A and \z to match the start and end of the string, ^ and $ match the start/end of a line.

I am going to apply a regular expression to check a username (or an e-mail address, the case is the same) submitted by a user. Which expression should I use with validates_format_of in the model? I don't understand the difference: I've always used ^ and $ ...

Daniel X Moore
collimarco

4 Answers

268

If you're depending on the regular expression for validation, you always want to use \A and \z. ^ and $ match at the start and end of each line, not of the whole string, which means an attacker could submit an email like me@example.com\n<script>dangerous_stuff();</script> and still have it validate, since the regex is satisfied by the text before the \n alone.
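To see the bypass in plain Ruby, here is a minimal sketch. The email pattern is deliberately simplistic and only an assumption for illustration, not a recommended validator; the variable names are made up as well.

```ruby
# Hypothetical input: a valid-looking address, a newline, then a payload.
input = "me@example.com\n<script>dangerous_stuff();</script>"

# Deliberately simplistic email patterns, for illustration only.
line_anchored   = /^[^@\s]+@[^@\s]+$/     # anchored to a line
string_anchored = /\A[^@\s]+@[^@\s]+\z/   # anchored to the whole string

# The first line alone satisfies ^...$, so the payload slips through.
line_ok = line_anchored.match?(input)

# \A...\z must account for every character, so the newline makes it fail.
string_ok = string_anchored.match?(input)

puts "line anchors accept:   #{line_ok}"
puts "string anchors accept: #{string_ok}"
```

The same two patterns give the same verdict on a clean single-line address; they only diverge once a newline is present.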

My recommendation would be to strip newlines from a username or email completely beforehand, since there's pretty much no legitimate reason for one to be there. Then you can safely use either \A \z or ^ $.
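A rough sketch of the stripping approach, assuming the value is available as a plain Ruby string before validation (the variable names and the \w+ username pattern are made up for the example):

```ruby
# Hypothetical raw form value containing a newline and a payload.
raw = "luke\n<script>x</script>"

# String#delete removes every character listed in its argument (here \r and \n).
cleaned = raw.delete("\r\n")

# Deliberately simple username pattern, for illustration only.
before_strip      = /^\w+$/.match?(raw)        # the first line passes on its own
after_strip_line  = /^\w+$/.match?(cleaned)    # one line left, so ^ and $ see it all
after_strip_whole = /\A\w+\z/.match?(cleaned)  # same verdict as the line anchors

puts cleaned
puts [before_strip, after_strip_line, after_strip_whole].inspect
```

Once no newline can remain in the value, the line anchors and the string anchors agree, which is the point of the recommendation.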

akhanubis
Luke
  • 14
    @Ragmaanir is right, it should be with small letter `\z` instead of `\Z`! – Petr Aug 22 '12 at 10:35
  • 13
    +1 Thanks! Although I would have to disagree with your recommendation: A) Don't add unnecessary work/processing if there's an appropriate catch-all, and B) especially not if it allows you to remain lazy about distinguishing between the two. You may not always be in a position to string manipulate, only to Regex, so commit the right one to memory and know the difference! – dooleyo Mar 25 '14 at 18:01
  • 2
    I didn't understand the example with dangerous stuff because in either case one could include dangerous stuff in the string. With or without newlines it would be an exploit that should be fixed with HTML sanitizing and validation. – Jayr Motta Dec 02 '14 at 18:24
  • 3
    @JayrMotta what the demonstration shows is that the dangerous stuff would *completely bypass your entire regex check*. So even if you were checking for dangerous stuff in your regex, it would get bypassed if you used `$` to check for "end of string" instead of `\z`. – Doctor Blue Sep 09 '16 at 11:38
203

According to Pickaxe:

^ Matches the beginning of a line.

$ Matches the end of a line.

\A Matches the beginning of the string.

\z Matches the end of the string.

\Z Matches the end of the string unless the string ends with a "\n", in which case it matches just before the "\n".

So, use \A and lowercase \z. If you use \Z someone could sneak in a newline character. This is not dangerous I think, but might screw up algorithms that assume that there's no whitespace in the string. Depending on your regex and string-length constraints someone could use an invisible name with just a newline character.
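A short sketch of the \z versus \Z difference in Ruby. The patterns and names here are illustrative assumptions; the \w* variant stands in for a hypothetical validation with no minimum-length constraint.

```ruby
strict  = /\A\w+\z/   # \z: the very end of the string
lenient = /\A\w+\Z/   # \Z: the end, or just before one final "\n"

trailing = "bob\n"

lenient_accepts = lenient.match?(trailing)  # \Z tolerates the trailing newline
strict_accepts  = strict.match?(trailing)   # \z does not

# With a pattern that allows zero characters, a lone newline passes under \Z.
newline_only_lenient = /\A\w*\Z/.match?("\n")
newline_only_strict  = /\A\w*\z/.match?("\n")

puts [lenient_accepts, strict_accepts].inspect
puts [newline_only_lenient, newline_only_strict].inspect
```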

JavaScript's implementation of Regex treats \A as a literal 'A' (ref). So watch yourself out there and test.

Alan Moore
Ragmaanir
24

Difference By Example

  1. /^foo$/ matches any of the following, /\Afoo\z/ does not:

whatever1
foo
whatever2

foo
whatever2

whatever1
foo

  2. /^foo$/ and /\Afoo\z/ both match the following:

foo
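The cases above can be checked directly in Ruby. This is just a quick sketch; the array name and layout are my own.

```ruby
# The three multi-line strings from case 1, followed by the plain "foo" from case 2.
samples = [
  "whatever1\nfoo\nwhatever2",
  "foo\nwhatever2",
  "whatever1\nfoo",
  "foo"
]

# ^ and $ succeed if any single line is exactly "foo".
line_hits = samples.map { |s| /^foo$/.match?(s) }

# \A and \z succeed only if the entire string is exactly "foo".
string_hits = samples.map { |s| /\Afoo\z/.match?(s) }

samples.each_with_index do |s, i|
  puts "#{s.inspect}: ^$ => #{line_hits[i]}, \\A\\z => #{string_hits[i]}"
end
```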
shivam
Chun Yang
20

The start and end of a string may not necessarily be the same thing as the start and end of a line. Imagine if you used the following as your test string:

my
name
is
Andrew

Notice that the string has many lines in it. The ^ and $ characters allow you to match the beginning and end of each of those lines (basically treating the \n character as a delimiter), while \A and \Z allow you to match the beginning and end of the entire string (\Z also matches just before a final \n; use \z for the strict end).
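As a rough illustration with that test string, String#scan shows where each pair of anchors can land. The variable names are only for the example.

```ruby
text = "my\nname\nis\nAndrew"

# ^ and $ apply to every line, so each line is found.
every_line = text.scan(/^\w+$/)

# \A pins the match to the start of the string: only the first line qualifies.
first_only = text.scan(/\A\w+$/)

# \z pins the match to the end of the string: only the last line qualifies.
last_only = text.scan(/^\w+\z/)

# Both together require the whole string to be one word, which it is not.
whole = text.scan(/\A\w+\z/)

puts every_line.inspect
puts first_only.inspect
puts last_only.inspect
puts whole.inspect
```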

Andrew Hare