4

The current expression validates a web address (HTTP). How do I change it so that an empty string also matches?

(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
Mark Biek
Peter Morris
  • It didn't occur to me from your question that you were matching lines in a text file... I thought you were likely parsing the html of an http-response for links within and couldn't figure out the context of your 'empty string' goal until I read the answer you selected. Think different, eh? – Hardryv Nov 11 '11 at 13:22
  • in case it's helpful to anyone browsing in as I did, the best match string I've architected for URLs buried within HTML is "((http)s?:\/\/)([\w\.\-_]*(\/)?)*(#[\w\.\-_])?" -- I tested it against multiple popular sites with many links each, and it will also encompass the end-of-URL page-class-search tag – Hardryv Nov 11 '11 at 14:14

4 Answers

7

If you want to modify the expression to match either an entirely empty string or a full URL, you will need to use the anchor metacharacters ^ and $ (which match the beginning and end of a line respectively).

^(|https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)$

As dirkgently pointed out, you can simplify your match for the protocol a little, so I've included that for you too.
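As a quick check of the anchored pattern, here is a minimal Python sketch. The choice of Python and the names URL_OR_EMPTY and is_url_or_empty are assumptions made for illustration; the pattern itself is the one given above.

```python
import re

# The anchored pattern from this answer: either nothing at all, or a full URL.
URL_OR_EMPTY = re.compile(
    r"^(|https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)$"
)


def is_url_or_empty(text: str) -> bool:
    """Return True if text is empty or consists entirely of one URL."""
    return URL_OR_EMPTY.search(text) is not None


if __name__ == "__main__":
    for sample in ["", "http://example.com", "see http://example.com", "not a url"]:
        print(repr(sample), is_url_or_empty(sample))
```

Because of ^ and $, a URL that is merely embedded in other text is rejected. Note that in Python, $ also matches just before a single trailing newline, so strip the input first if that matters.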

Though, if you are using this expression from within a program or script, it may be simpler to use the language's own means of checking whether the input is empty.

// in no particular language...
if input.length > 0 then
    if input matches <regex> then
        input is a URL
    else
        input is invalid
else
    input is empty
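For a concrete version of the pseudocode above, here is one possible sketch in Python. The language, the function name classify, and the returned labels are assumptions for illustration only. It uses fullmatch so that the pattern itself needs no ^ and $.

```python
import re

# URL pattern without anchors; fullmatch() below requires the whole input to match.
URL = re.compile(
    r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)


def classify(text: str) -> str:
    """Mirror the pseudocode: check for emptiness first, then apply the regex."""
    if len(text) > 0:
        if URL.fullmatch(text):
            return "url"
        return "invalid"
    return "empty"


if __name__ == "__main__":
    for sample in ["", "http://example.com", "example"]:
        print(repr(sample), classify(sample))
```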
Alex Barrett
  • Accepted as the answer because you were the only person to mention the ^ and $ required, without which simply adding the ? made any pattern match. Thanks! – Peter Morris Feb 27 '09 at 04:49
2

Put the whole expression in parentheses and mark it as optional (the “?” quantifier: zero or one repetition).

((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)?
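As the asker's comment on the accepted answer notes, an optional group with no anchors will match any input, because an empty match is always available. The Python sketch below illustrates that under stated assumptions: the names OPTIONAL_URL, found_anywhere and whole_input_ok are hypothetical, and the pattern is the one from this answer written with plain & characters.

```python
import re

# The optional-group pattern from this answer, with no ^ or $ anchors.
OPTIONAL_URL = re.compile(
    r"((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)?"
)


def found_anywhere(text: str) -> bool:
    """Unanchored search: succeeds on any input via a zero-length match."""
    return OPTIONAL_URL.search(text) is not None


def whole_input_ok(text: str) -> bool:
    """Whole-string match: only an empty string or a complete URL passes."""
    return OPTIONAL_URL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["", "http://example.com", "not a url"]:
        print(repr(sample), found_anywhere(sample), whole_input_ok(sample))
```

So this pattern only validates input when the whole string is required to match, either through ^ and $ or through a whole-string API such as fullmatch.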
Peter Morris
Gumbo
1

Wrap your expression in the anchors ^ and $, then add |^$ to the end. The | (or) operator then separates two alternatives, one for each match case.

^(https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)$|^$

The key here is that |^$ means "or match blank".

Also, that expression will only work in JavaScript if you use a template string.

Austin Poulson
0

Use Expr?, where Expr is your URL matcher, just as I would write https? to cover both http and https. The ? is known as a quantifier; you can look it up. From Wikipedia:

? The question mark indicates there is zero or one of the preceding element.
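To see the quantifier on its own, here is a small Python sketch. The language and the names SCHEME and is_scheme are assumptions for illustration. In https?, the ? applies only to the preceding s.

```python
import re

# "s?" means zero or one "s", so this accepts exactly "http" or "https".
SCHEME = re.compile(r"https?")


def is_scheme(text: str) -> bool:
    """Return True if text is exactly 'http' or 'https'."""
    return SCHEME.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["http", "https", "httpss", "htt"]:
        print(sample, is_scheme(sample))
```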

dirkgently