I'm trying to catch as many JavaScript redirects as possible across many HTML pages. My regular expression is:
((location.href)|(window.location)|(location.replace)|(location.assign))(( ?= ?)|( ?\( ?))("|')([^'"]*)("|')( ?\) ?)?;
I use Python but the question is general:
import re

regex = re.compile(r"""((location.href)|(window.location)|(location.replace)|(location.assign))(( ?= ?)|( ?\( ?))("|')([^'"]*)("|')( ?\) ?)?;""", re.I)
# ... some control here ...
print(regex.search(html).group(10))  # group 10 is the bare URL
I ran some tests and was able to catch all of these cases:
location.href = "http://www.foo.com";
location.href="http://www.foo.com";
window.location = "http://www.foo.com";
window.location.href = "http://www.foo.com";
location.replace ("http://www.foo.com");
location.replace( "http://www.foo.com" ) ;
location.assign ("http://www.foo.com");
It skips cases where I can't resolve a URL because the code contains a variable:
location.href = "http://www.foo.com" + var + "something else";
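For reference, the cases listed above can be checked with a small script like the following. This is only a minimal sketch: the pattern is the one from the question, unchanged, while the helper name `extract_redirect_url` and the `matched_cases` / `skipped_case` variables are names I made up for illustration.

```python
import re

# The pattern from the question, unchanged.
REDIRECT_RE = re.compile(
    r"""((location.href)|(window.location)|(location.replace)|(location.assign))(( ?= ?)|( ?\( ?))("|')([^'"]*)("|')( ?\) ?)?;""",
    re.I,
)


def extract_redirect_url(snippet):
    """Hypothetical helper: return group 10 (the URL), or None if nothing matches."""
    match = REDIRECT_RE.search(snippet)
    return match.group(10) if match else None


# The seven statements I expect to be caught.
matched_cases = [
    'location.href = "http://www.foo.com";',
    'location.href="http://www.foo.com";',
    'window.location = "http://www.foo.com";',
    'window.location.href = "http://www.foo.com";',
    'location.replace ("http://www.foo.com");',
    'location.replace( "http://www.foo.com" ) ;',
    'location.assign ("http://www.foo.com");',
]

# The statement I expect to be skipped because the URL is built from a variable.
skipped_case = 'location.href = "http://www.foo.com" + var + "something else";'

for snippet in matched_cases:
    print(snippet, "->", extract_redirect_url(snippet))
print(skipped_case, "->", extract_redirect_url(skipped_case))
```

Returning `None` instead of calling `.group(10)` directly avoids an `AttributeError` on pages where no redirect is found.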
The questions are:
- Are there other ways to redirect using JavaScript? Is there some other location.somethingelse that I am missing?
- Is the way I catch these 4 cases correct? Is it allowed to have something like
location.href = http://www.foo.com;
or
location.replace (http://www.foo.com);
that I'll miss because of the (double) quotes? Am I too strict or too lax?
- Is my regex well written? Or can I improve it in some way?