
I have a regular expression that captures three backreferences though one (the 2nd) may be null.

Given the following string:

http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1

I wish to capture the TLD (in this case .co.uk), q param and cd param.

I'm using the following RegEx:

/.*\.google([a-z\.]*).*q=(.*[^&])?.*cd=(\d*).*/i

This works, except that the 2nd backreference includes the other parameters up to the cd param. I currently get this:

["http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1 ", ".co.uk", "site%3Ajonathonoat.es&source=web", "1", index: 0, input: "http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajo…,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1"]

The 1st backreference is correct, it's .co.uk and so is the 3rd; it's 1. I want the 2nd backreference to be either null (or undefined or whatever) or just the q param, in this example site%3Ajonathonoat.es. It currently includes the source param too (site%3Ajonathonoat.es&source=web).

Any help would be much appreciated, thanks!

I've added a JSFiddle of the code; look in your browser console for the output. Thanks!

Jonathon Oates
  • I'd also consider parsing the url: http://stackoverflow.com/questions/736513/how-do-i-parse-a-url-into-hostname-and-path-in-javascript – Kobi Jan 21 '13 at 09:52
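
For illustration, the URL-parsing route suggested in the comment above could look roughly like the sketch below. It assumes an environment with the standard `URL` class (modern browsers or Node.js); the variable names are made up for this example, and the TLD is still taken from the hostname with a small regex, which is only one possible way to do it.

```javascript
// The sample string from the question.
const raw = 'http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1';

const parsed = new URL(raw);

// Everything in the hostname after "google", e.g. ".co.uk".
const parsedTld = parsed.hostname.replace(/^.*\.google/i, '');

// searchParams.get() returns the first value for a key, percent-decoded,
// or null when the parameter is absent.
const parsedQ = parsed.searchParams.get('q');
const parsedCd = parsed.searchParams.get('cd');

console.log(parsedTld, parsedQ, parsedCd);
```

Note that the q value comes back decoded (`site:jonathonoat.es` rather than `site%3Ajonathonoat.es`), and a missing q param yields `null` instead of failing the whole match.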

2 Answers


When using a negated character class, I always put the quantifier on the class itself:

/.*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*/i

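I also recommend being wary of bare * or +, as they are "greedy"; prefer *? or +? when the delimiter you are looking for can also appear later in the string. A negated class such as [^&]* is the exception: it cannot run past the delimiter, so it can stay greedy. For more on greediness, see Jeffrey Friedl's Mastering Regular Expressions or simply here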
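
To make the greedy versus lazy difference concrete, here is a minimal sketch on a shortened query string. The `sample` string and variable names are invented for this illustration and are not from the question.

```javascript
// A shortened, made-up query string with the same parameter order as the question.
const sample = 'q=one&source=web&cd=1';

// Greedy: .* runs to the end, then backtracks to the LAST "&".
const greedy = sample.match(/q=(.*)&/)[1];

// Lazy: .*? stops at the FIRST "&".
const lazy = sample.match(/q=(.*?)&/)[1];

// Negated class: [^&]* can never cross an "&", so greedy is safe here.
const negated = sample.match(/q=([^&]*)/)[1];

console.log(greedy);  // one&source=web
console.log(lazy);    // one
console.log(negated); // one
```

The greedy result reproduces the symptom from the question: the capture swallows the following parameter.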

DesertEagle

You want the middle group to be:

q=([^&]*)

This captures any run of characters other than an ampersand. It also allows zero characters, so you can drop the optional quantifier (?) after the group.

Working example: http://rubular.com/r/AJkXxgeX5K
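
As a quick check in JavaScript, the sketch below drops that middle group into the pattern from the question and runs it against the question's sample string. Only the middle group differs from the original pattern; the variable names are chosen for this example.

```javascript
// The sample string from the question.
const url = 'http://www.google.co.uk/url?sa=t&rct=j&q=site%3Ajonathonoat.es&source=web&cd=1&ved=0CC8QFjAA&url=http%3A%2F%2Fjonathonoat.es%2Fbritish-mozcast%2F&ei=MQj9UKejDYeS0QWruIHgDA&usg=AFQjCNHy1cDoWlIAwyj76wjiM6f2Rpd74w&bvm=bv.41248874,d.d2k,.co.uk,site%3Ajonathonoat.es&source=web,1';

// The question's pattern with the middle group replaced by ([^&]*).
const pattern = /.*\.google([a-z\.]*).*q=([^&]*).*cd=(\d*).*/i;

// Index 0 is the whole match; 1..3 are the TLD, q param and cd param.
const [, tld, q, cd] = url.match(pattern);

console.log(tld); // .co.uk
console.log(q);   // site%3Ajonathonoat.es
console.log(cd);  // 1
```

With an empty q param (for example `...?q=&cd=2`) the second group is an empty string rather than null, since `[^&]*` is allowed to match zero characters.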

Kobi