
I am trying to write a regular expression to be used in a Google Analytics goal that will match URLs containing

?package=whatever

and also

/success

The user will first visit a page like

www.website.com/become-client/?package=greatpackage

and if they purchase, they are led to this page

www.website.com/become-client/?package=greatpackage/success

So, based on this, I could use the following regex:

\?package\=greatpackage/success

This should match the correct destination and I would be able to use this in the goal settings in Analytics to create a goal for purchases of the greatpackage package.
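A quick way to sanity-check the goal pattern before pasting it into Analytics is to run it through a regex engine. The sketch below uses Python's `re` module as a stand-in. Google Analytics has its own regex flavor, which may differ in details, so treat this only as an approximation. The URLs are the examples from above, and `re.search` looks for the pattern anywhere in the string, which is roughly how an unanchored goal pattern behaves.

```python
import re

# Pattern from the question; "\=" simply matches a literal "=".
GOAL = re.compile(r"\?package\=greatpackage/success")

landing = "www.website.com/become-client/?package=greatpackage"
success = "www.website.com/become-client/?package=greatpackage/success"

print(bool(GOAL.search(landing)))  # False: no "/success" yet
print(bool(GOAL.search(success)))  # True: the purchase page matches
```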

But sometimes the website uses other parameters in addition to ?package, such as ?type, ?media and so on. For example:

?type=business

This results in URLs like this:

www.website.com/become-client/?package=greatpackage?type=business

and if they purchase, they are led to this page

www.website.com/become-client/?package=greatpackage?type=business/success

Now the /success part is separated from the ?package part. My question is: how do I write a regex that will still match this URL no matter what other parameters appear between the two parts?
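Running the original pattern against the longer URL shows why it stops working. The literal `/success` must come immediately after `greatpackage`, so any extra parameter in between breaks the match. This is again a sketch that uses Python's `re` as a stand-in for the Analytics regex engine.

```python
import re

GOAL = re.compile(r"\?package\=greatpackage/success")

url = "www.website.com/become-client/?package=greatpackage?type=business/success"

# "?type=business" sits between "greatpackage" and "/success", so there is no match.
print(GOAL.search(url))  # None
```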

---update----

@jonarz proposed the following and it works like a charm.

\?package\=greatpackage(.*?)/success

But what if there are two products with nearly the same name, for example greatpackage and greatpackageULTRA? The regex above matches both. If changing the product names is impossible, how can I match only one of them?
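The overlap is easy to reproduce. `greatpackage` is a prefix of `greatpackageULTRA`, and `(.*?)` simply absorbs the `ULTRA` suffix. Here is a sketch with Python's `re`. The ULTRA URL is a made-up example that follows the question's URL pattern.

```python
import re

LAZY = re.compile(r"\?package\=greatpackage(.*?)/success")

urls = [
    "www.website.com/become-client/?package=greatpackage?type=business/success",
    "www.website.com/become-client/?package=greatpackageULTRA?type=business/success",
]

for url in urls:
    m = LAZY.search(url)
    # Both URLs match; for the second one the group captures "ULTRA?type=business".
    print(m.group(1))
```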

deltitnu
  • Depending on whether you would like to capture the groups or not, it would be: `(\?package\=greatpackage)(.*?)(\/success)` or `\?package\=greatpackage(.*?)\/success`. – Jonasz Dec 11 '15 at 09:34
  • @Jonarz The `.*?` part matches any character between 0 and unlimited times, optionally? Doesn't the 0 already make it optional? – naurel Dec 11 '15 at 10:34
  • @naurel The `?` in this expression is a lazy quantifier. If the question mark were not there, the `.*` would match anything after "?package=greatpackage". With the question mark, it stops at "/success", and if "/success" is not found, it does not match (tested with regexr.com). [Here](http://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions) you will find more information. – Jonasz Dec 11 '15 at 10:41
  • I didn't know about lazy and greedy quantifiers, thank you. But making it lazy here would only make sense if there were multiple `/success` occurrences later in the string, since the regex will try to match `/success` literally as you specified it: [With the lazy](https://regex101.com/r/aN9lD9/2). [Without the lazy](https://regex101.com/r/aN9lD9/3). – naurel Dec 11 '15 at 11:02
  • @Jonarz Thanks! It works perfectly. But what if there is also a product called greatpackageULTRA and I want to differentiate between these two? (see updated question above) – deltitnu Dec 13 '15 at 17:34
  • @deltitnu I'm glad it works. I've added the response regarding the edited question. – Jonasz Dec 13 '15 at 19:53
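naurel's point about lazy and greedy quantifiers can be checked directly. With a single `/success` in the URL, the greedy `(.*)` and the lazy `(.*?)` capture the same text, because the greedy version backtracks until the literal `/success` fits. They only differ when `/success` appears more than once. The following is a Python `re` sketch; the doubled-`/success` string is a contrived input.

```python
import re

LAZY = re.compile(r"\?package\=greatpackage(.*?)/success")
GREEDY = re.compile(r"\?package\=greatpackage(.*)/success")

single = "?package=greatpackage?type=business/success"
double = "?package=greatpackage?type=business/success/success"
missing = "?package=greatpackage?type=business"

# One "/success": greedy backtracks to the same spot, so both capture "?type=business".
print(LAZY.search(single).group(1), GREEDY.search(single).group(1))

# Two "/success": lazy stops at the first, greedy runs to the last.
print(LAZY.search(double).group(1))    # ?type=business
print(GREEDY.search(double).group(1))  # ?type=business/success

# Neither version matches when "/success" is absent.
print(LAZY.search(missing), GREEDY.search(missing))  # None None
```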

2 Answers


The regex that would solve the problem introduced in the edit is:

\?package\=greatpackage((\?|\/)(.*?))?\/success(\/|\b)

Here is a test: https://regex101.com/r/jS4cH5/1 and it seems to suit your needs.
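The same cases can also be checked offline. This is a Python `re` sketch, and the URL list is illustrative rather than copied from the linked regex101 test. The key piece is `((\?|\/)(.*?))?`: after `greatpackage`, the next character must be `?`, `/` or the start of `/success`, which rules out `greatpackageULTRA`.

```python
import re

GOAL = re.compile(r"\?package\=greatpackage((\?|\/)(.*?))?\/success(\/|\b)")

# URL -> whether the goal should fire for it.
cases = {
    "www.website.com/become-client/?package=greatpackage/success": True,
    "www.website.com/become-client/?package=greatpackage?type=business/success": True,
    "www.website.com/become-client/?package=greatpackageULTRA/success": False,
    "www.website.com/become-client/?package=greatpackageULTRA?type=business/success": False,
    "www.website.com/become-client/?package=greatpackage?type=business": False,
}

for url, expected in cases.items():
    matched = GOAL.search(url) is not None
    print("ok" if matched == expected else "MISMATCH", url)
```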

Jonasz

If you want to match a URL like this one:

www.website.com/become-client/?package=greatpackage?type=business?other=nada/success

With a group to extract your package type:

.*\?package=([^\/?]+).*\/success
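To pull the package name out, read capture group 1 from the match. Here is a Python `re` sketch using the URL from this answer:

```python
import re

PACKAGE = re.compile(r".*\?package=([^\/?]+).*\/success")

url = "www.website.com/become-client/?package=greatpackage?type=business?other=nada/success"

m = PACKAGE.search(url)
# [^\/?]+ stops at the next "?" or "/", so the group holds only the package name.
print(m.group(1))  # greatpackage
```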

Without a group (just matching the URL if it contains package=greatpackage and /success):

.*\?package=greatpackage.*\/success

Without a group, matching any package type:

.*\?package=[^\/?]+.*\/success

You just need to add .* to match any character (except newlines). The [^\/?]+ part makes sure your package type isn't empty (i.e. the first character after = is neither / nor ?).
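The effect of the `+` in `[^\/?]+` shows up with an empty package value. The any-package pattern rejects it, whereas a version using `*` instead would accept it. This is a Python `re` sketch; the empty-package URL and the `ALLOWS_EMPTY` variant are hypothetical, for comparison only.

```python
import re

ANY_PACKAGE = re.compile(r".*\?package=[^\/?]+.*\/success")
ALLOWS_EMPTY = re.compile(r".*\?package=[^\/?]*.*\/success")  # hypothetical "*" variant

filled = "www.website.com/become-client/?package=greatpackage?type=business/success"
empty = "www.website.com/become-client/?package=?type=business/success"

print(bool(ANY_PACKAGE.search(filled)))  # True
print(bool(ANY_PACKAGE.search(empty)))   # False: first char after "=" is "?"
print(bool(ALLOWS_EMPTY.search(empty)))  # True: "*" accepts zero characters
```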

naurel