Depending on how the URLs are encoded, this task may be impossible to do unambiguously. For it to be possible, the URLs in the dataset must be standardized so that every parameter name is followed by an equals sign, and no other stray equals signs appear in the parameter values. If both of these conditions hold, the following will work:
The regular expressions
&(firstName|lastName|email|phone1|address)=([^&]*(?:&[^&=]+(?=&|$))*)
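A minimal sketch of applying the expression above with `String.prototype.matchAll` to collect the targeted parameters. The sample URL and the `extractParams` helper are made up for illustration, and the sketch assumes the URL follows the two rules above.

```javascript
// Hypothetical helper: collect targeted parameters using the regex above.
// Assumes every parameter has an "=" and values contain no stray "=".
const PARAM_RE =
  /&(firstName|lastName|email|phone1|address)=([^&]*(?:&[^&=]+(?=&|$))*)/g;

function extractParams(url) {
  const found = {};
  // matchAll requires the global flag; group 1 is the name, group 2 the raw value
  for (const m of url.matchAll(PARAM_RE)) {
    found[m[1]] = m[2];
  }
  return found;
}

const sampleUrl =
  "https://example.com?iframeLoad=true&email=abc&xyz@.com&phone1=555-0100&page=2";
console.log(extractParams(sampleUrl));
// → { email: 'abc&xyz@.com', phone1: '555-0100' }
```

The value of `email` keeps its raw `&xyz@.com` part, while `&page=2` is left alone because `page` is not in the list.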
Also note that this regular expression does not cover the case where one of the desired parameters is the first parameter. Because JavaScript regex is limited, and this is a special case anyway (it begins with `?` instead of `&`), it will need to be handled differently depending on what you want to do with the parameters. Matching the following and replacing it with `?` is one way to remove the parameter:
\?(firstName|lastName|email|phone1|address)=([^&]*(?:&[^&=]+(?=&|$))*)(?:&|$)
If you aren't planning on removing the parameter completely, the `(?:&|$)` at the end of the expression can be dropped for simplicity.
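A minimal sketch of the first-parameter case: the `?`-anchored expression is replaced with `?`, so the next parameter moves up into the first position. The `removeFirstParam` helper and the sample URL are hypothetical.

```javascript
// Hypothetical helper: drop a targeted parameter that sits right after "?".
// The trailing (?:&|$) also consumes the separator, so replacing with "?"
// leaves the remaining query string well-formed.
const FIRST_PARAM_RE =
  /\?(firstName|lastName|email|phone1|address)=([^&]*(?:&[^&=]+(?=&|$))*)(?:&|$)/;

function removeFirstParam(url) {
  return url.replace(FIRST_PARAM_RE, "?");
}

console.log(removeFirstParam("https://example.com?email=abc&xyz@.com&page=2"));
// → https://example.com?page=2
```

If the removed parameter was the only one, the result ends in a bare `?`, which you may want to strip separately.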
Depending on what you plan to replace the parameters with, you may need to tweak the expressions, but within the rules above they should generally give the desired output.
How it works
The trick here is a separate non-capturing group, `(?:&[^&=]+(?=&|$))*`, that handles additional parts of the parameter value containing raw ampersands but no equals sign. The character class `[^&=]+` ensures that the subexpression contains no ampersands or equals signs, and the lookahead `(?=&|$)` ensures that it is followed by another parameter or the end of the string, not by an equals sign. The whole group has the quantifier `*`, since it can appear zero, one, or multiple times after the initial value.
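To see what the continuation group contributes, here is a small comparison of the pattern with and without it, using only `email` and a made-up query string:

```javascript
// Same pattern with and without the continuation group (?:&[^&=]+(?=&|$))*.
const query = "&email=abc&xyz@.com&page=2";

const withoutGroup = /&(email)=([^&]*)/;
const withGroup = /&(email)=([^&]*(?:&[^&=]+(?=&|$))*)/;

// Without the group, the value stops at the first raw "&".
console.log(query.match(withoutGroup)[2]); // → abc
// With it, "&xyz@.com" is absorbed (no "=", followed by "&"),
// but "&page" is rejected because an "=" follows it.
console.log(query.match(withGroup)[2]); // → abc&xyz@.com
```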
Also note that, for convenience, the parameter name and value are stored in capturing groups 1 and 2 for easy access and parsing. If you aren't planning on using them, they can be turned into non-capturing groups by adding `?:` after the `(`.
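As one example of using the groups in a replacement, this sketch keeps each targeted parameter name (group 1) and blanks out its value. The `maskParams` helper and the `REDACTED` placeholder are hypothetical.

```javascript
// Hypothetical helper: keep the name ($1) and discard the value (group 2).
const MASK_RE =
  /&(firstName|lastName|email|phone1|address)=([^&]*(?:&[^&=]+(?=&|$))*)/g;

function maskParams(url) {
  return url.replace(MASK_RE, "&$1=REDACTED");
}

console.log(maskParams("https://example.com?page=2&email=abc&xyz@.com&lastName=Doe"));
// → https://example.com?page=2&email=REDACTED&lastName=REDACTED
```

Since the value group is not referenced in the replacement, it could be made non-capturing here without changing the result.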
Disclaimer
If any parameters are missing the equals sign, there's no way to unambiguously distinguish new URL parameters from values of the previous parameter. In the example https://example.com?&iframeLoad=true&email=abc&xyz@.com, this could refer either to one parameter named `email` with the value `abc&xyz@.com`, or to two parameters, `email` (with the value `abc`) and `xyz@.com` (unless both the list of parameter names and the list of value strings are standardized, but down this road lies madness). Similarly, stray equals signs trick the parser. As @David Faber mentioned, a `&` character in a URL would typically be URL-encoded as `%26`, which prevents this ambiguity entirely.
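If you control how the URLs are built, a sketch of the encoded alternative: the standard `URLSearchParams` and `URL` APIs percent-encode the literal `&` as `%26` when serializing and decode it again when parsing, so no regex heuristics are needed. The parameter values are taken from the example above.

```javascript
// Build the query string with URLSearchParams so "&" and "@" get percent-encoded.
const params = new URLSearchParams();
params.set("iframeLoad", "true");
params.set("email", "abc&xyz@.com");

const encoded = `https://example.com?${params}`;
console.log(encoded);
// → https://example.com?iframeLoad=true&email=abc%26xyz%40.com

// Parsing it back recovers the original value unambiguously.
console.log(new URL(encoded).searchParams.get("email")); // → abc&xyz@.com
```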