38

I'm looking for a regular expression to remove a single parameter from a query string, and I want to do it in a single regular expression if possible.

Say I want to remove the foo parameter. Right now I use this:

/&?foo\=[^&]+/

That works as long as foo is not the first parameter in the query string. If it is, then my new query string starts with an ampersand. (For example, "foo=123&bar=456" gives a result of "&bar=456".) Right now, I'm just checking after the regex if the query string starts with ampersand, and chopping it off if it does.

Example edge cases:

Input                    |  Expected Output
-------------------------+--------------------
foo=123                  |  (empty string)
foo=123&bar=456          |  bar=456
bar=456&foo=123          |  bar=456
abc=789&foo=123&bar=456  |  abc=789&bar=456

Edit

OK, as pointed out in the comments, there are way more edge cases than I originally considered. I got the following regex to work with all of them:

/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/

This is modified from Mark Byers's answer, which is why I'm accepting that one, but Roger Pate's input helped a lot too.
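As a rough sketch of how the pattern above might be generalized to any parameter name, here is a Python version. The helper name `remove_param` is hypothetical, and the use of `re.escape` is my assumption (the question only ever removes the literal `foo`). `count=1` mirrors a JavaScript `replace` without the `g` flag.

```python
import re

def remove_param(query, name):
    """Remove the first occurrence of `name` from a bare query string (no leading '?')."""
    key = re.escape(name)  # assumption: the name should be matched literally
    # Left half: parameter is not first, so consume its leading '&'.
    # Right half: parameter is first, so consume its trailing '&' if there is one.
    pattern = r"&{0}(=[^&]*)?(?=&|$)|^{0}(=[^&]*)?(&|$)".format(key)
    return re.sub(pattern, "", query, count=1)
```

For example, `remove_param("abc=789&foo=&bar=456", "foo")` gives `abc=789&bar=456`, while `foox=123` and `xfoo&bar=456` are left untouched.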

Here is the full suite of test cases I'm using, and a JavaScript snippet which tests them:

$(function() {
    var regex = /&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/;
    
    var escapeHtml = function (str) {
        var map = {
          '&': '&amp;',
          '<': '&lt;',
          '>': '&gt;',
          '"': '&quot;',
          "'": '&#039;'
        };
        
        return str.replace(/[&<>"']/g, function(m) { return map[m]; });
    };

    
    //test cases
    var tests = [
        'foo'     , 'foo&bar=456'     , 'bar=456&foo'     , 'abc=789&foo&bar=456'
       ,'foo='    , 'foo=&bar=456'    , 'bar=456&foo='    , 'abc=789&foo=&bar=456'
       ,'foo=123' , 'foo=123&bar=456' , 'bar=456&foo=123' , 'abc=789&foo=123&bar=456'
       ,'xfoo'    , 'xfoo&bar=456'    , 'bar=456&xfoo'    , 'abc=789&xfoo&bar=456'
       ,'xfoo='   , 'xfoo=&bar=456'   , 'bar=456&xfoo='   , 'abc=789&xfoo=&bar=456'
       ,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
       ,'foox'    , 'foox&bar=456'    , 'bar=456&foox'    , 'abc=789&foox&bar=456'
       ,'foox='   , 'foox=&bar=456'   , 'bar=456&foox='   , 'abc=789&foox=&bar=456'
       ,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
    ];
    
    //expected results
    var expected = [
        ''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,'xfoo'    , 'xfoo&bar=456'    , 'bar=456&xfoo'    , 'abc=789&xfoo&bar=456'
       ,'xfoo='   , 'xfoo=&bar=456'   , 'bar=456&xfoo='   , 'abc=789&xfoo=&bar=456'
       ,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
       ,'foox'    , 'foox&bar=456'    , 'bar=456&foox'    , 'abc=789&foox&bar=456'
       ,'foox='   , 'foox=&bar=456'   , 'bar=456&foox='   , 'abc=789&foox=&bar=456'
       ,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
    ];
    
    for(var i = 0; i < tests.length; i++) {
        var output = tests[i].replace(regex, '');
        var success = (output == expected[i]);
        
        $('#output').append(
            '<tr class="' + (success ? 'passed' : 'failed') + '">'
            + '<td>' + (success ? 'PASS' : 'FAIL') + '</td>'
            + '<td>' + escapeHtml(tests[i]) + '</td>'
            + '<td>' + escapeHtml(output) + '</td>'
            + '<td>' + escapeHtml(expected[i]) + '</td>'
            + '</tr>'
        );
    }
    
});
#output {
    border-collapse: collapse;
    
}
#output tr.passed { background-color: #af8; }
#output tr.failed { background-color: #fc8; }
#output td, #output th {
    border: 1px solid black;
    padding: 2px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table id="output">
    <tr>
        <th>Succ?</th>
        <th>Input</th>
        <th>Output</th>
        <th>Expected</th>
    </tr>
</table>
Kip
  • Additional edge cases: `oopsfoo=123`, `foo`, `foo=`---all being the only, first, last, and middle parameter. (so 12 total here) –  Dec 03 '09 at 21:03
  • @Roger Pate: thanks, didn't think about that. also `foobar=123`, `foobar`, and `foobar=`, to ensure that the check for `foo` doesn't hit them – Kip Dec 03 '09 at 21:11
  • What is the expected output if the input is `foo=`? – Mark Byers Dec 03 '09 at 21:19
  • @Mark Byers: empty string. I'm going to put a more complete sample output up in a few minutes, when i get my test script presentable... – Kip Dec 03 '09 at 21:27
  • 1
    Thanks, the java version seems to be: String regex = "&"+paramToRemove+"(\\=[^&]*)?(?=&|$)|^"+paramToRemove+"(\\=[^&]*)?(&|$)"; – Sebastien Lorber Nov 27 '12 at 10:14

9 Answers

29

If you want to do this in just one regular expression, you could do this:

/&foo(=[^&]*)?|^foo(=[^&]*)?&?/

This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.

To be honest, I think it's better the way you did it: removing the trailing ampersand in a separate step.
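A minimal sketch of that two-step idea in Python, for comparison. The function name `remove_param_two_step` is hypothetical, and the pattern is stricter than the one in the question (it also accepts a missing value and refuses to match `foobar`); that tightening is my assumption, not part of the original two-step code.

```python
import re

def remove_param_two_step(query, name):
    key = re.escape(name)
    # Step 1: drop the parameter together with its leading '&', if it has one.
    stripped = re.sub(r"(^|&)" + key + r"(=[^&]*)?(?=&|$)", "", query)
    # Step 2: if the removed parameter was first, a stray '&' is left at the front.
    return stripped.lstrip("&")
```

Note that `lstrip("&")` also removes any ampersands that were already at the start of the input, which may or may not be what you want.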

Mark Byers
  • 1
    Why is both not valid? Input: `?blah&foo=abc&blah` –  Dec 03 '09 at 20:44
  • 1
    @Roger Pate: both is valid input, but you only want to match exactly one of them (because i'm replacing whatever is matched with empty string) – Kip Dec 03 '09 at 21:12
  • Try running this pattern against Roger's test cases. – Greg Bacon Dec 03 '09 at 22:16
  • 5
    Accepted this because the solution I got working to all my test cases (see edit to my question) was modified version of this idea: `/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/` – Kip Dec 03 '09 at 22:20
  • gbacon: the only cases it failed on were those containing 'foo' without a value. I've updated the regex to handle this, and it passes all cases now. – Mark Byers Dec 03 '09 at 22:30
  • @MarkByers: this will change something like `foobar=123` to `bar=123`. You need the non-matching `(?=&|$)` at the end of the left half, and `(&|$)` at the end of the right half. – Kip Dec 04 '09 at 01:14
  • Then this is a solution for this question?? – Mirko Cianfarani Apr 24 '13 at 19:14
7
/(?<=&|\?)foo(=[^&]*)?(&|$)/

Uses lookbehind and the last group to "anchor" the match, and allows a missing value. Change the \? to ^ if you've already stripped off the question mark from the query string.

Regex is still not a substitute for a real parser of the query string, however.
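To illustrate what using a real parser could look like, here is a small sketch with Python's standard `urllib.parse`. The function name `remove_param_parsed` is hypothetical. One caveat: `urlencode` re-encodes the remaining pairs, so a valueless parameter such as `xfoo` comes back as `xfoo=` and the output is not guaranteed to be byte-identical to the input.

```python
from urllib.parse import parse_qsl, urlencode

def remove_param_parsed(query, name):
    # keep_blank_values=True keeps parameters like "bar=" or "bar" in the result
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != name]
    return urlencode(kept)
```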

Update: Test script: (run it at codepad.org)

import re

regex = r"(^|(?<=&))foo(=[^&]*)?(&|$)"

cases = {
  "foo=123": "",
  "foo=123&bar=456": "bar=456",
  "bar=456&foo=123": "bar=456",
  "abc=789&foo=123&bar=456": "abc=789&bar=456",

  "oopsfoo=123": "oopsfoo=123",
  "oopsfoo=123&bar=456": "oopsfoo=123&bar=456",
  "bar=456&oopsfoo=123": "bar=456&oopsfoo=123",
  "abc=789&oopsfoo=123&bar=456": "abc=789&oopsfoo=123&bar=456",

  "foo": "",
  "foo&bar=456": "bar=456",
  "bar=456&foo": "bar=456",
  "abc=789&foo&bar=456": "abc=789&bar=456",

  "foo=": "",
  "foo=&bar=456": "bar=456",
  "bar=456&foo=": "bar=456",
  "abc=789&foo=&bar=456": "abc=789&bar=456",
}

failures = 0
for input, expected in cases.items():
  got = re.sub(regex, "", input)
  if got != expected:
    # print(...) with a single argument runs on both Python 2 and Python 3
    print("failed: input=%r expected=%r got=%r" % (input, expected, got))
    failures += 1
if not failures:
  print("Success")

It shows where my approach failed. Mark has the right of it, which should show why you shouldn't do this with regex. :P


The problem is associating the query parameter with exactly one ampersand. If you must use regex (in case you haven't picked up on it :P, I'd use a separate parser, which might use regex inside it but would still actually understand the format), one solution is to make sure there's exactly one ampersand per parameter: replace the leading ? with a &.

This gives /&foo(=[^&]*)?(?=&|$)/, which is very straightforward and the best you're going to get. Remove the leading & in the final result (or change it back into a ?, etc.). Modifying the test case to do this uses the same cases as above, and changes the loop to:

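regex = r"&foo(=[^&]*)?(?=&|$)"  # the simplified pattern given above

failures = 0
for input, expected in cases.items():
  input = "&" + input  # guarantee exactly one ampersand per parameter
  got = re.sub(regex, "", input)
  if got[:1] == "&":
    got = got[1:]  # drop the ampersand that was prepended
  if got != expected:
    print("failed: input=%r expected=%r got=%r" % (input, expected, got))
    failures += 1
if not failures:
  print("Success")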
  • having some problems with this one, but i'm working on it. yes, there is no \?, my string is only the query string – Kip Dec 03 '09 at 21:16
  • fails for these inputs: `bar=456&foo`, `bar=456&foo=`, `bar=456&foo=123` – Kip Dec 03 '09 at 21:51
  • Yes, I know, that's why I said my approach fails. :) –  Dec 03 '09 at 21:59
  • 1
    +1 for providing test code. Even though your solution didn't quite work, the test code is useful. – Mark Byers Dec 03 '09 at 22:03
4

Having a query string that starts with & is harmless--why not leave it that way? In any case, I suggest that you search for the trailing ampersand and use \b to match the beginning of foo without consuming the preceding character:

 /\bfoo\=[^&]+&?/
JSBձոգչ
  • Using a trailing ampersand will give a problem with the third example. – catchmeifyoutry Dec 03 '09 at 20:39
  • Note that the trailing ampersand is optional in the regex that I gave. – JSBձոգչ Dec 03 '09 at 20:42
  • 1
    yeah i thought about leaving the extra &, but it looked a little sloppy to me. This regex will leave a trailing ampersand on the result. i.e. `\bfoo\=[^&]+&?` -> `bar=456&`. to get it to work with `foo` or `foo=`, and not with `xfoo` or `foox`, I modified it to this: `/\bfoo(\=[^&]*)?(&|$)/` – Kip Dec 03 '09 at 22:07
1

Based on your implementation, I wrote a Java version that seems to work:

  // Preconditions is Guava's com.google.common.base.Preconditions
  public static String removeParameterFromQueryString(String queryString, String paramToRemove) {
    Preconditions.checkArgument(queryString != null, "Empty querystring");
    Preconditions.checkArgument(paramToRemove != null, "Empty param");
    // One pattern per position; each of them requires a "=value" part
    String oneParam = "^" + paramToRemove + "(=[^&]*)$";       // the only parameter
    String begin = "^" + paramToRemove + "(=[^&]*)(&?)";       // first parameter
    String end = "&" + paramToRemove + "(=[^&]*)$";            // last parameter
    String middle = "(?<=[&])" + paramToRemove + "(=[^&]*)&";  // between two others
    // Apply the replacements one after another: middle, begin, end, then the lone case
    String removedMiddleParams = queryString.replaceAll(middle, "");
    String removedBeginParams = removedMiddleParams.replaceAll(begin, "");
    String removedEndParams = removedBeginParams.replaceAll(end, "");
    return removedEndParams.replaceAll(oneParam, "");
  }

I had trouble with your implementation in some cases because it sometimes did not delete a &, so I did it in multiple steps, which seems easier to understand.

The problem showed up particularly when a param was in the query string multiple times (like param1=toto&param2=xxx&param1=YYY&param3=ZZZ&param1....)
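For the repeated-parameter case, one alternative that sidesteps the ampersand bookkeeping entirely is to split on `&` and filter. This is a swapped-in technique rather than the regex approach discussed in this thread, sketched here in Python; the function name `remove_all_occurrences` is hypothetical.

```python
def remove_all_occurrences(query, name):
    # Keep every "key" or "key=value" part whose key differs from `name`;
    # the key is whatever precedes the first '=' (or the whole part if there is none).
    kept = [part for part in query.split("&")
            if part.split("=", 1)[0] != name]
    return "&".join(kept)
```

Because the remaining parts are rejoined unchanged, their original encoding and order are preserved.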

Sebastien Lorber
1

It's a bit silly but I started trying to solve this with a regexp and wanted to finally get it working :)

$str[] = 'foo=123';
$str[] = 'foo=123&bar=456';
$str[] = 'bar=456&foo=123';
$str[] = 'abc=789&foo=123&bar=456';

foreach ($str as $string) {
    echo preg_replace('#(?:^|\b)(&?)foo=[^&]+(&?)#e', "'$1'=='&' && '$2'=='&' ? '&' : ''", $string), "\n";
}

the replace part is messed up because apparently it gets confused if the captured characters are '&'s

Also, it doesn't match afoo and the like.

Matteo Riva
1

Thanks. Yes it uses backslashes for escaping, and you're right, I don't need the /'s.

This seems to work, though it doesn't do it in one line as requested in the original question.

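    // Requires: using System.Text.RegularExpressions;
    public static string RemoveQueryStringParameter(string url, string keyToRemove)
    {
        // Escape the key so regex metacharacters in it are matched literally
        string key = Regex.Escape(keyToRemove);
        //if first parameter, leave ?, take away trailing &
        //(the optional value plus (&|$) keeps "foo" from matching "foobar")
        string pattern = @"\?" + key + "(=[^&]*)?(&|$)";
        url = Regex.Replace(url, pattern, "?");
        //if subsequent parameter, take away leading &
        pattern = "&" + key + "(=[^&]*)?(?=&|$)";
        url = Regex.Replace(url, pattern, "");
        return url;
    }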
Adeel
1

It's never too late, right?

I did it using a conditional with a lookbehind, to be sure it doesn't mess up the &s:

/(?(?<=\?)(foo=[^&]+)&*|&(?1))/g

If ? is behind, we catch foo=bar and the trailing & if it exists.

If ? is not behind, we catch &foo=bar.

(?1) is a subroutine call to the 1st capturing group; in this example it's the same as (foo=[^&]+).


Actually, I needed a one-liner for two similar parameters, page and per-page,

so I altered this expression a bit:

/(?(?<=\?)((per-)?page=[^&]+)&*|&(?1))/g

Works like a charm.

0

You can use the following regex:

[\?|&](?<name>.*?)=[^&]*&?

If you want to do exact match you can replace (?<name>.*?) with a url parameter. e.g.:

[\?|&]foo=[^&]*&?

to match any variable like foo=xxxx in any URL.

interjay
Sujit Rai
-2

For anyone interested in replacing GET request parameters:

The following regex also works for more general GET queries (starting with ?), where the marked answer fails if the parameter to be removed is the first one (after the ?).

This (JS flavour) regex can be used to remove the parameter regardless of position (first, last, or in between), leaving the query in a well-formed state.

So just use a regex replace with an empty string.

/&s=[^&]*()|\?s=[^&]*$|s=[^&]*&/

Basically, it matches one of the three cases mentioned above (hence the two pipes).
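If the input is a whole URL rather than a bare query string, another option is to split off the query component first, so the regex never has to deal with the `?` at all. The following Python sketch assumes that approach; the function name `remove_param_from_url` and the example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def remove_param_from_url(url, name):
    parts = urlsplit(url)  # parts.query holds everything after '?', without the '?'
    key = re.escape(name)
    # Remove the parameter (with its leading '&', if any), then tidy a stray leading '&'.
    query = re.sub(r"(^|&)" + key + r"(=[^&]*)?(?=&|$)", "", parts.query).lstrip("&")
    # urlunsplit leaves out the '?' when the query ends up empty.
    return urlunsplit(parts._replace(query=query))
```

With this, `remove_param_from_url("http://example.com/page?s=1&x=2", "s")` yields `http://example.com/page?x=2`.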

  • in the original question, the input to the regex is only the query string (i.e. everything after the `?`), not the whole url, so there is no `?` in the string. that's why the accepted answer doesn't consider that scenario. – Kip Oct 20 '15 at 11:52
  • that's correct. however I don't see how this does not qualify or even more I don't see any reason for downvoting (!??) as this answer addresses a quite common, more general scenario. I have updated the answer with remarks. – Ion Andrei Bara Oct 20 '15 at 15:17
  • I wasn't the one who gave the downvote. But the reason why someone else *might* have is that, in your original answer, you were answering a different question from what was asked and saying the accepted answer was wrong because it didn't answer that question. – Kip Oct 20 '15 at 20:04
  • also, your answer fails to remove the parameter in several of the edge cases outlined in the original post: `foo`, `foo&bar=456`, `bar=456&foo`, `abc=789&foo&bar=456`, `foo=`, `foo=123`, `xfoo=&bar=456`, `abc=789&xfoo=&bar=456`, `xfoo=123&bar=456`, `abc=789&xfoo=123&bar=456` – Kip Oct 20 '15 at 20:08
  • Here is a jsFiddle showing the answer (from the OP): https://jsfiddle.net/1b6ukaw9/ Here is a jsFiddle showing the cases where your regex fails: https://jsfiddle.net/o0b2rrkd/ This regex works for your case in perl/php: `/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)|(?<=\?)foo(\=[^&]*)?(&|$)/`. But it doesn't work in Javascript because it doesn't support look-behind assertions. Here is a version which does work in Javascript, but I had to change the replace code as well: https://jsfiddle.net/ba7m8wz8/ – Kip Oct 20 '15 at 21:19