0

I'm trying to match this kind of character sequence:

sender=11&receiver=2&subject=3&message=4
sender=AOFJOIA&receiver=p2308u48302rf0&subject=(@#UROJ)(J#OFN:&message=aoefhoa348!!!

The delimiter between (key, value) pairs is the '&' character. I'd like to group them so that I can access the key and the value of each pair.

I tried something like:

([[:alnum:]]+)=([[:alnum:]]+)

But then I miss the:

subject=(@#UROJ)(J#OFN:

I couldn't find a way to make this type of character accepted. To be more specific, if there are n key-value pairs, I would like to get n matches, each consisting of 2 groups: one for the key and one for the value.

I'd be glad if you helped me out with this.

Thanks

johni
  • 4
    Why? Wouldn't it be easier to use a request parser? Not saying you don't have a legitimate need to do this, but... it's pretty rare. – Dave Newton Dec 11 '15 at 18:33
  • The value in subject=(@#UROJ)(J#OFN: consists mostly of special characters, which `alnum` doesn't cover – Alvaro Silvino Dec 11 '15 at 18:33
  • RegEx can't parse HTML properly –  Dec 11 '15 at 18:37
  • Let's say I cannot use any parser but have to implement one. What would you do instead of using regex? – johni Dec 11 '15 at 18:42
  • 1
    http://stackoverflow.com/questions/13592236/parse-the-uri-string-into-name-value-collection-in-java – Rustam Dec 11 '15 at 18:43
  • Instead of trying to find an elaborate regex, why not do it in a few simple steps? Like `split("&")`, then `split("=", 2)`. You can even do it on one line, as of Java 8. – VGR Dec 11 '15 at 20:36
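
The split-based approach from the last comment could be sketched roughly as follows. This is only an illustration: the class name `SplitPairs`, the method `parse`, and the choice to store a pair without '=' as an empty value are assumptions, not something stated in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitPairs {
    // Splits the input on '&', then splits each pair on the first '=' only,
    // so a value may itself contain '='.
    static Map<String, String> parse(String query) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as those produced by "&&"
            }
            String[] kv = pair.split("=", 2);
            // Assumption: a pair without '=' is stored with an empty value.
            pairs.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return pairs;
    }

    public static void main(String[] args) {
        String req = "sender=AOFJOIA&receiver=p2308u48302rf0&subject=(@#UROJ)(J#OFN:&message=aoefhoa348!!!";
        parse(req).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Note that a map keeps only the last value for a repeated key; if duplicate keys matter, a list of pairs would be a better fit.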

3 Answers

1

https://regex101.com/r/hN7qG9/1

I guess that will solve your problem:

/([^?=&]+)(=([^&]*))?/ig

output:

sender=11
receiver=2
subject=3
message=4
sender=AOFJOIA
receiver=p2308u48302rf0
subject=(@#UROJ)(J#OFN:
message=aoefhoa348!!!

and you can access each group:

 $1 - first group: the key (sender)
 $2 - second group: '=' plus the value (=11)
 $3 - third group: the value without '=' (11)

reference

var string = 'sender=11&receiver=2&subject=3&message=4';
var string2 = 'sender=AOFJOIA&receiver=p2308u48302rf0&subject=(@#UROJ)(J#OFN:&message=aoefhoa348!!!';

var regex = /([^?=&]+)(=([^&]*))?/ig;

// Logs every key=value match, then its key ($1) and its value ($3).
function logPairs(input) {
  var matches = input.match(regex) || [];
  for (var i = 0; i < matches.length; i++) {
    snippet.log(matches[i]);
    snippet.log('First : ' + matches[i].replace(regex, '$1'));
    snippet.log('Second : ' + matches[i].replace(regex, '$3'));
  }
}

logPairs(string);
logPairs(string2);
<script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>
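
Since the question is about Java, here is a rough sketch of how the same pattern might be used with `java.util.regex`, reading the key from group 1 and the value from group 3. The class name `GroupAccess` and the helper `pairs` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupAccess {
    // Same pattern as above; the i and g flags are not needed in Java
    // because Matcher.find() already walks through all matches.
    private static final Pattern PAIR = Pattern.compile("([^?=&]+)(=([^&]*))?");

    // Returns one {key, value} array per match.
    static List<String[]> pairs(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // group(3) is null when a key has no "=value" part
            result.add(new String[] { m.group(1), m.group(3) });
        }
        return result;
    }

    public static void main(String[] args) {
        String req = "sender=AOFJOIA&receiver=p2308u48302rf0&subject=(@#UROJ)(J#OFN:&message=aoefhoa348!!!";
        for (String[] kv : pairs(req)) {
            System.out.println("key = " + kv[0] + ", value = " + kv[1]);
        }
    }
}
```

Each call to `find()` yields one match, so n pairs produce n matches with the key and value available as separate groups.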
Alvaro Silvino
0

All the special characters in your example fall under the "punctuation" group (\p{Punct}); see:

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

If that still isn't enough, you could define your own character class, like [@# etc...]. Keep in mind that regex metacharacters must be escaped with a backslash, and that inside a Java string literal each backslash is itself written as \\.
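
As a minimal sketch of the punctuation-group idea: `\p{Punct}` also contains '&' and '=', so the value class below removes '&' again with a character class intersection (`&&`). The class name `PunctValue` and the helper `pairs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctValue {
    // Key: alphanumerics. Value: alphanumerics or punctuation, except '&'.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern PAIR =
            Pattern.compile("(\\p{Alnum}+)=([\\p{Alnum}\\p{Punct}&&[^&]]+)");

    // Returns each match formatted as "key -> value".
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String req = "sender=AOFJOIA&receiver=p2308u48302rf0&subject=(@#UROJ)(J#OFN:&message=aoefhoa348!!!";
        pairs(req).forEach(System.out::println);
    }
}
```

Values containing characters outside these two groups, such as spaces, would still be cut short by this pattern.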

Christian W
0
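    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class KeyValueDemo {
        public static void main(String[] args) {
            String req = "sender=AOFJOIA&receiver=p2308u48302rf0&subject=(@#UROJ)(J#OFN:&message=aoefhoa348!!!";
            // Key: word characters; value: everything up to the next '&'.
            Pattern p = Pattern.compile("(\\w+)=([^&]+)");
            Matcher m = p.matcher(req);
            while (m.find()) {
                System.out.println("key = " + m.group(1));
                System.out.println("value = " + m.group(2));
            }
        }
    }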

You should define your own character class for the "value" group of each key/value pair. For instance, it could be [\w!"#$%'()*+,\-./:;<=>?@\[\]^_`{|}~] (with the hyphen and square brackets escaped), or [\w@()#:!], or something as simple as [^&]. I think the [^&] character class is the most appropriate, since you don't know all the possible characters that can appear in the "value" part.
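
To illustrate why [^&] is the more forgiving choice, the following sketch compares it with the explicit class [\w@()#:!] on an input containing characters outside that list. The input string, the class name `ValueClassComparison`, and the helper `values` are all invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueClassComparison {
    // Explicit list of allowed value characters
    static final Pattern EXPLICIT = Pattern.compile("(\\w+)=([\\w@()#:!]+)");
    // Anything except the pair delimiter '&'
    static final Pattern NOT_AMP = Pattern.compile("(\\w+)=([^&]+)");

    // Collects the value (group 2) of every match.
    static List<String> values(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: '%' and ' ' are not in the explicit list
        String req = "subject=50% off&message=hi";
        System.out.println(values(EXPLICIT, req)); // [50, hi]
        System.out.println(values(NOT_AMP, req));  // [50% off, hi]
    }
}
```

The explicit class silently truncates the first value at the first unlisted character, while [^&] keeps everything up to the delimiter.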

Dmitry JJ