Backward capture group concatenated with forward capture group

Question

I think the title says it all. I'm trying to get groups and concatenate them together.

I have this text:

GPX 10.802.123/3843 1 - IDENTIFIER 48

And I want this output:

IDENTIFIER 10.802.123/3843-48

To be explicit: I want to capture one group before this word and one group after it, then concatenate both, using only regex. Is this possible?

I can already extract the 48 like this:

var text = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var reg = new RegExp('IDENTIFIER' + '.*?(\\d\\S*)', 'i');
var match = reg.exec(text);

Output: 48

Can it be done?

I'm offering 200 points.

You can't do it like this. — Avinash Raj, Aug 31 '15 at 16:02
Is there any way I can solve this? — Aug 31 '15 at 16:05

score 3 · Answer 1 · answered Aug 31 '15 at 16:09


You can do:

var text = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var match = /GPX\s+(.+?) \d .*?(IDENTIFIER).*?(\d\S*)/i.exec(text);

var output = match[2] + ' ' + match[1] + '-' + match[3];
//=> "IDENTIFIER 10.802.123/3843-48"
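
One caveat with the snippet above: exec returns null when the text does not match, so indexing match[2] would throw. A minimal sketch of a guarded version follows, assuming you want null back in that case. The function name reorder is made up for illustration; the pattern is the one from the answer.

```javascript
// Hypothetical wrapper (the name `reorder` is not from the answer).
// Returns the rearranged string, or null when the pattern does not match.
function reorder(text) {
  var match = /GPX\s+(.+?) \d .*?(IDENTIFIER).*?(\d\S*)/i.exec(text);
  if (match === null) {
    return null; // exec found nothing, so there are no groups to read
  }
  // match[1]: group before the word, match[2]: the word, match[3]: group after
  return match[2] + ' ' + match[1] + '-' + match[3];
}

console.log(reorder('GPX 10.802.123/3843 1 - IDENTIFIER 48'));
// prints "IDENTIFIER 10.802.123/3843-48"
console.log(reorder('nothing to see here'));
// prints null
```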

answered Aug 31 '15 at 16:09

anubhava


score 3 · Accepted Answer · answered Aug 31 '15 at 23:05

You must precisely define the groups that you want to extract before and after the word. If you define the group before the word as four or more non-whitespace characters, and the group after the word as one or more non-whitespace characters, you can use the following regular expression.

var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'i');
var groups = re.exec(text);
if (groups !== null) {
   var result = groups[1] + groups[2];
}

Let me break down the regular expression. Note that we have to escape the backslashes because we're writing a regular expression inside a string.

(\\S{4,}) captures a group of four or more non-whitespace characters
\\s+ matches one or more whitespace characters
(?: indicates the start of a non-capturing group
\\S{1,3} matches one to three non-whitespace characters
\\s+ matches one or more whitespace characters
)*? makes the non-capturing group match zero or more times, as few times as possible
word matches whatever was in the variable word when the regular expression was compiled
.*? matches any character zero or more times, as few times as possible
(\\S+) captures one or more non-whitespace characters
the 'i' flag makes this a case-insensitive regular expression

Observe that the lazy ? modifier makes each repetition match as little as possible, which is what lets us capture the nearest groups before and after the word.
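
Because word is concatenated straight into the pattern, any regex metacharacters in it (such as + or .) would be interpreted rather than matched literally. Here is a rough sketch of the same expression with the word escaped first. The helper names escapeRegExp and aroundWord are assumptions for this example, not part of the answer.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same pattern as above, but with `word` escaped before it is embedded.
function aroundWord(word, text) {
  var re = new RegExp(
    '(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + escapeRegExp(word) + '.*?(\\S+)', 'i');
  var groups = re.exec(text);
  return groups === null ? null : groups[1] + groups[2];
}

console.log(aroundWord('IDENTIFIER', 'GPX 10.802.123/3843 1 - IDENTIFIER 48'));
// prints "10.802.123/384348"
```

Note that the two groups are joined with no separator here, exactly as in the answer; add a '-' between them if you want the format from the question.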

You can match the regular expression globally in the text by adding the g flag. The snippet below demonstrates how to extract all matches.

function forward_and_backward(word, text) {
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  // Find all matches and make an array of results.
  var results = [];
  while (true) {
    var groups = re.exec(text);
    if (groups === null) {
      return results;
    }
    var result = groups[1] + groups[2];
    results.push(result);
  }
}

var sampleText = "  GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580 X Y Z 9.22.16.1043/73+ 0  ***  identifier 6800";

// Declare results with var so it does not become an implicit global.
var results = forward_and_backward('IDENTIFIER', sampleText);
for (var i = 0; i < results.length; ++i) {
  document.write('result ' + i + ': "' + results[i] + '"<br><br>');
}

body {
  font-family: monospace;
}
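
If your environment supports String.prototype.matchAll (ES2020), the exec loop can be written more compactly. This is only a sketch of an alternative, not what the answer uses. The function name forwardAndBackwardAll is made up, and like the original it assumes word contains no regex metacharacters.

```javascript
// Hypothetical variant of forward_and_backward using matchAll instead of an exec loop.
function forwardAndBackwardAll(word, text) {
  // matchAll requires the g flag; it throws a TypeError without it.
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  // Map every match to the concatenation of its two capture groups.
  return Array.from(text.matchAll(re), function (m) {
    return m[1] + m[2];
  });
}

var demoText = "  GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580 X Y Z 9.22.16.1043/73+ 0  ***  identifier 6800";

console.log(forwardAndBackwardAll('IDENTIFIER', demoText));
// prints [ '10.802.123/3843-48', '444.2345.1.1/99x580', '9.22.16.1043/73+6800' ]
```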

Are you interested in earning 100 points? -> http://stackoverflow.com/questions/32361120/sum-two-capture-groups — Sep 02 '15 at 19:02
Michael, I need help with another question: http://stackoverflow.com/questions/32460978/4-capture-groups-in-javascript-regex I only have 100 points now, but I can give you 50. — Sep 08 '15 at 15:13

score 1 · Answer 3 · answered Aug 31 '15 at 16:11


This is possible with the replace function:

var s = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var result = s.replace(/.*?(\S+)\s+\d+\s*-\s*(IDENTIFIER)\s*(\d+).*/, "$2 $1-$3");
//=> "IDENTIFIER 10.802.123/3843-48"
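
Keep in mind that replace hands back the input unchanged when the pattern does not match, so a failed match is easy to miss. A small sketch of one way to detect that, assuming you would rather get null. The name reorderOrNull is invented for this example.

```javascript
// Same pattern as in the answer, stored once so it can be tested and reused.
// No g flag, so test() keeps no lastIndex state between calls.
var pattern = /.*?(\S+)\s+\d+\s*-\s*(IDENTIFIER)\s*(\d+).*/;

// Hypothetical helper: return the rearranged string, or null if nothing matched.
function reorderOrNull(s) {
  return pattern.test(s) ? s.replace(pattern, '$2 $1-$3') : null;
}

console.log(reorderOrNull('GPX 10.802.123/3843 1 - IDENTIFIER 48'));
// prints "IDENTIFIER 10.802.123/3843-48"
console.log(reorderOrNull('no identifier here'));
// prints null
```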

answered Aug 31 '15 at 16:11

Avinash Raj


score 1 · Answer 4 · answered Sep 02 '15 at 17:55

^\s*\S+\s*\b(\d+(?:[./]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$

You can try this. Replace with $2 $1-$3. See the demo.

https://regex101.com/r/sS2dM8/38

var re = /^\s*\S+\s*\b(\d+(?:[.\/]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$/gm;
var str = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var subst = '$2 $1-$3';

var result = str.replace(re, subst);
//=> "IDENTIFIER 10.802.123/3843-48"
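
Since the pattern is anchored with ^ and $ and uses the gm flags, it should also rewrite several records at once when each sits on its own line. A quick sketch with a made-up second line (that input is an assumption, not from the question):

```javascript
var lineRe = /^\s*\S+\s*\b(\d+(?:[.\/]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$/gm;

// Two records, one per line; the second line is invented for illustration.
var lines = 'GPX 10.802.123/3843 1 - IDENTIFIER 48\n' +
            'ABC 444.2345/99 28 - IDENTIFIER 580';

// With m, ^ and $ match at line boundaries; with g, every line is replaced.
var rewritten = lines.replace(lineRe, '$2 $1-$3');

console.log(rewritten);
// prints:
// IDENTIFIER 10.802.123/3843-48
// IDENTIFIER 444.2345/99-580
```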

score 0 · Answer 5 · answered Sep 03 '15 at 18:33


You can use split too:

var text = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';

var parts = text.split(/\s+/);

if (parts[4] === 'IDENTIFIER') {
    var result = parts[4] + ' ' + parts[1] + '-' + parts[5];
    console.log(result);
}
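
The version above relies on the word always being the fifth token. If the position can vary, one option is to look the word up by value. The sketch below is an assumption-laden variant: it borrows the "four or more characters" rule from the accepted answer to pick the group before the word, and the name joinAroundWord is made up.

```javascript
// Hypothetical helper: find the word by value rather than by fixed index.
function joinAroundWord(word, text) {
  var parts = text.trim().split(/\s+/);
  var target = word.toLowerCase();
  // Case-insensitive search for the token that equals the word.
  var i = parts.findIndex(function (p) {
    return p.toLowerCase() === target;
  });
  if (i === -1 || i + 1 >= parts.length) {
    return null; // word missing, or nothing follows it
  }
  // Walk backward to the nearest token with four or more characters
  // (assumed definition of the "group before the word").
  for (var j = i - 1; j >= 0; --j) {
    if (parts[j].length >= 4) {
      return parts[i] + ' ' + parts[j] + '-' + parts[i + 1];
    }
  }
  return null; // no qualifying token before the word
}

console.log(joinAroundWord('IDENTIFIER', 'GPX 10.802.123/3843 1 - IDENTIFIER 48'));
// prints "IDENTIFIER 10.802.123/3843-48"
```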

answered Sep 03 '15 at 18:33

Casimir et Hippolyte
