4

I think the title says it all. I'm trying to get groups and concatenate them together.

I have this text:

GPX 10.802.123/3843 1 - IDENTIFIER 48

And I want this output:

IDENTIFIER 10.802.123/3843-48

To be explicit: I want to capture one group before the word IDENTIFIER and one group after it, then concatenate the two, using only a regex. Is this possible?

I can already extract the 48 like this:

var text = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var reg = new RegExp('IDENTIFIER' + '.*?(\\d\\S*)', 'i');
var match = reg.exec(text);

Output:

48

Can it be done?

I'm offering 200 points.

Michael Laszlo

5 Answers

3

You can do:

var text = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var match = /GPX\s+(.+?) \d .*?(IDENTIFIER).*?(\d\S*)/i.exec(text);

var output = match[2] + ' ' + match[1] + '-' + match[3];
//=> "IDENTIFIER 10.802.123/3843-48"
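Since exec returns null when the text does not match, indexing into the result directly can throw. Here is a minimal sketch of the same regex wrapped in a function with a null check. The function name combineAroundIdentifier is hypothetical, and the sample text assumes ordinary spaces and a plain ASCII hyphen.

```javascript
// Hypothetical helper (not from the original answer): returns the combined
// string, or null when the pattern does not match the text.
function combineAroundIdentifier(text) {
  var match = /GPX\s+(.+?) \d .*?(IDENTIFIER).*?(\d\S*)/i.exec(text);
  if (match === null) {
    return null; // exec() yields null on failure, so guard before indexing
  }
  // match[1] = number before the word, match[2] = the word, match[3] = number after
  return match[2] + ' ' + match[1] + '-' + match[3];
}

console.log(combineAroundIdentifier('GPX 10.802.123/3843 1 - IDENTIFIER 48'));
// prints "IDENTIFIER 10.802.123/3843-48"
console.log(combineAroundIdentifier('no match here'));
// prints null
```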
anubhava
  • 761,203
  • 64
  • 569
  • 643
3

You must precisely define the groups that you want to extract before and after the word. If you define the group before the word as four or more non-whitespace characters, and the group after the word as one or more non-whitespace characters, you can use the following regular expression.

var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'i');
var groups = re.exec(text);
if (groups !== null) {
   var result = groups[1] + groups[2];
}

Let me break down the regular expression. Note that we have to escape the backslashes because we're writing a regular expression inside a string.

  • (\\S{4,}) captures a group of four or more non-whitespace characters
  • \\s+ matches one or more whitespace characters
  • (?: indicates the start of a non-capturing group
  • \\S{1,3} matches one to three non-whitespace characters
  • \\s+ matches one or more whitespace characters
  • )*? makes the non-capturing group match zero or more times, as few times as possible
  • word matches whatever was in the variable word when the regular expression was compiled
  • .*? matches any character zero or more times, as few times as possible
  • (\\S+) captures one or more non-whitespace characters
  • the 'i' flag makes this a case-insensitive regular expression

Observe that our use of the ? modifier allows us to capture the nearest groups before and after the word.
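One caveat with concatenating word into the pattern: if word contains regex metacharacters, they are interpreted as part of the expression. Here is a minimal sketch that escapes the word first. The helpers escapeRegExp and nearestGroups are hypothetical names added for illustration, and the sample texts are made up.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same pattern as above, but with the word escaped before it is concatenated.
function nearestGroups(word, text) {
  var re = new RegExp(
    '(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + escapeRegExp(word) + '.*?(\\S+)',
    'i'
  );
  var groups = re.exec(text);
  return groups === null ? null : groups[1] + groups[2];
}

console.log(nearestGroups('IDENTIFIER', 'GPX 10.802.123/3843- 1 -- IDENTIFIER 48'));
// prints "10.802.123/3843-48"

// Without escaping, 'C++' would be rejected as an invalid pattern.
console.log(nearestGroups('C++', 'code 1234.5678- x C++ 17'));
// prints "1234.5678-17"
```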

You can match the regular expression globally in the text by adding the g flag. The snippet below demonstrates how to extract all matches.

function forward_and_backward(word, text) {
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  // Find all matches and make an array of results.
  var results = [];
  while (true) {
    var groups = re.exec(text);
    if (groups === null) {
      return results;
    }
    var result = groups[1] + groups[2];
    results.push(result);
  }
}

var sampleText = "  GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580 X Y Z 9.22.16.1043/73+ 0  ***  identifier 6800";

var results = forward_and_backward('IDENTIFIER', sampleText);
for (var i = 0; i < results.length; ++i) { 
  document.write('result ' + i + ': "' + results[i] + '"<br><br>');
}
body {
  font-family: monospace;
}
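The snippet above relies on document.write, so it only runs in a browser. As an alternative sketch, assuming an environment that supports String.prototype.matchAll (ES2020), the same global search can be written without the explicit exec loop. The function name forwardAndBackwardAll and the shortened sample text are my own.

```javascript
// Hypothetical variant of forward_and_backward that uses matchAll.
// matchAll requires the g flag and yields one match array per occurrence.
function forwardAndBackwardAll(word, text) {
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  return Array.from(text.matchAll(re), function (groups) {
    return groups[1] + groups[2];
  });
}

var shortSample = 'GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580';
console.log(forwardAndBackwardAll('IDENTIFIER', shortSample));
// prints [ '10.802.123/3843-48', '444.2345.1.1/99x580' ]
```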
Michael Laszlo
  • Would you be interested in earning 100 points? -> http://stackoverflow.com/questions/32361120/sum-two-capture-groups –  Sep 02 '15 at 19:02
  • Michael, I need help with another question: http://stackoverflow.com/questions/32460978/4-capture-groups-in-javascript-regex I only have 100 points now, but I can give you 50. –  Sep 08 '15 at 15:13
1

This is possible with the replace function.

var s = 'GPX 10.802.123/3843 1 - IDENTIFIER 48'
s.replace(/.*?(\S+)\s+\d+\s*-\s*(IDENTIFIER)\s*(\d+).*/, "$2 $1-$3")
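Note that replace does not modify s; it returns a new string, and it returns the input unchanged when the pattern does not match. A minimal sketch that stores the result, assuming the text uses ordinary spaces and an ASCII hyphen (the function name reorder is hypothetical):

```javascript
// Hypothetical wrapper around the replace call above.
function reorder(s) {
  // $1 = number before the word, $2 = the word, $3 = number after it
  return s.replace(/.*?(\S+)\s+\d+\s*-\s*(IDENTIFIER)\s*(\d+).*/, '$2 $1-$3');
}

var out = reorder('GPX 10.802.123/3843 1 - IDENTIFIER 48');
console.log(out);
// prints "IDENTIFIER 10.802.123/3843-48"

console.log(reorder('no match here'));
// prints "no match here" (input returned as-is)
```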
Avinash Raj
1
^\s*\S+\s*\b(\d+(?:[./]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$

You can try this. Replace with $2 $1-$3. See the demo.

https://regex101.com/r/sS2dM8/38

var re = /^\s*\S+\s*\b(\d+(?:[.\/]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$/gm; 
var str = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';
var subst = '$2 $1-$3'; 

var result = str.replace(re, subst);
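The g and m flags only matter when the input has several lines: with m, ^ and $ anchor at each line, and with g every line is rewritten. A short sketch of that case, where the second input line is invented for illustration and the function name rewriteLines is hypothetical:

```javascript
// Hypothetical wrapper: rewrites every line that matches the pattern above.
function rewriteLines(str) {
  var re = /^\s*\S+\s*\b(\d+(?:[.\/]\d+)+)\b.*?-.*?\b(\S+)\b\s*(\d+)\s*$/gm;
  return str.replace(re, '$2 $1-$3');
}

// Two records, one per line (the second line is made-up sample data).
var lines = 'GPX 10.802.123/3843 1 - IDENTIFIER 48\nABC 1.2/3 9 - IDENTIFIER 7';
console.log(rewriteLines(lines));
// prints:
// IDENTIFIER 10.802.123/3843-48
// IDENTIFIER 1.2/3-7
```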
vks
0

You can use split too:

var text = 'GPX 10.802.123/3843 1 - IDENTIFIER 48';

var parts = text.split(/\s+/);

if (parts[4] == 'IDENTIFIER') {
    var result = parts[4] + ' ' + parts[1] + '-' + parts[5];
    console.log(result);
} 
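The code above assumes the word is always the fifth token. A slightly more flexible sketch looks the word up with indexOf and takes the tokens relative to it. This assumes the layout stays "number, digit, dash, word, number" and that the word matches exactly (indexOf is case-sensitive); the function name combineBySplit is hypothetical.

```javascript
// Hypothetical helper: find the word among the whitespace-separated tokens,
// then combine the token three places before it with the token right after it.
function combineBySplit(word, text) {
  var parts = text.trim().split(/\s+/);
  var i = parts.indexOf(word);
  // Need three tokens before the word and one token after it.
  if (i < 3 || i + 1 >= parts.length) {
    return null;
  }
  return parts[i] + ' ' + parts[i - 3] + '-' + parts[i + 1];
}

console.log(combineBySplit('IDENTIFIER', 'GPX 10.802.123/3843 1 - IDENTIFIER 48'));
// prints "IDENTIFIER 10.802.123/3843-48"
```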
Casimir et Hippolyte