Statement: I am new to RegExp and am trying to learn capture groups in JavaScript.

  1. I am using https://regex101.com/r/COYhIc/1 for testing.
  2. See the attached image for the character position column that https://regex101.com shows for each match.

Objective:

  1. I want to print all matches and groups to the console (done).
  2. I want to print the character position of each match, as in the image (remaining).

JSFIDDLE: https://jsfiddle.net/bababalcksheep/p28fmdk4/68/

JavaScript:

function parseQuery(query) {
  var isRE = query.match(/^\/(.*)\/([a-z]*)$/);
  if (isRE) {
    try {
      query = new RegExp(isRE[1], isRE[2]);
    } catch (e) {}
  }
  return query;
}
var str = $('#str').val();
var regex = parseQuery($('#reg').val());
//
var result;
var match_no = 0;
var output = '';
while ((result = regex.exec(str)) !== null) {
  match_no++;
  output += `\nMatch ${match_no}\n`;
  output += `Full Match, ${ result[0]} , Pos\n`;
  for (var i = 1; i < result.length; i++) {
    output += `Group ${i}, ${ result[i]} , Pos\n`;
  }
}
console.log(output);
django
  • Similar question: https://stackoverflow.com/questions/15934353/get-index-of-each-capture-in-a-javascript-regex – Klesun Aug 23 '19 at 21:48

2 Answers

According to the documentation for RegExp.exec, the start of a match is available through the index property of the result. I would add this line to your snippet to get the position range of the full match:

`${result.index}-${result.index + result[0].length}`
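
As a minimal sketch of that idea (the helper name `fullMatchPositions` is hypothetical; the sample string and pattern are the ones used in the question's fiddle), the start of each full match is `result.index` and the end is the start plus the length of the matched text:

```javascript
// Hypothetical helper: collects start/end offsets of every full match.
function fullMatchPositions(str, regex) {
  const positions = [];
  let result;
  // Assumes `regex` has the g flag; without it exec() would never advance.
  while ((result = regex.exec(str)) !== null) {
    positions.push({
      text: result[0],
      start: result.index,
      end: result.index + result[0].length, // exclusive end offset
    });
    // Guard against zero-length matches looping forever.
    if (result[0].length === 0) regex.lastIndex++;
  }
  return positions;
}

const found = fullMatchPositions(
  'source=100, delta=2, source=2121, delta=5',
  /(source=(\d+))/g
);
console.log(found);
```

The end offset here is exclusive, which matches the `start-end` ranges shown by regex101.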

For subgroups, exec does not report an index, so a workaround is to search with indexOf, starting from the index of the full match:

const initialSubGroupIndex = str.indexOf(result[i], result.index);
`${initialSubGroupIndex}-${initialSubGroupIndex + result[i].length}`
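
Engines that implement the ES2022 `d` flag (`hasIndices`) report group offsets directly through `result.indices`, which avoids the ambiguity of `indexOf` when the captured text occurs more than once. A minimal sketch, assuming such an engine is available (the helper name `groupPositions` is hypothetical; the test input is the 14-character string of `a`s with the pattern `/a(a*)a/g`):

```javascript
// Hypothetical helper: returns [start, end] pairs for the full match and each group.
// Assumes an engine with ES2022 match indices (the `d` flag).
function groupPositions(str, regex) {
  // Rebuild the regex with `d` (and `g`) added if they are missing.
  let flags = regex.flags;
  if (!flags.includes('d')) flags += 'd';
  if (!flags.includes('g')) flags += 'g';
  const re = new RegExp(regex.source, flags);

  const matches = [];
  let result;
  while ((result = re.exec(str)) !== null) {
    // result.indices[i] is [start, end] for group i,
    // or undefined if that group did not participate in the match.
    matches.push(result.indices.map((pair) => (pair ? [pair[0], pair[1]] : null)));
    // Guard against zero-length matches looping forever.
    if (result[0].length === 0) re.lastIndex++;
  }
  return matches;
}

// 14 repetitions of "a", matched against /a(a*)a/g.
const groups = groupPositions('a'.repeat(14), /a(a*)a/g);
console.log(JSON.stringify(groups));
```

Each inner array holds the full match first, followed by one entry per capture group, with exclusive end offsets.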
guijob
  • But what about groups 1 and 2? I have an issue there: the full match is correct but the rest are wrong – django Mar 05 '19 at 02:59
  • @django unfortunately, JS doesn't offer access to the index of groups by default. What I would do in your situation is search your actual string `str` for the found groups result[1] and result[2], something like `str.indexOf(result[1])` – guijob Mar 05 '19 at 03:06
  • Use the fiddle https://jsfiddle.net/bababalcksheep/p28fmdk4/68/. I updated it to match exactly with https://regex101.com/r/COYhIc/1; see the difference for group 2 – django Mar 05 '19 at 03:07
  • `str.indexOf(result[1])` can yield a wrong result, as group 2 with value `100` is repeated twice – django Mar 05 '19 at 03:10
  • @django yup, that's a good opportunity to use [`indexOf`](https://developer.mozilla.org/pt-BR/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf)'s second argument: https://jsfiddle.net/tyje1zbc/ – guijob Mar 05 '19 at 03:12
  • @django Updated my answer ! – Tushar Gupta Mar 05 '19 at 03:23
  • None of the solutions can pass the test case str = `aaaaaaaaaaaaaa`, regex = `/a(a*)a/g`. It's simply not possible to find the index of the subgroup - you can only do it when the string captured appears only once in the match – nhahtdh Mar 05 '19 at 03:50
  • @guijob https://jsfiddle.net/bababalcksheep/p28fmdk4/90/ , `Group 1` pos should be `1-13` as per https://regex101.com but it is `0-12` – django Mar 05 '19 at 08:07
  • @django yeah, it doesn't fit all situations, but it does work for your particular problem. Besides, I don't see a generic solution for this other than crafting your own regex interpreter or using a third-party lib. – guijob Mar 05 '19 at 13:25

In your output, use index and lastIndex. exec returns an object with an index property, and for a regex with the g flag, lastIndex is set to the end of the match.

output += `Full Match, ${ result[0]} , Pos ${result.index} - ${regex.lastIndex}\n `;

Update for the groups:

I have used a small piece of logic to get the indices:

var m = new RegExp(result[i]);
output += `Group ${i}, ${ result[i]}, Pos ${$('#str').val().match(m).index} - ${regex.lastIndex} \n`;
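
One caveat when building a pattern from captured text: `new RegExp(result[i])` interprets characters such as `.`, `+`, or `(` as regex syntax, so a capture containing them can match the wrong place or throw. A minimal sketch of escaping the text first (the helper names `escapeRegExp` and `findLiteral` are hypothetical and not part of this answer):

```javascript
// Hypothetical helper: escapes regex metacharacters so the text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: index of the first literal occurrence of `text` in `str`, or -1.
function findLiteral(str, text) {
  const m = str.match(new RegExp(escapeRegExp(text)));
  return m ? m.index : -1;
}

// Without escaping, "1.50" would also match "1x50", and "+" alone would throw a SyntaxError.
console.log(findLiteral('price=1x50, price=1.50', '1.50'));
console.log(findLiteral('a+b', '+'));
```

For a plain literal search, `str.indexOf(text)` gives the same position without constructing a regex, and like any first-occurrence search it still cannot tell repeated captures apart.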

function parseQuery(query) {
  var isRE = query.match(/^\/(.*)\/([a-z]*)$/);
  if (isRE) {
    try {
      query = new RegExp(isRE[1], isRE[2]);
    } catch (e) {}
  }
  return query;
}
var str = $('#str').val();
var regex = parseQuery($('#reg').val());
//
var result;
var match_no = 0;
var output = '';
while ((result = regex.exec(str)) !== null) {
  match_no++;
  output += `\nMatch ${match_no}\n`;
  output += `Full Match, ${ result[0]} , Pos ${result.index} - ${regex.lastIndex}\n `;
  for (var i = 1; i < result.length; i++) {
    var m = new RegExp(result[i]);
    output += `Group ${i}, ${ result[i]}, Pos ${$('#str').val().match(m).index} - ${regex.lastIndex} \n`;
  }
}
console.log(output);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="container">
  <div class="form-group">
    <label for="str">String:</label>
    <input type="text" class="form-control" id="str" value="source=100, delta=2, source=2121, delta=5">
  </div>
  <div class="form-group">
    <label for="reg">Regex:</label>
    <input type="text" class="form-control" id="reg" value="/(source=(\d+))/g">
  </div>
  <div id="result">

  </div>
</div>

FIDDLE

Tushar Gupta
  • What about groups? It is OK for the full match but not for group 2 – django Mar 05 '19 at 03:02
  • Use the fiddle https://jsfiddle.net/bababalcksheep/p28fmdk4/68/. I updated it to match exactly with https://regex101.com/r/COYhIc/1; see the difference for group 2 – django Mar 05 '19 at 03:07
  • @django Updated the answer ! – Tushar Gupta Mar 05 '19 at 03:22
  • FYI I have used the old js fiddle, but it works fine. :) – Tushar Gupta Mar 05 '19 at 03:28
  • None of the solutions can pass the test case str = `aaaaaaaaaaaaaa`, regex = `/a(a*)a/g`. It's simply not possible to find the index of the subgroup - you can only do it when the string captured appears only once in the match – nhahtdh Mar 05 '19 at 03:51