0

In JavaScript, I need to use a regexp to extract from a long string every piece of text enclosed between the two delimiters "---ST---" and "---EN---". For example, given this string:

---ST---blah blah blah---EN--- other text ---ST--- foo bar baz ---EN--- other other text ---ST---the cat is on the table---EN---

for every ---ST---/---EN--- pair found I need an object like this:

[{textFound:"blah blah blah", startsAt:0, endsAt:22},
{textFound:" foo bar baz ", startsAt:42, endsAt:64},
...]

I tried the following but it doesn't work:

function getSTEN(input){

var r =[];
var expression = /---ST---(.*?)---EN---/gi;
var matches = input.match(expression);
for(match in matches)
    {
        var result = {};
        result['textFound'] = matches[match];
        result['startsAt'] = input.indexOf(matches[match]);
        //...
     };

     return r;
};
var str = "---ST---blah blah blah---EN--- other text ---ST--- foo bar baz ---EN--- other other text ---ST---the cat is on the table---EN---";
console.log(getSTEN(str));

Can you help me?

4 Answers

1

You can use the following code to collect the data you need. I am guessing that endsAt is m.index + the length of the captured string + 8 (the length of ---ST---):

function getSTEN(str) {
  
  var r = [];
  var re = /-{3}ST-{3}(.*?)-{3}EN-{3}/g; 
  var m;
 
  while ((m = re.exec(str)) !== null) {
     var result = {};   
     result['textFound'] = m[1];
     result['startsAt'] = m.index;
     result['endsAt'] = m.index + m[1].length + 8;
     r.push(result);
  }
  return r;
}

var str = "---ST---blah blah blah---EN--- other text ---ST--- foo bar baz ---EN--- other other text ---ST---the cat is on the table---EN---";

var rs = getSTEN(str);

document.getElementById("res").innerHTML = "[";
for (var i = 0; i < rs.length; i++) {
  document.getElementById("res").innerHTML += "{textFound:\"" + rs[i]['textFound'] + "\", startsAt:" + rs[i]['startsAt'] + ", endsAt:" + rs[i]['endsAt'] + "}";
  if (i < rs.length-1)
     document.getElementById("res").innerHTML += ",";
}
document.getElementById("res").innerHTML += "]";
<div id="res" />
Wiktor Stribiżew
0

Your function is almost fine; I'd change only a little. Assuming you really want only the text between ---ST--- and ---EN---, your regex can stay as it is, but match() won't help much here: with the g flag it returns only the full matched strings, without capture groups or positions. You need exec():

var matches = expression.exec(input);

Then you can loop over the matches by calling exec() repeatedly until it returns null. The approach comes from this Stack Overflow answer: https://stackoverflow.com/a/432503/2582496
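
To make the match()/exec() difference concrete, here is a minimal sketch that assumes the regex from the question. The names collectMatches and sample are illustrative only and are not part of the original code:

```javascript
// Hypothetical helper name; the regex is the one from the question.
function collectMatches(input) {
  var expression = /---ST---(.*?)---EN---/gi;
  var found = [];
  var m;
  // With the g flag, each exec() call resumes from expression.lastIndex
  // and returns null once no further match exists.
  while ((m = expression.exec(input)) !== null) {
    found.push({ textFound: m[1], startsAt: m.index });
  }
  return found;
}

var sample = "---ST---a---EN--- x ---ST---b---EN---";

// match() with the g flag: full matches only, no groups or indexes.
var viaMatch = sample.match(/---ST---(.*?)---EN---/gi);
// ["---ST---a---EN---", "---ST---b---EN---"]

// exec() in a loop: capture group and position for every match.
var viaExec = collectMatches(sample);
// [{textFound: "a", startsAt: 0}, {textFound: "b", startsAt: 20}]
```

The regex is created inside the function so that lastIndex always starts at 0 on each call.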

jPO
  • This doesn't work properly, I get mixed data... For example, I should NOT get "jumps": the pair can occur many times, but I should not get a match spanning three delimiters such as "---ST--- text ---EN--- text ... ---EN---" (this should not happen, only one pair at a time) –  Apr 20 '15 at 13:25
  • Ah, I see. Then you have to change your regex a bit. Let me help: `expression = /---ST---(.*?)(?!---ST---)---EN---/gi;` Try that and let me know if I should put it in the answer ;-) – jPO Apr 20 '15 at 13:35
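
A note on the lookahead suggested in the comment above: as far as I can tell, it is only checked at the position right before ---EN---, where it always succeeds, so it may not change what gets matched. The sketch below assumes the goal is to reject any match that would contain a second ---ST---; it swaps in a different technique (a lookahead repeated before every consumed character). The helper name captures and the sample string nested are made up for illustration:

```javascript
// Hypothetical helper: returns the captured texts for a given regex.
function captures(re, input) {
  var out = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Illustrative input with a ---ST--- that is never closed before the next one.
var nested = "---ST---a ---ST---b---EN---";

// Lazy pattern from the question: the match starts at the first ---ST---.
var plain = captures(/---ST---(.*?)---EN---/g, nested);
// ["a ---ST---b"]

// Pattern from the comment: same result on this input.
var withLookahead = captures(/---ST---(.*?)(?!---ST---)---EN---/g, nested);
// ["a ---ST---b"]

// Tempered pattern: each consumed character must not begin a new ---ST---.
var strict = captures(/---ST---((?:(?!---ST---).)*?)---EN---/g, nested);
// ["b"]
```

Like the original pattern, the dot here does not match line breaks, so this sketch assumes each pair sits on a single line.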
0

I think your problem is that you use 'matches[match]' instead of 'match' in your for each.

If you are confused about how to use a for each this explains it pretty well: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for_each...in.
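
For reference, a small sketch of how the loop forms differ on an array. The loop in the question is a plain for...in, which (unlike for each...in) yields the keys, so there `match` holds an index and `matches[match]` the value. The array contents below are illustrative only:

```javascript
// Illustrative data standing in for the result of input.match(...).
var matches = ["first", "second"];

// for...in iterates over the enumerable keys, which for an array
// are its indexes as strings.
var keys = [];
for (var k in matches) {
  keys.push(k);
}
// keys is ["0", "1"]

// for...of (ES2015) iterates over the values themselves.
var values = [];
for (var v of matches) {
  values.push(v);
}
// values is ["first", "second"]
```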

deme72
-1

You don't need regex.

try this: jsfiddle.net/marcelortega/nnko5ebf/

EDIT: Here is new fiddle
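
In case the fiddle is unavailable, here is a minimal sketch of one regex-free approach using indexOf. This is not the code from the fiddle; the function name getSTENWithoutRegex is hypothetical, and endsAt is assumed to be the index where ---EN--- begins, which agrees with the first example in the question:

```javascript
// Hypothetical function; not the code from the linked fiddle.
function getSTENWithoutRegex(input) {
  var START = "---ST---";
  var END = "---EN---";
  var results = [];
  var from = 0;

  while (true) {
    // Find the next opening delimiter at or after the current position.
    var s = input.indexOf(START, from);
    if (s === -1) break;

    // Find the first closing delimiter after that opening delimiter.
    var e = input.indexOf(END, s + START.length);
    if (e === -1) break;

    results.push({
      textFound: input.slice(s + START.length, e),
      startsAt: s,
      // Assumption: endsAt is where ---EN--- starts.
      endsAt: e
    });

    // Continue searching after the closing delimiter.
    from = e + END.length;
  }
  return results;
}

var str = "---ST---blah blah blah---EN--- other text ---ST--- foo bar baz ---EN--- other other text ---ST---the cat is on the table---EN---";
var found = getSTENWithoutRegex(str);
// found[0] is {textFound: "blah blah blah", startsAt: 0, endsAt: 22}
```

Because the loop restarts after each closing delimiter, it collects every pair in the string rather than only the first one.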

marcel
  • marcel, this finds (I suppose) only one occurrence, not every occurrence of that pair –  Apr 20 '15 at 13:19