-1

I am trying to build an array of every word in a text, where each entry looks like [word, startIndex, endIndex]. I will use it to replace words later: after checking each word's type, I want to find a synonym and substitute it. The problem is splitting the text into words while also storing each word's start and end index. text.match(/\b(\w+)\b/g) works, but it does not give me the start and end indexes I need. I also tried writing my own function to parse the text, but it ended up overcomplicated and did not work as it should.

So I wondered if anybody in the JavaScript community here has a better solution or knows how to write a simple function for it.

This is what I would like to happen.

Input:

Norway, officially the Kingdom of Norway, is a sovereign state and unitary monarchy whose territory comprises the western portion of the Scandinavian Peninsula

Output:

['Norway', 0, 6], ['officially', 8, 18]

And the same for all words

Herman Neple

3 Answers

1

Partly taken from: Return positions of a regex match() in Javascript? but adapted to return the match itself together with its start and end index:

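const wordIndices = (s) => {
  const getAllWords = /\b(\w+)\b/g;
  const output = [];
  let match; // declared locally so it does not leak as an implicit global
  // With the g flag, each exec() call resumes after the previous match
  while ((match = getAllWords.exec(s)) !== null) {
    // [word, start, inclusive end]; drop the "- 1" if you want an exclusive end
    output.push([match[0], match.index, match.index + match[0].length - 1]);
  }
  return output;
};

const s = 'Norway, officially the Kingdom of Norway, is a sovereign state and unitary monarchy whose territory comprises the western portion of the Scandinavian Peninsula';

console.log(wordIndices(s));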
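The expected output in the question uses an exclusive end index (['Norway', 0, 6]), while the loop above returns an inclusive one. As a minimal sketch of the exclusive variant, assuming an ES2020 environment where String.prototype.matchAll is available, something like the following could work. The name wordSpans is hypothetical, and the pattern is simplified to /\w+/g, which should match the same words here.

```javascript
// Sketch only: wordSpans is a hypothetical helper, not part of the answer above.
// matchAll requires the g flag and yields one match object per word.
const wordSpans = (text) =>
  Array.from(text.matchAll(/\w+/g), (m) => [
    m[0],                  // the word itself
    m.index,               // start index
    m.index + m[0].length, // exclusive end index
  ]);

const spans = wordSpans('Norway, officially the Kingdom of Norway');
console.log(spans.slice(0, 2)); // [ [ 'Norway', 0, 6 ], [ 'officially', 8, 18 ] ]
```

With an exclusive end, text.slice(start, end) returns the word directly.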
user3483203
1

I think your example result was slightly wrong: ['Norway', 0, 6], ['officially', 9, 19]. The last one should have been 8, 18.

So the following might be what you're after.

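var str1 = `Norway, officially the Kingdom of Norway, is a sovereign state and unitary monarchy whose territory comprises the western portion of the Scandinavian Peninsula`;

// A regex literal is enough here; wrapping it in RegExp() is not needed
var regex1 = /\b(\w+)\b/g;
var array1;
var ret = [];

// exec() with the g flag returns the next match on each call, or null when done
while ((array1 = regex1.exec(str1)) !== null) {
  // The end index is inclusive; remove the "- 1" for an exclusive end
  ret.push([array1[0], array1.index, 
    array1.index + array1[0].length - 1]);
}

console.log(ret);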
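Since the stated goal is to replace words by position afterwards, one thing to watch is that each replacement shifts the offsets of everything after it. A rough sketch of one way around this, assuming the inclusive [word, start, end] triples produced above, is to apply replacements from the last word to the first. The helper replaceByIndex and the replacements object are hypothetical names used only for illustration.

```javascript
// Sketch only: replaceByIndex and the replacements table are hypothetical.
function replaceByIndex(text, triples, replacements) {
  let result = text;
  // Walk from the last word to the first so earlier offsets stay valid
  for (let i = triples.length - 1; i >= 0; i--) {
    const [word, start, end] = triples[i]; // end is inclusive here
    if (Object.prototype.hasOwnProperty.call(replacements, word)) {
      result = result.slice(0, start) + replacements[word] + result.slice(end + 1);
    }
  }
  return result;
}

const sample = 'a sovereign state';
const triples = [['a', 0, 0], ['sovereign', 2, 10], ['state', 12, 16]];
const swapped = replaceByIndex(sample, triples, {
  sovereign: 'independent',
  state: 'country',
});
console.log(swapped); // a independent country
```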
Keith
    Depends how he wants the result, and what he means by lastIndex. But I've just done a quick mod to do it the way we think of it. – Keith Mar 21 '18 at 15:39
0

If your goal is to replace those words, there is an easier solution. You can just use replace with a callback function.

Example:

const input = 'Norway, officially the Kingdom of Norway, is a sovereign state and unitary monarchy whose territory comprises the western portion of the Scandinavian Peninsula'


const output = input.replace(/\b(\w+)\b/g, (word, group, index) => {
    console.log(word, index);

    if (word.length <= 3) {
        return '...';
    } else {
        return word;
    }
})

console.log(output);
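To connect this to the synonym use case from the question, here is a small sketch of the same callback approach with a lookup table. The synonyms map is a made-up placeholder (a real one would presumably come from a thesaurus lookup), and the pattern is simplified to /\w+/g, so the callback receives (match, offset) with no capture group in between.

```javascript
// Sketch only: the synonyms table is a hypothetical stand-in for a real lookup.
const synonyms = new Map([
  ['sovereign', 'independent'],
  ['territory', 'land'],
]);

const text = 'a sovereign state whose territory comprises the western portion';
const positions = []; // collects [word, start, exclusiveEnd] as a side effect

const replaced = text.replace(/\w+/g, (word, offset) => {
  positions.push([word, offset, offset + word.length]);
  // Return the synonym if one is known, otherwise keep the original word
  return synonyms.has(word) ? synonyms.get(word) : word;
});

console.log(replaced);
// a independent state whose land comprises the western portion
```

Because replace builds the new string itself, there is no need to adjust offsets after each substitution.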
Gilles Castel