I have some text that contains article numbers. I need to get an array of the articles (each article number together with the words that follow it) that contain certain "marker words". For example, given the text:

"123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter"

For the marker words ["hand", "ten"], the resulting array should be:

["123456/9902/001 one two three hand","123456/9902/004 seven ten butter"]

My code finds something, but the result is wrong. What would the right regular expression be?

// `text` holds the input string shown above
let markers = ["hand", "ten"],
  fin = [];
let num = "(\\d{6}\/\\d{4}\/\\d{3}).*?";
markers.forEach(item => {
  let reg = new RegExp(num + item, 'gmi');
  // match() returns null when nothing is found, so fall back to an empty array
  let found = text.match(reg) || [];
  found.forEach(match => fin.push(match));
  if (found.length) {
    console.log(`for ${item} : ${found.length}`);
    console.log(found);
  } else {
    console.log('Nothing');
  }
})
console.log(fin)
David Thomas
piperpiper

3 Answers

You can first analyze the text using the following code:

function findArticles(text) {
  // With the g flag, match() returns the full matched strings (or null if there are none)
  return text.match(/\d{6}\/\d{4}\/\d{3}(?: [a-zA-Z]+)+/g) || []
}

Then get article by marker:

function getArticleByMarker(articles, marker) {
    // Returns the first article that contains the marker, or null if there is none
    return articles.find(article => article.includes(marker)) || null
}
Cider

Instead of using regex to extract the required articles you can use it to split the string into different article names and then filter out those that don't contain the marker words. Here is an example:

const markers = ['hand', 'ten']
const str = `123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter`;

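// Split before each article number, then trim the space left at the end of each piece
const articleNames = str.split(/(?=\d{6}\/\d{4}\/\d{3})/).map(articleName => articleName.trim());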

const articleNamesWithMarkers = articleNames.filter(articleName => markers.some(marker => articleName.includes(marker)));

console.log(articleNamesWithMarkers);
Titus

You could split the articles into an array using a look-ahead regex, and then filter that array by a marker-based regular expression:

let text = "123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter";

let markers = ["hand","ten"];
let regex = new RegExp("\\b("+markers.join("|")+")\\b", "");
let result = text.split(/(?=\s*\d{6}\/\d{4}\/\d{3})/).filter(art => regex.test(art));

console.log(result);

If your markers contain characters that have a special meaning in regex, you need to escape them before building the regular expression.
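
As a minimal sketch of that escaping step: `escapeRegExp` below is a hypothetical helper (it is not part of the code above, and the sample text and the marker "1.5" are made up for illustration). It puts a backslash in front of every regex metacharacter, so the dot in "1.5" matches only a literal dot.

```javascript
// Hypothetical helper: escapes characters that are special in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Made-up sample data: the second article contains "105", which an
// unescaped "1.5" pattern would also match.
const sampleText = "123456/9902/001 size 1.5 hand 123456/9902/002 size 105 lot";
const rawMarkers = ["1.5"];

// Escape each marker before joining them into one alternation.
const markerRegex = new RegExp("\\b(" + rawMarkers.map(escapeRegExp).join("|") + ")\\b");

const matches = sampleText
  .split(/(?=\s*\d{6}\/\d{4}\/\d{3})/)
  .filter(art => markerRegex.test(art));

console.log(matches);
```

Note that `\b` only matches next to a word character, so a marker that starts or ends with punctuation would need a different boundary check.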

trincot