Suppose I have a comma-separated string of text in which each entry may or may not be followed, after a comma, by a token from a list like
var tokens=['Inc.','Ltd','LLC'];
so the string looks like
var companies="Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";
I want to obtain this array as output:
var companiesList = [
"Apple Inc.",
"Microsoft Inc.",
"Buzzfeed",
"Treasure LLC"
];
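For reference, the target transformation can also be stated without regular expressions: split on commas, and attach every piece that is exactly a token to the name just before it. A minimal sketch, assuming names never contain commas themselves and that tokens are compared case-insensitively:

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Case-insensitive lookup of the token list
const tokenSet = new Set(tokens.map((t) => t.toLowerCase()));

const result = [];
for (const part of companies.split(",").map((p) => p.trim())) {
  if (tokenSet.has(part.toLowerCase()) && result.length > 0) {
    // A bare token belongs to the name just before it
    result[result.length - 1] += " " + part;
  } else if (part !== "") {
    // Anything else starts a new company entry
    result.push(part);
  }
}
console.log(result);
// ["Apple Inc.", "Microsoft Inc.", "Buzzfeed", "Treasure LLC"]
```

This keeps the input order and handles entries with and without a token in the same pass.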
So I first wrote a RegExp like this:
var regex=new RegExp("([a-zA-Z&/? ]*),\\s+("+token+")", "gi" )
which gives me the matches; then I re-match each of them with a non-global regex like
var regex=new RegExp("([a-zA-Z&/? ]*),\\s+("+item+")", "i" )
and I do this for each of the tokens:
tokens.forEach((item) => {
  var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "gi")
  var matches = companies.match(regex) || []
  console.log(item, regex.toString(), matches)
  matches.forEach((m) => {
    var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "i")
    var match = m.match(regex)
    if (match && match.length > 2) {
      var n = match[1].trim();
      var c = match[2].trim();
      companiesList.push(n + ' ' + c);
    }
  });
});
In this way I can capture the tokens and concatenate matching groups 1 and 2. Here is the complete snippet:
var tokens = ['inc.', 'ltd', 'llc'],
    companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC",
    companiesList = [];

tokens.forEach((item) => {
  // global regex: collect every "<name>, <token>" pair for this token
  // (item is not escaped, so the "." in 'inc.' matches any character)
  var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "gi")
  var matches = companies.match(regex) || []
  console.log( item, regex.toString(), matches )
  matches.forEach((m) => {
    // match() with the g flag drops the groups, so re-match each pair without g
    var regex = new RegExp("([a-zA-Z&/? ]*),\\s+(" + item + ")", "i")
    var match = m.match(regex)
    if (match && match.length > 2) {
      var n = match[1].trim(); // company name
      var c = match[2].trim(); // token as written in the input, e.g. "Inc."
      companiesList.push(n + ' ' + c);
    }
  });
});
console.log(companiesList)
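For comparison, the same step can be sketched with String.prototype.matchAll (ES2020), which returns the capture groups of every global match, so the second non-global regex is no longer needed. This sketch also joins all tokens into one alternation, so results come out in input order rather than grouped by token, and escapes each token (escapeRegExp is a helper defined here, following the escaping pattern shown in MDN's regular expressions guide) so the "." in "Inc." only matches a literal dot. Like the original, it still only finds entries that are followed by a token:

```javascript
// Helper (defined here, not a built-in): escape regex metacharacters
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// One alternation for all tokens instead of one regex per token
const tokenAlt = tokens.map(escapeRegExp).join("|");
const withToken = new RegExp("([a-zA-Z&/? ]*),\\s+(" + tokenAlt + ")", "gi");

const tokenized = [];
for (const m of companies.matchAll(withToken)) {
  // m[1]: the name (may carry a leading space), m[2]: the token as written
  tokenized.push(m[1].trim() + " " + m[2]);
}
console.log(tokenized);
// ["Apple Inc.", "Microsoft Inc.", "Treasure LLC"]  (Buzzfeed is still missing)
```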
The problem is that I miss the comma-separated entries that have no token after them, like Buzzfeed.
My idea is to use a negative lookahead inside a non-capturing group (see here about non-capturing groups in regex matching):
/([a-zA-Z]*)^(?:(?!ltd).)+$/gi
But this way I get no match at all as soon as the token appears anywhere in the input string:
"Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure LLC".match( /([a-zA-Z]*)^(?:(?!llc).)+$/gi )
whereas I want to match only the entries that do not have a token, i.e. the opposite of the previous step, so I would like to get:
["Buzzfeed"]
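The attempt above returns nothing because, without the m flag, ^ and $ anchor to the start and end of the whole string, so (?:(?!llc).)+ must span the entire input and a single LLC anywhere rules out every match. The lookahead has to apply per entry instead. A minimal sketch, assuming entries are separated by commas and names use only the characters of the original class [a-zA-Z&/? ]; the variable names are illustrative:

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Escape "." in "Inc." so it only matches a literal dot
const tokenAlt = tokens.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")).join("|");
// An entry that is exactly one token, e.g. " Inc." or " LLC"
const isToken = "\\s*(?:" + tokenAlt + ")\\s*(?:,|$)";

const withoutToken = new RegExp(
  "(?:^|,)" +              // start at the beginning or right after a comma
  "(?!" + isToken + ")" +  // the entry itself is not a token
  "([a-zA-Z&/? ]+)" +      // the name
  "(?=,|$)" +              // running up to the next comma or the end
  "(?!," + isToken + ")",  // and NOT followed by a token entry
  "gi"
);

const bare = [...companies.matchAll(withoutToken)].map((m) => m[1].trim());
console.log(bare); // ["Buzzfeed"]
```

The (?=,|$) lookahead matters: without it the engine could backtrack to a shorter prefix such as "Appl", for which the negative lookahead would succeed.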
So how can I negate or modify the previous code so that it works in both cases and I end up with the composed array:
var companiesList = [
"Apple Inc.",
"Microsoft Inc.",
"Buzzfeed",
"Treasure LLC"
];
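Rather than negating the first regex, one option is to make the token part optional, so a single global regex walks over every entry and group 2 is simply undefined when there is no token. A minimal sketch, assuming names only contain the characters of the original class [a-zA-Z&/? ] and that a token entry always directly follows its company name:

```javascript
const tokens = ["Inc.", "Ltd", "LLC"];
const companies = "Apple, Inc., Microsoft, Inc., Buzzfeed, Treasure, LLC";

// Escape regex metacharacters so "Inc." matches a literal dot
const tokenAlt = tokens
  .map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
  .join("|");

// One entry = a name, optionally followed by ", <token>" that ends the entry.
// Group 1: the name; group 2: the token, or undefined when there is none.
const entry = new RegExp(
  "([a-zA-Z&/? ]+)(?:,\\s*(" + tokenAlt + ")(?=,|$))?",
  "gi"
);

const companiesList = [...companies.matchAll(entry)].map((m) =>
  m[2] ? m[1].trim() + " " + m[2] : m[1].trim()
);
console.log(companiesList);
// ["Apple Inc.", "Microsoft Inc.", "Buzzfeed", "Treasure LLC"]
```

Because all entries are matched in one left-to-right pass, the result keeps the input order. The (?=,|$) lookahead stops a token alternative such as Ltd from matching only the beginning of a longer word.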