
I have strings like bright orange bags contain 5 faded olive bags, 5 posh tomato bags, 1 plain green bag.

I want to extract the colour of the containing bag, and the number and colours of the contained bags. So my ideal output would contain ['bright orange', '5', 'faded olive', '5', 'posh tomato', '1', 'plain green'].

I've tried the following regex, but it isn't giving me what I want:

/^(\w+ \w+) bags contain (?:(\d+) (\w+ \w+) bag(?:s.|.|s, |, ))+$/

That gets me

["bright orange bags contain 5 faded olive bags, 5 posh tomato bags, 1 plain green bag.", "bright orange", "1", "plain green"]

Which is the container colour and the last contained quantity and colour.

If I change the + to a specific number, e.g. {2}, then I get the correct output for strings with exactly that number of matches, but I don't want to write n regexes, where n is the maximum number of matches, and {1,n} gives the same result as +.

I've looked at this question but its answer specifies a number.

Is there a regex to output every time the group matches?

(I've specified JavaScript because I know it does regex differently in some circumstances)

Matt Ellen
  • Something like `[...s.matchAll(/(\d+) (\w+) (\w+) bags?/g)]`? Grab `bags contain` separately? If you want multiple matches or more precision, I'd use multiple passes: the first to grab the "digit ... bag(s)" chunks, or even the substring you need, then cut up each chunk with a second regex. Capture groups will always hold the last match. A lot of this depends on your actual use case which seems pretty hard to determine based on the one sentence here--so many variants seem possible. – ggorlen Dec 07 '20 at 23:43
  • Did one of these answers solve your problem? If not, could you provide more information to help answer it? – Nick Jan 17 '21 at 22:03
  • @Nick no. The duplicate points out that it can't be done. – Matt Ellen Jan 18 '21 at 09:32

2 Answers


You could use a regex to match an optional number before a bag description and loop over the matches to form your output array:

const str = 'bright orange bags contain 5 faded olive bags, 5 posh tomato bags, 1 plain green bag.';

const regex = /(?:(\d+)\s+)?(\w+\s+\w+)\s+bags?/g;

let result = [];
let arr; // holds the current match; declared so it isn't an implicit global
while ((arr = regex.exec(str)) !== null) {
  if (arr[1] !== undefined) result.push(arr[1]);
  result.push(arr[2]);
}
console.log(result);
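
The same loop can also be written with String.prototype.matchAll, which avoids the manual exec() bookkeeping. This is only a sketch under the same assumptions as above (two-word colours, an optional leading quantity); the function name parseBags is made up for illustration and is not part of the question.

```javascript
// Hypothetical helper: collects the container colour, then each
// quantity/colour pair, into one flat array.
function parseBags(str) {
  // Same pattern as above: optional quantity, two-word colour, "bag" or "bags".
  const regex = /(?:(\d+)\s+)?(\w+\s+\w+)\s+bags?/g;
  const result = [];
  // matchAll yields one match array per occurrence; skip index 0 (the full match).
  for (const [, count, colour] of str.matchAll(regex)) {
    if (count !== undefined) result.push(count); // the container bag has no quantity
    result.push(colour);
  }
  return result;
}

const parsed = parseBags(
  'bright orange bags contain 5 faded olive bags, 5 posh tomato bags, 1 plain green bag.'
);
console.log(parsed);
```

For the sample string this logs ['bright orange', '5', 'faded olive', '5', 'posh tomato', '1', 'plain green']. matchAll requires a regex with the g flag and is available from ES2020 onwards.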
Nick

I'm not sure if this will work under all of your scenarios, but here's what I came up with:

function itemize(string){
  const s = string.split(/\s*(?:,|contain)\s*/);
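  for(let i=0,m,v,l=s.length; i<l; i++){
    v = s[i];
    m = v.match(/^[0-9]+/); // leading quantity, if any
    // strip the quantity and a trailing " bag", " bags", " bag." or " bags."
    s[i] = v.replace(/^\s*[0-9]+\s*|\s+bags?\.?\s*$/g, '');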
    if(m){
      s.splice(i++, 0, m[0]); l++;
    }
  }
  return s;
}
let testString = 'bright orange bags contain 5 faded olive bags, 5 posh tomato bags, 1 plain green bag.'
const res = itemize(testString);
console.log(res);
StackSlave