
I have a large text that contains some acronyms. All the acronyms are in parentheses and written in capital letters. Immediately before the parentheses there are always as many words as there are letters in the acronym, and those words start with the same letters. However, the words might not start with capital letters.

Ex:

bla bla radar cross section (RCS) bla bla...

bla bla Radar Cross Section (RCS) bla bla...

I need to list all the acronyms. How should I start?

Hamid
    This is probably going to require multiple steps. I don't think a regex can match backwards over an arbitrary amount of text based on the length of a match. You can use lookbehinds, but they won't include the text you're looking behind for in the match. I'd probably start with `\([A-Z]+\)` to match uppercase letters between parentheses, and then backtrack from where I found them, using something like the answer in [this question](https://stackoverflow.com/questions/2295657/return-positions-of-a-regex-match-in-javascript). – Corey Ogburn Jan 30 '19 at 19:29
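
A minimal sketch of the two-step approach described in this comment, assuming the words are separated by whitespace. The function name `listAcronyms` is hypothetical. It first finds each `(ABC)` with `String.prototype.matchAll`, then uses each match's `index` to look back at the words that precede it and compares their initials case-insensitively.

```javascript
// Hypothetical helper: two passes, as suggested in the comment above.
// 1) find every "(ABC)" with a global regex; 2) look back from match.index.
function listAcronyms(text) {
  const found = [];
  for (const m of text.matchAll(/\(([A-Z]+)\)/g)) {
    const acro = m[1];
    // All whitespace-separated words before the opening parenthesis
    const before = text.slice(0, m.index).trim().split(/\s+/);
    // Take exactly as many words as the acronym has letters
    const candidates = before.slice(-acro.length);
    if (candidates.length < acro.length) continue; // not enough words
    // Compare initials case-insensitively, so "radar cross section" matches "RCS"
    const initials = candidates.map(w => w[0]).join("").toUpperCase();
    if (initials === acro) {
      found.push({ acronym: acro, expansion: candidates.join(" ") });
    }
  }
  return found;
}

console.log(listAcronyms("bla bla radar cross section (RCS) bla bla Radar Cross Section (RCS) bla bla..."));
```

Both example sentences from the question are matched, and the original capitalization of the expansion is kept.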

3 Answers


Here's one possibility. It returns an object whose keys are the acronyms and whose values are the matching preceding words (without any attempt to normalize their capitalization).

const findAcronyms = (str) => {
  const words = str.split(/\s+/)

  return words.reduce((all, word, i) => {
    // Accept "(RCS)" optionally followed by punctuation, e.g. "(RCS)," or "(RCS)."
    const match = word.match(/^\(([A-Z]+)\)[.,;:!?]*$/)
    if (!match) {return all}
    const acro = match[1]
    const n = acro.length
    // Not enough preceding words to spell out the acronym
    if (i - n < 0) {return all}
    const preceding = words.slice(i - n, i)
    // Compare the first letters of the preceding words, ignoring case
    if (preceding.map(s => s[0]).join('').toLowerCase() !== acro.toLowerCase()) {
      return all
    }

    return {
      ...all,
      [acro]: preceding.join(' ')
    }
  }, {})
}

const str = 'bla bla radar cross section (RCS) but this one (IN) is not And This One (ATO) is'

console.log(findAcronyms(str)) //~>
// {
//   RCS: "radar cross section",
//   ATO: "And This One"
// }

Note that "IN" is not included in the result, because the initials of the two words before it ("this one") don't match it.

If you just want the acronyms themselves, without what they stand for, you could modify the function to return an array, or simply call Object.keys on this result.
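
A sketch of the array-returning variant mentioned above, assuming you want every valid occurrence, including repeats, in the order it appears. `findAcronymList` and `sample` are hypothetical names; the function applies the same initials check as `findAcronyms`, but collects only the acronyms.

```javascript
// Hypothetical variant of findAcronyms that returns an array of acronyms
// instead of an object, so repeated acronyms are kept in document order.
const findAcronymList = (str) => {
  const words = str.split(/\s+/)
  const result = []
  words.forEach((word, i) => {
    // "(RCS)" optionally followed by punctuation such as "," or "."
    const match = word.match(/^\(([A-Z]+)\)[.,;:!?]*$/)
    if (!match) return
    const acro = match[1]
    const n = acro.length
    if (i - n < 0) return // not enough preceding words
    const initials = words.slice(i - n, i).map(s => s[0]).join('').toLowerCase()
    if (initials === acro.toLowerCase()) result.push(acro)
  })
  return result
}

const sample = 'bla bla radar cross section (RCS) but this one (IN) is not And This One (ATO) is'
console.log(findAcronymList(sample)) // [ 'RCS', 'ATO' ]
```

If you only need each acronym once, wrap the result in a `Set`: `[...new Set(findAcronymList(sample))]`.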

Scott Sauyet

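const str = "bla bla radar cross section (RCS) bla bla...(aaaaaa) stack overflow (SO)";

// Capture only runs of capital letters between parentheses; m[1] is the text inside.
// matchAll returns an empty iterator (not null) when nothing matches.
const acronymes = [...str.matchAll(/\(([A-Z]+)\)/g)].map(m => m[1]);

console.log(acronymes); // [ 'RCS', 'SO' ]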
Aziz.G

This is what you could do:

\([A-Z]+\)
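
As a quick sketch of how this pattern might be used in JavaScript, assuming the global flag is added: `String.prototype.match` with `/g` returns every full match (parentheses included) or `null`, so the parentheses are sliced off afterwards. The variable names are illustrative. Note that this only lists candidates; it does not check the preceding words.

```javascript
const text = "bla bla radar cross section (RCS) bla bla...(aaaaaa) stack overflow (SO)";

// With the g flag, match returns all full matches, or null if there are none.
const matches = text.match(/\([A-Z]+\)/g) || [];

// Strip the surrounding parentheses: "(RCS)" -> "RCS"
const acronyms = matches.map(m => m.slice(1, -1));

console.log(acronyms); // [ 'RCS', 'SO' ]
```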