
I need to match every expression of the form Laugh at Loud (LoL), that is, a phrase followed by its abbreviation in parentheses, whether the phrase has 2, 3, or more words. My regex only works when the abbreviation is 3 characters long. How do I make the regex generic (without hard-coding the length as 3) so that expressions of any length are matched?

The regex101 link below gives an overview of it.

The last expressions, light amplification by stimulated emission of radiation (LASER) and Green Skill Development Programme (GSDP), are not matched by the regexes below:

\b(\w)[\w']*[^a-zA-Z()]* (\w)[\w']*[^a-zA-Z()]* (\w)[\w']*[^a-zA-Z()]* \(\1\2\3\)

\b(?:\w[\w']* [^a-zA-Z]*){3} ?\([A-Z]{3}\)

https://regex101.com/r/QPMo5M/1

Code Guy
  • Regex is used for pattern matching, so if you are going to ask help build a regex, you first need to lay down the definition of the pattern you are looking for. Update your question to include the definition of the pattern you call "abbreviation" in simple English. Then someone might help you translate that to regex. – Mat J Aug 01 '20 at 06:30
  • Please read the question carefully. The above "may be" has no place here. If you are confident, then I would be grateful to you. – Code Guy Aug 01 '20 at 06:34
  • Hi! I think you can find the answer [here](https://stackoverflow.com/questions/1508147/generate-an-abbreviation-from-a-string-in-javascript-using-regular-expressions) – Phương Nguyễn Aug 01 '20 at 06:36
  • Please explain the problem clearly by providing all valid and invalid matches – anubhava Aug 01 '20 at 06:37
  • Please see the https://regex101.com/r/QPMo5M/1 – Code Guy Aug 01 '20 at 06:37
  • This can't be done using just a (JS) regex – ikegami Aug 01 '20 at 07:01
  • Is it possible using Javascript & regex? – Code Guy Aug 01 '20 at 07:08
  • Very crude (someone will be able to come up with something better, I'm sure), but something along the lines of [this](https://regex101.com/r/QPMo5M/4) could get you started. Not sure if I would recommend `regex` to be honest. – JvdV Aug 01 '20 at 07:20
  • hmm.... In your existing code, you check each of the N letters of the acronym against the first letter of the N words before the acronym. However, the 5 words before "(LASER)" are "by Stimulated Emission of Radiation", which doesn't match – ikegami Aug 01 '20 at 07:55

2 Answers


You can try the following:

/\b(\w)[-'\w]* (?:[-'\w]* ){1,}\(\1[A-Z]{1,}\)/gi

UPDATE

As @ikegami commented, this sloppy regex also matches things like `Bring some drinks (beer)` and `Bring something to put on the grill (BBQ)`. I think these cases can be filtered out with ordinary JavaScript code after the regex matching. `Bring some drinks (beer)` can be detected because `(beer)` contains no uppercase letters. `Bring something to put on the grill (BBQ)` can be detected because no words in `Bring something to put on the grill` supply initials for the second `B` and the `Q`.
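The two checks described above could be sketched roughly as follows. This is only an illustration: `initialsCover` and `filterMatches` are hypothetical helper names, and the "initials appear in order" rule is one assumed reading of "matching initial letters".

```javascript
// The sloppy regex from this answer, copied verbatim.
const SLOPPY = /\b(\w)[-'\w]* (?:[-'\w]* ){1,}\(\1[A-Z]{1,}\)/gi;

// Hypothetical helper: true when every letter of the acronym can be paired,
// in order, with the initial of some word in the expansion.
function initialsCover(expansion, acronym) {
  const initials = expansion
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => word[0].toUpperCase());
  let i = 0;
  for (const letter of acronym.toUpperCase()) {
    // Skip words (such as "by" or "of") whose initial is not the next letter.
    while (i < initials.length && initials[i] !== letter) i++;
    if (i === initials.length) return false;
    i++;
  }
  return true;
}

// Hypothetical helper: run the regex, then drop the false positives.
function filterMatches(text) {
  const kept = [];
  for (const m of text.matchAll(SLOPPY)) {
    const full = m[0];
    const open = full.lastIndexOf("(");
    const acronym = full.slice(open + 1, -1);
    const expansion = full.slice(0, open).trim();
    if (!/[A-Z]/.test(acronym)) continue; // no uppercase letter, e.g. "(beer)"
    if (!initialsCover(expansion, acronym)) continue; // e.g. "(BBQ)"
    kept.push(full);
  }
  return kept;
}
```

With these assumptions, `filterMatches("Bring some drinks (beer)")` returns an empty array, while the LASER example from the question is kept.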


UPDATE 2

When we match the following string by using the regex above:

We need to use technologies from Natural Language Processing (NLP).

It matches "need to use technologies from Natural Language Processing (NLP)" instead of "Natural Language Processing (NLP)". This problem also needs to be handled.
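One possible way to handle this after matching is to trim the match down to its shortest trailing run of words whose initials spell the acronym. The sketch below assumes that rule and lets non-matching words such as "by" or "of" be skipped; `trimToExpansion` is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper: given a matched string such as
// "need to use technologies from Natural Language Processing (NLP)",
// keep only the shortest suffix of words whose initials cover the acronym.
// Returns null when the initials cannot cover the acronym at all.
function trimToExpansion(match) {
  const open = match.lastIndexOf("(");
  const close = match.lastIndexOf(")");
  const acronym = match.slice(open + 1, close).toUpperCase();
  const words = match.slice(0, open).trim().split(/\s+/).filter(Boolean);

  // Walk the acronym and the words from right to left, pairing each letter
  // with the nearest word to the left that starts with it.
  let w = words.length - 1;
  for (let a = acronym.length - 1; a >= 0; a--) {
    while (w >= 0 && words[w][0].toUpperCase() !== acronym[a]) w--;
    if (w < 0) return null;
    w--;
  }
  // words[w + 1] is the word paired with the first acronym letter.
  return words.slice(w + 1).join(" ") + " " + match.slice(open);
}
```

Applied to the over-long match above, this returns "Natural Language Processing (NLP)". Because the pairing is greedy from the right, a word that merely shares an initial with the acronym can still be picked up, so this is a heuristic rather than a complete solution.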


UPDATE 3

The following regex matches acronyms that are 2 to 5 letters long and does not have the issues mentioned above. It also allows an optional "by" or "of" before each word after the first. I think it can be extended quite easily to support longer acronyms:

/\b(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* )?)?)?) *\(\1\2\3\4\5\)/gi
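Rather than nesting the groups by hand, the same pattern could be generated for any maximum length. The sketch below is one way to do that; `buildAcronymRegex` and its `fillers` parameter are hypothetical names, and the fillers are assumed to be plain words with no regex metacharacters. Like the regex above, it relies on JavaScript treating a backreference to a group that did not participate in the match as an empty string.

```javascript
// Hypothetical helper: build the nested regex for acronyms of 2..maxLen
// letters. For maxLen = 5 it produces the same pattern as the regex above.
function buildAcronymRegex(maxLen, fillers = ["by", "of"]) {
  if (!Number.isInteger(maxLen) || maxLen < 2) {
    throw new RangeError("maxLen must be an integer >= 2");
  }
  // Optional filler word before each expansion word, e.g. (?:by |of )?
  const filler = `(?:${fillers.map((f) => f + " ").join("|")})?`;
  // One expansion word: optional filler, captured initial, rest of the word.
  const word = `${filler}(\\w)\\S* `;

  // Words 3..maxLen are optional and nested inside one another.
  let nested = "";
  for (let i = maxLen; i >= 3; i--) {
    nested = `(?:${word}${nested})?`;
  }
  // The second word is mandatory, so its group has no trailing "?".
  nested = `(?:${word}${nested})`;

  // Backreferences \1\2...\maxLen spell the acronym in parentheses.
  const refs = Array.from({ length: maxLen }, (_, i) => `\\${i + 1}`).join("");
  return new RegExp(`\\b(\\w)\\S* ${nested} *\\(${refs}\\)`, "gi");
}

// Usage: allow acronyms of up to 8 letters.
const acronymRegex = buildAcronymRegex(8);
const found = [
  ..."We need to use technologies from Natural Language Processing (NLP).".matchAll(acronymRegex),
].map((m) => m[0]);
```

Raising `maxLen` only adds more optional nested groups, so shorter acronyms keep matching as before.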
Gorisanson
  • It thinks `Bring some drinks (beer)` is an acronym. The OP's existing solution doesn't. – ikegami Aug 01 '20 at 07:30
  • https://regex101.com/r/QPMo5M/5 Most common mis'~!@-take (MCM) will not be selected. Any idea how to overcome this – Code Guy Aug 01 '20 at 07:36
  • @CodeGuy I updated my answer so that it matches MCM correctly. – Gorisanson Aug 01 '20 at 07:42
  • ...then again, the OP also said `by stimulated emission of radiation (LASER)` should be considered an acronym. *shrug* – ikegami Aug 01 '20 at 07:59
  • @ikegami What do you mean? It matches `light amplification by stimulated emission of radiation (LASER)`. – Gorisanson Aug 01 '20 at 08:03
  • @Gorisanson, 1) Check again. Their 3-letter solutions would only match the three previous words (cause matching the entire preceding sentence wouldn't be too useful), and they're asking for this to be generalized to N-letter solutions. 2) You're making my point. If `light amplification by stimulated emission of radiation (LASER)` should match, then it makes perfect sense for `Bring some drinks (beer)` to match too – ikegami Aug 01 '20 at 08:11
  • @ikegami I still don’t understand what you mean by “`by stimulated emission of radiation (LASER)`”. And I think things like `Bring some drinks (beer)` or `Bring something to put on the grill (BBQ)` can be dealt with by post-processing in JS. – Gorisanson Aug 01 '20 at 08:12
  • @Gorisanson, How many letters in `LASER`? 5, so only the previous 5 words should be included in the match – ikegami Aug 01 '20 at 08:13
  • @ikegami If you think the OP’s question has a problem, you should say so to the OP, not me. I think `Bring some drinks (beer)` can be dealt with in JS since `beer` does not contain an uppercase letter. I also think `Bring something to put on the grill (BBQ)` can be dealt with in JS since the full-length name has no initials for the second `B` and the `Q`. – Gorisanson Aug 01 '20 at 08:21
  • Re "*If you think the OP’s question has a problem.*", I identified a problem with your answer. You agreed it was a problem. (You said your answer is incomplete as it needs post-processing.) I later came back to say the problem is not just with your answer, that's all. – ikegami Aug 01 '20 at 08:24
  • Re "*`Bring some drinks (beer)` can be dealt with JS*", Why? Why would you reject that and allow `light amplification by stimulated emission of radiation (LASER)`? Neither matches the OP's definition of an acronym. – ikegami Aug 01 '20 at 08:24
  • @ikegami What is “the OP’s definition of an acronym”? – Gorisanson Aug 01 '20 at 10:05
  • How to address those issues using the javascript? Do we have something that's possible using NLP.js library? – Code Guy Aug 01 '20 at 14:57
  • @CodeGuy Oh, the string `We need to use technologies from Natural Language Processing (NLP).` is just an example to demonstrate the issues. I don't know much about NLP. – Gorisanson Aug 01 '20 at 15:02
  • @CodeGuy I think it may be hard to find a complete method that solves every possible issue. We can use a variety of ad-hoc checks to compensate for the sloppiness of the regex whenever we find an issue. In the case of `need to use technologies from Natural Language Processing (NLP)`, I think we can filter it by using the fact that `need to use technologies from Natural Language Processing` has a rightmost substring whose initial letters are `N`, `L`, and `P`. – Gorisanson Aug 01 '20 at 15:14
  • @CodeGuy And I think [@JvdV's regex](https://regex101.com/r/QPMo5M/4) in [their comment](https://stackoverflow.com/questions/63202430/matching-all-abbreviations-using-js-regex/63202714?noredirect=1#comment111762374_63202430) is more strict and better, though somewhat complicated, than the regex on my answer. Since it matches acronyms to the length 5, I think you can extend it to the length as you want. – Gorisanson Aug 01 '20 at 15:41
  • @ikegami I apologize for the expression “You should say it to the OP, not me.”. It was rude; I should rather have said “I think you can say it also to the OP”. – Gorisanson Aug 02 '20 at 04:57
  • np; I didn't think it was rude. I did raise the problem on the Question, and the OP is also notified of comments on answers. – ikegami Aug 02 '20 at 04:59
  • @ikegami And I think I now understand what you mean by “the OP’s definition of an acronym”. Maybe you focused on the example regex in the OP’s question and thought we should extend it so that it matches just the 5 words in front of `(LASER)`. As for me, I just focused on finding a regex which matches all the acronyms in the link https://regex101.com/r/QPMo5M/1 which the OP has provided in their question. Thank you for the help and the guidance you have provided! – Gorisanson Aug 02 '20 at 04:59
-1
\b(\w)[-'\w]* (?:[-`."?,~=@!/\\|+:;%°*#£&^€$¢¥§'\w]* ){2,}\(\1[A-Z]{2,}\)

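I added some special characters to the character class for the in-between words, so that words containing punctuation or symbols are still matched.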

Code Guy
  • Consider changing the `{2,}`'s to `{1,}`'s, since you said you also want to match 2-word abbreviations. I updated my answer and applied these changes to the regex there as well, so that it can match things like `post script (PS)` :) – Gorisanson Aug 01 '20 at 13:38