I want to use a regex to detect all words that contain an underscore (not only English ones) and then place a hashtag in front of each of them. For example, a sentence like:

mind_viral immunity is an important element of general_wellbeing

would produce two matches, `mind_viral` and `general_wellbeing`, and then change the string to

#mind_viral immunity is an important element of #general_wellbeing 

I'm trying to use this regex:

([a-zA-Z]+(?:_[a-zA-Z]+)*)

But it matches all the words, not only those with an underscore.

What could I do differently?

Aerodynamika
  • `[a-zA-Z]+(?:_[a-zA-Z]+)+` - `+` matches **one or more** occurrences – Wiktor Stribiżew Jun 05 '20 at 13:47
  • @Aerodynamika Also, what about leading, trailing, or adjacent underscores? I might go with `([a-zA-Z]*(?:_[a-zA-Z]*)+)`. This matches `mind_viral`, `mind__viral`, `_mindViral`, `mindViral_`, etc. – xtratic Jun 05 '20 at 13:50
  • @WiktorStribiżew please could you reopen the question because I also need it to work not only in English and the answer you provided does not cover this possibility. – Aerodynamika Jun 05 '20 at 13:52
  • What do you mean? It covers your question. If you need to match any letter, replace `[A-Za-z]` with `\p{L}`, `[[:alpha:]]` (see [this](https://stackoverflow.com/questions/6314614/match-any-unicode-letter)) if your library supports them. – Wiktor Stribiżew Jun 05 '20 at 13:55
  • @WiktorStribiżew no it doesn't. it doesn't work for Russian language. also I want to be able to not only match them but also replace them. Please, reopen the question, so somebody else can provide their answer too. Thanks. – Aerodynamika Jun 05 '20 at 14:00
  • [`\p{L}+(?:_\p{L}+)+`](https://regex101.com/r/uFDnDC/1) works well for any language, not only Russian. No need to reopen so that someone could just repeat "use `+` and `\p{L}`". It has been answered. – Wiktor Stribiżew Jun 05 '20 at 14:03
  • Ok @WiktorStribiżew that's fine. You obviously didn't read my question as I wanted not only to match but also to replace the match with the match plus the hashtag. Thank you for providing some help on this, but I will simply post another question some time later, so instead of making StackOverflow cleaner you will actually make it bloated with the same questions because you don't want to nudge :) – Aerodynamika Jun 05 '20 at 14:06
  • To replace text matched with regex, you use a *programming method/function*. Regex itself is just an expression that matches some string of text. You have indicated neither the language, nor the method you are using, nor the result you expect. Please edit the question rather than asking a new one. – Wiktor Stribiżew Jun 05 '20 at 14:13
  • Ok done that replaced everything – Aerodynamika Jun 05 '20 at 14:20
  • You only added the details you are planning to do it in JS. It is unclear what you need exactly. Please add the expected output for the text you shared. – Wiktor Stribiżew Jun 05 '20 at 14:48
  • Do you mean you need to *prepend the matches with a hash, `#`*? `s = s.replace(/\p{L}+(?:_\p{L}+)+/gu, '#$&')`? – Wiktor Stribiżew Jun 05 '20 at 15:01
  • @WiktorStribiżew that is the one, thank you. maybe you want to post it as an answer so I can accept it? – Aerodynamika Jun 05 '20 at 15:54

1 Answer

In ECMAScript 2018+ compliant JS environments, you may use

s = s.replace(/\p{L}+(?:_\p{L}+)+/gu, '#$&')

Mind the u flag that enables Unicode property classes in JS regexps.
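To illustrate why the flag matters, here is a minimal sketch (the variable names are only illustrative). It assumes a regular browser or Node.js engine, where a pattern without `u` does not treat `\p{L}` as a Unicode property class:

```javascript
// Without the u flag, \p{L} is not read as "any Unicode letter",
// so the pattern fails to match a plain letter.
const withoutFlag = /\p{L}/.test("a");

// With the u flag, \p{L} matches letters from any script.
const withFlagLatin = /\p{L}/u.test("a");
const withFlagCyrillic = /\p{L}/u.test("ж");

console.log(withoutFlag, withFlagLatin, withFlagCyrillic); // false true true
```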

Here,

  • \p{L}+ - matches any one or more Unicode letters
  • (?:_\p{L}+)+ - one or more repetitions of
    • _ - an underscore
    • \p{L}+ - any one or more Unicode letters
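As a quick check of the breakdown above, the sketch below lists the matches with `String.prototype.match` instead of replacing them. The names `underscoreWord`, `sample` and `edgeCases` are just illustrative. It also shows that, because letters are required on both sides of every underscore, the leading, trailing and doubled underscore cases raised in the comments are not matched:

```javascript
const underscoreWord = /\p{L}+(?:_\p{L}+)+/gu;

const sample = "mind_viral immunity is an important element of general_wellbeing";

// match() with the g flag returns all matches, or null when there are none.
const found = sample.match(underscoreWord) || [];
console.log(found); // [ 'mind_viral', 'general_wellbeing' ]

// Each underscore needs at least one letter on both sides, so none of these match.
const edgeCases = "_mindViral mindViral_ mind__viral".match(underscoreWord) || [];
console.log(edgeCases.length); // 0
```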

Replace with `#$&`: `#` is a literal char, and `$&` is a replacement pattern that inserts the whole match value.
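If the `$&` syntax looks cryptic, a callback passed to `replace` should give the same result. This is only a sketch with illustrative names (`text`, `pattern`, `viaPattern`, `viaCallback`); the first callback argument is the whole match:

```javascript
const text = "mind_viral immunity";
const pattern = /\p{L}+(?:_\p{L}+)+/gu;

// $& in the replacement string stands for the whole match.
const viaPattern = text.replace(pattern, "#$&");

// The replacer function receives the whole match as its first argument.
const viaCallback = text.replace(pattern, (match) => "#" + match);

console.log(viaPattern);  // #mind_viral immunity
console.log(viaCallback); // #mind_viral immunity
```

The callback form is mainly useful when the replacement needs extra logic, for example skipping certain words.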

See the JS demo:

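const s = "mind_viral immunity is an important element of general_wellbeing виктор_стрибижев привет";
// Prefix every underscore-joined word with "#"; the u flag enables \p{L}.
console.log(s.replace(/\p{L}+(?:_\p{L}+)+/gu, '#$&'));
// => #mind_viral immunity is an important element of #general_wellbeing #виктор_стрибижев привет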
Wiktor Stribiżew