
I would like to find a regex and replacement to do all the following tasks at once:

  1. Replace every `'s` that ends a word with a plain `s` (regex: `'s\b`, replacement: `s`)
  2. Remove all non-word characters at the start and at the end of the string (regex: `^\W*(.*?)\W*$`, replacement: `\1`)
  3. Replace each remaining run of consecutive non-word characters with a single dash (regex: `\W+`, replacement: `-`)

As shown, I can write each rule on its own, but combining them is much harder: I can't reduce the number of regexes I need.

I'd say that if it works on regex101, that's good enough. If it's not possible, I can accept that, and an answer saying so is good enough.

Examples:

Input               --->   Output
-----                      ------ 
Hello, World!       --->   hello-world
(%O'Shea's,*cAt!§   --->   o-sheas-cat
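
For reference, the three rules applied one after another can be sketched in Python. The language is an assumption (the question names none), `slugify_stepwise` is a hypothetical name, and the final lowercasing is inferred from the examples rather than from the listed rules:

```python
import re

def slugify_stepwise(text: str) -> str:
    """Apply the three rules from the question in order (hypothetical helper)."""
    # Rule 1: turn a word-final 's into a plain s.
    text = re.sub(r"'s\b", "s", text)
    # Rule 2: strip non-word characters from both ends of the string.
    text = re.sub(r"^\W*(.*?)\W*$", r"\1", text)
    # Rule 3: collapse every remaining run of non-word characters into one dash.
    text = re.sub(r"\W+", "-", text)
    # Assumption: the examples show lowercase output, so lowercase at the end.
    return text.lower()
```

With this sketch, `slugify_stepwise("Hello, World!")` gives `hello-world` and `slugify_stepwise("(%O'Shea's,*cAt!§")` gives `o-sheas-cat`, matching the table above. Note that Python's `\w` is Unicode-aware by default, so accented letters count as word characters.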
Olivier Grégoire
  • You can do it all in a single step using Notepad++. – revo Dec 10 '19 at 20:34
  • Which language? Lowercase conversions are not typical in regex engines, so you're not going to be language-agnostic regardless of the method you choose. Also, what makes `O'Shea's` become `o-sheas` and not `o-shea-s`?? – ctwheels Dec 10 '19 at 20:39
  • @ctwheels Thank you, I wasn't aware, so I'll say that if one rule is optional, it's that one. – Olivier Grégoire Dec 10 '19 at 20:44
  • You'll need two replacements: `'(?=s\b)` replace with nothing and then `\W+` replace with `-`. Then convert to lowercase. – ctwheels Dec 10 '19 at 20:48
  • @ctwheels regarding "O'Shea's", the rule number 1. See the regex and the replacement. – Olivier Grégoire Dec 10 '19 at 20:48
  • P.S. what @revo is referring to can be seen in [this answer](https://stackoverflow.com/a/37161309) – ctwheels Dec 10 '19 at 20:50
  • Task number 3 replacement is with a dash. Tasks 1 and 2 are with nothing. The replacement has to do logic if these are to be combined. The simple logic of the boost special replacement mode used by Notepad++ can be used for a one-off thing. But, if a cross-platform, permanent solution is required, the particular regex engine and host become important. Sorry, but that's the way it is. –  Dec 10 '19 at 20:58
  • By the way, your second rule is satisfied by your third - you can remove the second rule entirely. – ctwheels Dec 10 '19 at 21:05
  • @ctwheels No, [`\w` matches the underscore (`_`), not the dash (`-`)](https://regex101.com/r/Lou5U6/1). – Olivier Grégoire Dec 10 '19 at 21:13
  • @OlivierGrégoire right, and you have `\W` in both rules; therefore, the third rule accomplishes everything the second one does. Think about it this way `\W*` will match whatever `\W+` does – ctwheels Dec 10 '19 at 21:13
  • @ctwheels `???abc???` will be `-abc-` with rule 3. With rule 2 it will become `abc`. Do I get anything wrong here? – Olivier Grégoire Dec 10 '19 at 21:16
  • @OlivierGrégoire gotcha, you mean the replacement. OK so you can combine your first and second rule: `'(?=s\b)|^\W+|\W+$` – ctwheels Dec 10 '19 at 21:17
  • Combine 1 and 2 and replace with nothing. So my previous comment (`Task number 3 replacement is with a dash. Tasks 1 and 2 are with nothing.`, et al.) still applies. –  Dec 10 '19 at 21:20
  • Okay, that seems good enough. Thank you both for your help. It can't be reduced to one expression with anything other than boost, and lowercasing isn't available everywhere. Now I know. Great! – Olivier Grégoire Dec 10 '19 at 21:29
  • Indeed with NP++ you could use a conditional replacement, eg search for `(\w+)|'(?=\w\b)|^\W+|\W+$|(\W+)` and replace with `(?1\L\1:(?2-:))` – bobble bubble Dec 10 '19 at 21:45
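
Outside Notepad++, the approaches discussed in the comments can be sketched in a host language that lets the replacement carry the logic. The following is a minimal Python illustration, not something proposed in the thread: the choice of Python and the function names `slugify_two_pass` and `slugify_one_pass` are assumptions. The first function uses the combined pattern `'(?=s\b)|^\W+|\W+$` followed by `\W+`; the second folds both into one pattern and uses a replacement callback to pick between a dash and nothing, in the spirit of the conditional replacement.

```python
import re

def slugify_two_pass(text: str) -> str:
    """Two replacements, as suggested in the comments (hypothetical helper)."""
    # Pass 1: drop the apostrophe before a word-final s, and trim both ends.
    text = re.sub(r"'(?=s\b)|^\W+|\W+$", "", text)
    # Pass 2: collapse the remaining non-word runs into single dashes.
    return re.sub(r"\W+", "-", text).lower()

# Only the last alternative captures, so group 1 marks "replace with a dash".
_ONE_PASS = re.compile(r"'(?=s\b)|^\W+|\W+$|(\W+)")

def slugify_one_pass(text: str) -> str:
    """One pattern; the callback supplies the per-match replacement logic."""
    def repl(match: re.Match) -> str:
        # Inner runs (group 1) become a dash; everything else is deleted.
        return "-" if match.group(1) is not None else ""
    return _ONE_PASS.sub(repl, text).lower()
```

Both functions return `hello-world` for `Hello, World!` and `o-sheas-cat` for `(%O'Shea's,*cAt!§`. The single pattern still depends on the host's callback (and on a separate lowercasing step), which is consistent with the point made above that the regex engine and host matter.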
