Is there a way to use a single regular expression to match only within another match? For example, suppose I want to remove spaces from a string, but only within parentheses:

source : "foobar baz blah (some sample text in here) and some more"

desired: "foobar baz blah (somesampletextinhere) and some more"

In other words, is it possible to restrict matching to a specific part of the string?

Synetech
  • Basically, you're intending to replace within a capture group. Hope this helps: https://stackoverflow.com/questions/34973192/how-to-replace-within-a-capture-group – Sajad Jul 02 '22 at 19:46
  • In what language/tool? Different features are supported among regex flavors. – bobble bubble Jul 02 '22 at 20:45

2 Answers

One idea is to replace any space between parentheses using a lookahead pattern:

` (?=([^\s\(]+ )*\S*\))(?!\S*\s*\()`

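The pattern matches a single space and then checks what follows it. The positive lookahead requires that the text after the space leads to a closing parenthesis: zero or more space-terminated words that contain no opening parenthesis (([^\s\(]+ )*), then a final run of non-space characters ending in the closing parenthesis (\S*\)). The negative lookahead rejects a space whose next word is followed by an opening parenthesis.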

Detailed Regex Explanation:

  • the leading space: matches one literal space character
  • (?=([^\s\(]+ )*\S*\)): positive lookahead (it checks the text ahead without consuming it)
    • ([^\s\(]+ )*: one or more characters that are neither whitespace nor an open parenthesis, followed by a space; the group may repeat zero or more times
    • \S*\): zero or more non-space characters, followed by a closing parenthesis
  • (?!\S*\s*\(): negative lookahead, describing what must not follow the space
    • \S*: zero or more non-space characters, followed by
    • \s*: zero or more whitespace characters, followed by
    • \(: the open parenthesis

Check the demo here.
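
For trying the lookahead pattern outside a demo site, here is a minimal Python sketch using the standard `re` module. The helper name `remove_spaces_lookahead` is made up for illustration, and the printed result covers only the sample string from the question; as the comments under this answer point out, other inputs can behave differently.

```python
import re

# The answer's pattern: a literal space, a positive lookahead, a negative lookahead.
SPACE_IN_PARENS = re.compile(r" (?=([^\s\(]+ )*\S*\))(?!\S*\s*\()")


def remove_spaces_lookahead(text: str) -> str:
    """Delete every space the pattern matches (hypothetical helper name)."""
    return SPACE_IN_PARENS.sub("", text)


source = "foobar baz blah (some sample text in here) and some more"
print(remove_spaces_lookahead(source))
# foobar baz blah (somesampletextinhere) and some more
```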

lemon
  • This works without PCRE, so it's what I'll use when using Rename Regex to shorten filenames by stripping out spaces within specific parts. Thanks. – Synetech Jul 03 '22 at 20:16
  • This only works if the text between the parentheses contains spaces; if not, it targets the spaces _outside_ the parentheses (so you can't run it more than once). – Synetech Jul 10 '22 at 22:28
  • Correct. Check the updated answer. @Synetech – lemon Jul 10 '22 at 22:53
  • Hmm, that works with the sample text you used, but if you put any kind of non-alphanumeric character in the text before the parentheses (eg `foobar baz! (some…`), then it will eat the spaces in that text, but the text after the parenthetical is immune. I tried to debug it, but couldn't sort it out because you only used `\s` and `\S`, not any character-classes that would/should specify only alphanumeric characters. I'm almost inclined to think it's a bug. – Synetech Jul 11 '22 at 23:33

In PCRE a combination of \G and \K can be used:

(?:\G(?!^)|\()[^)\s]*\K\s+
  • \G continues where the previous match ended
  • \K resets beginning of the reported match
  • [^)\s] matches any character that is neither a closing parenthesis nor whitespace

See demo at regex101

The idea is to chain matches to an opening parenthesis. The chain links are either [^)\s]* or \s+. To report only the whitespace, \K resets the start of the match just before \s+. This solution does not require a closing ).
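
Python's built-in `re` module accepts neither \G nor \K, so the pattern above cannot be used there as written. As a rough stand-in (a different technique, not a translation of the \G chain), one can match each parenthesized span and clean it inside a replacement callback. The helper name `strip_spaces_in_parens` is hypothetical, and unlike the \G version this sketch assumes non-nested parentheses and does require the closing ).

```python
import re

# One parenthesized span without nesting: "(", anything but ")", then ")".
PAREN_SPAN = re.compile(r"\([^)]*\)")
WHITESPACE = re.compile(r"\s+")


def strip_spaces_in_parens(text: str) -> str:
    """Remove whitespace only inside (...) spans (hypothetical helper name)."""
    # The callback receives each matched span and returns it without whitespace.
    return PAREN_SPAN.sub(lambda m: WHITESPACE.sub("", m.group(0)), text)


source = "foobar baz blah (some sample text in here) and some more"
print(strip_spaces_in_parens(source))
# foobar baz blah (somesampletextinhere) and some more
```

Text without a complete (...) span is returned unchanged, because the outer pattern never matches it.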


In other regex flavors that support \G but not \K, capturing groups can help out. E.g. search for

(\G(?!^)|\()([^)\s]*)\s+

and replace with captures of the 2 groups (depending on lang: $1$2 or \1\2) - Regex101 demo


Further, there is (*SKIP)(*F), a PCRE feature for skipping over certain parts. It is often used together with The Trick. The idea is simple: skip this(*SKIP)(*F)|match that - Regex101 demo. This can also be worked around with capture groups, e.g. replace ([^)(]*\(|\)[^)(]*)|\s with $1
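
The capture-group workaround at the end needs no PCRE-only features, so it can be sketched in Python with the standard `re` module. The function name `remove_spaces_capture` is an assumption for illustration, and a callback stands in for the $1 replacement so that an unmatched group is turned into an empty string explicitly. The sketch assumes balanced, non-nested parentheses.

```python
import re

# Group 1 captures text that must stay untouched: everything up to and
# including "(", or ")" plus everything up to the next parenthesis.
# If group 1 cannot match, the pattern matches one whitespace character.
KEEP_OR_SPACE = re.compile(r"([^)(]*\(|\)[^)(]*)|\s")


def remove_spaces_capture(text: str) -> str:
    """Drop whitespace inside parentheses (hypothetical helper name)."""
    # Put group 1 back when it took part in the match; for a lone \s match
    # group 1 is None, so the whitespace is replaced by nothing.
    return KEEP_OR_SPACE.sub(lambda m: m.group(1) or "", text)


source = "foobar baz blah (some sample text in here) and some more"
print(remove_spaces_capture(source))
# foobar baz blah (somesampletextinhere) and some more
```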

bobble bubble
  • It turns out that Rename Regex (a tool to rename files with regular-expressions) doesn't support PCRE. But it's open-source, so maybe I'll add that. But thanks for the suggestion, it certainly works when PCRE is available. – Synetech Jul 03 '22 at 20:15
  • @Synetech Welcome, yea a lookahead is certainly more compatible. I also tried to address the question title *restrict matching to part of string*. For this `\G` can be certainly useful but also the `(*SKIP)(*F)` technique mentioned if PCRE is used. The [last solution mentioned](https://regex101.com/r/7IIMhx/1) works in pretty every regex flavor that supports capturing groups. No worries, just loving such questions :) – bobble bubble Jul 03 '22 at 21:44