How can I exclude a character from a regex capturing group?

Question

I have a regex capture, and I would like to exclude a character (a space, in this particular case) from the middle of the captured string. Can this be done in one step, by modifying the regex?

(Quick and dirty) example:

Text: Key name = value
My regex: (.*) = (.*)
Output: \1 = "Key name" and \2 = "value"
Desired output: \1 = "Keyname" and \2 = "value"

Update: I'm not sure what regex engine will run this regex, since it's part of a larger software product. If you have a solution, please specify which engines it will run on, and on which it will not.

Update2: The aforementioned product takes a regex as input and then uses the matched values further, which is why a one-step solution is needed. There is no opportunity to insert an intermediate processing step in the pipeline.

What is the language? It is difficult to render appropriate help without knowing the programming language the regex will be used in. As the [regex tag info](http://stackoverflow.com/tags/regex/info) states, all questions with this tag should also include a tag specifying the applicable programming language or tool. — Wiktor Stribiżew, Dec 15 '15 at 10:05
@stribizhev, the full solution may depend on the language, but the answer to the question doesn't. You can't do that in a single regex match in any regex flavor. You have to match the whole thing and remove the spaces afterward. — Alan Moore, Dec 15 '15 at 10:42
@stribizhev: On reflection, I realize that was a canned comment that you posted simply because there's no "flavor" tag. It's good general advice, but you should make it clear that it **is** general advice. Because you seem to be implying that it's relevant in *this* case, when it isn't. — Alan Moore, Dec 15 '15 at 11:37
@AlanMoore: Why do you insist we do not need to know the language? One of the answers uses `\G` - it won't work with Python `re` or with JavaScript. Knowing the language/tool is *always* important. — Wiktor Stribiżew, Dec 15 '15 at 11:39
The question linked seems to be indeed a duplicate, however the answer is that it's impossible :( — Dan Nestor, Dec 15 '15 at 11:58
@stribizhev: That answer misses the point of the question. Dan is asking how to skip over characters in the course of a single match, which can't be done in any flavor. — Alan Moore, Dec 15 '15 at 12:09
@DanNestor: There is no way to match non-continuous text in just one match operation. — Wiktor Stribiżew, Dec 15 '15 at 12:13
Hm... not sure if I need a single match though. What I meant by "one step" was that I can't use a subsequent replace operation as some might have suggested (indeed, the answer would have been trivial). I will test to see if a multiple-match solution works in my tool. — Dan Nestor, Dec 15 '15 at 12:15

Giuseppe Ricupero · Answer 1 · 2015-12-15T12:06:11.020

This is a possible theoretical pure-regex implementation using the end-of-previous-match \G anchor:

/(?:\G(\w+)\h*(?:(?:=\h*)(\w+))?)+/g

Online demo

Legend

(?:           # non-capturing group 1
  \G          # asserts the position where the previous match ended (or the start of the string)
  (\w+)       # capture group 1: a regex word of 1+ chars
  \h*         # zero or more horizontal spaces (spaces, tabs)
  (?:         # non-capturing group 2
    =\h*      # literal '=' followed by zero or more horizontal spaces
    (\w+)     # capture group 2: a regex word of 1+ chars
  )?          # make non-capturing group 2 optional
)+            # repeat non-capturing group 1 one or more times

In the substitution section of the demo:

\1 actually contains Keyname (the 2 terms are separated by a fake space)
\2 is value
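
To see what the engine captures at each step, here is a minimal Python sketch. It is an emulation, not the regex above: Python's built-in `re` supports neither `\G` nor `\h`, so the sketch assumes that `pattern.match(text, pos)` can stand in for `\G` and `[ \t]` for `\h`. The names `STEP` and `chained_matches` are made up for illustration.

```python
import re

# Assumption: pattern.match(text, pos) anchors each attempt at `pos`,
# which plays the role of \G; [ \t] approximates \h (space and tab only).
STEP = re.compile(r"(\w+)[ \t]*(?:=[ \t]*(\w+))?")

def chained_matches(text):
    """Collect (group 1, group 2) for each match anchored at the previous match's end."""
    pos, results = 0, []
    while pos < len(text):
        m = STEP.match(text, pos)
        if m is None:
            break
        results.append((m.group(1), m.group(2)))
        pos = m.end()  # always advances, because \w+ consumes at least one char
    return results

pieces = chained_matches("Key name = value")
print(pieces)                             # [('Key', None), ('name', 'value')]
print("".join(key for key, _ in pieces))  # Keyname
```

Under these assumptions the input yields two successive matches, with group 1 holding `Key` and then `name`. `Keyname` only appears once the pieces are concatenated, which is what a substitution does; no single capture contains it.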

NOTE: I don't recommend using this unless it is actually needed (why?).

There are several possible two-step approaches. The simplest, as others have surely already stated, is to strip the spaces from the first capturing group of the OP's regex.
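
For reference, the two-step approach could look like the following Python sketch. It only applies where a post-processing step is available, which the question rules out. The function name `parse_pair` is hypothetical.

```python
import re

def parse_pair(line):
    """Step 1: match with the question's regex. Step 2: strip spaces from capture 1."""
    m = re.match(r"(.*) = (.*)", line)
    if m is None:
        return None
    key = m.group(1).replace(" ", "")  # the extra step a single regex match cannot do
    return key, m.group(2)

print(parse_pair("Key name = value"))  # ('Keyname', 'value')
```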

edited Dec 15 '15 at 12:06

answered Dec 15 '15 at 11:14

Giuseppe Ricupero


The question states clearly that a one-step solution is needed. Is there any particular reason behind you saying that you don't recommend your solution? – Dan Nestor Dec 15 '15 at 11:45
Your solution doesn't produce the expected result, and I don't understand it enough to modify it myself. Could you edit it (if it's possible) to satisfy the requirements in the question? – Dan Nestor Dec 15 '15 at 11:53
@DanNestor: Please explain why a one-step solution is required. I found it interesting to try to solve your problem, but I think your question will be better received with a complete explanation of the details. In any case, `\G` is not available in some regex engines. As far as I know, it is not possible to satisfy all your constraints without `\G`. – Giuseppe Ricupero Dec 15 '15 at 12:01
`\G` works fine by itself, there's no need to wrap it in a lookbehind. – Alan Moore Dec 15 '15 at 12:02
@GsusRecovery I'm not sure why the motives have to be explained for a question to be better received, but since you ask, it's because I do not have the opportunity to run a second step. As stated in the question, this regex will be used in a software product, and the product takes a regex as an input, and then uses the matched values further. – Dan Nestor Dec 15 '15 at 12:10
@DanNestor: In case it's not clear, explaining simply saves you time. You avoid answers from people who tell you "why don't you do it this way instead?", not to mention people who ask "why do you want to do it this way?" – Giuseppe Ricupero Dec 15 '15 at 12:34
@GsusRecovery yeah, I had hoped to get that result by stating that I need a one-step solution. :) – Dan Nestor Dec 15 '15 at 12:36

Jan · Answer 2 · 2015-12-15T10:46:48.397

I would come up with something like:

(?<key>[\w]+)\s*=\s*(?<value>.+)
# match one or more word characters and capture them in a group called "key"
# followed by zero or more whitespace characters (\s)
# followed by an equals sign
# followed by zero or more whitespace characters (\s)
# capture the rest in a group called "value"

... and process the captured output afterwards. Note that the \w character class matches no whitespace (do you have keys with whitespace in them?).
See a working demo here. But as mentioned in the comments, it depends on your programming language.
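
As a rough illustration of that caveat, here is a Python sketch run against the question's sample text. Python's `re` spells named groups `(?P<name>...)`, so the pattern is adapted accordingly. The variable name `PAIR` is made up for this example.

```python
import re

# Python's built-in `re` uses (?P<name>...) for named groups; the (?<name>...)
# form is the spelling used by engines such as PCRE, .NET and Java.
PAIR = re.compile(r"(?P<key>\w+)\s*=\s*(?P<value>.+)")

m = PAIR.search("Key name = value")
print(m.group("key"), "|", m.group("value"))  # name | value
```

Because `\w+` cannot cross the space, the key group holds only `name`, the last word before the `=`, rather than `Keyname`.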
