
I have a regex capture, and I would like to exclude a character (a space, in this particular case) from the middle of the captured string. Can this be done in one step, by modifying the regex?

(Quick and dirty) example:

Text: Key name = value
My regex: (.*) = (.*)
Output: \1 = "Key name" and \2 = "value"
Desired output: \1 = "Keyname" and \2 = "value"

Update: I'm not sure what regex engine will run this regex, since it's part of a larger software product. If you have a solution, please specify which engines it will run on, and on which it will not.

Update2: The aforementioned product takes a regex as input and then uses the matched values further, which is why I am asking for a one-step solution. There is no opportunity to insert an intermediate processing step in the pipeline.

Dan Nestor
  • Can't you replace that after the match? – Giorgi Nakeuri Dec 15 '15 at 10:05
    What is the language? It is difficult to render appropriate help without knowing the programming language the regex will be used in. As the [regex tag info](http://stackoverflow.com/tags/regex/info) states, all questions with this tag should also include a tag specifying the applicable programming language or tool. – Wiktor Stribiżew Dec 15 '15 at 10:05
  • @stribizhev, the full solution may depend on the language, but the answer to the question doesn't. You can't do that in a single regex match in any regex flavor. You have to match the whole thing and remove the spaces afterward. – Alan Moore Dec 15 '15 at 10:42
  • @AlanMoore: Why do you address me? I know that. – Wiktor Stribiżew Dec 15 '15 at 10:43
  • @stribizhev: On reflection, I realize that was a canned comment that you posted simply because there's no "flavor" tag. It's good general advice, but you should make it clear that it **is** general advice. Because you seem to be implying that it's relevant in *this* case, when it isn't. – Alan Moore Dec 15 '15 at 11:37
  • @AlanMoore: Why do you insist we do not need to know the language? One of the answers uses `\G` - it won't work with Python `re` or with JavaScript. Knowing the language/tool is *always* important. – Wiktor Stribiżew Dec 15 '15 at 11:39
  • The question linked seems to be indeed a duplicate, however the answer is that it's impossible :( – Dan Nestor Dec 15 '15 at 11:58
  • @stribizhev: That answer misses the point of the question. Dan is asking how to skip over characters in the course of a single match, which can't be done in any flavor. – Alan Moore Dec 15 '15 at 12:09
  • @DanNestor: There is no way to match non-continuous text in just one match operation. – Wiktor Stribiżew Dec 15 '15 at 12:13
  • Hm... not sure if I need a single match though. What I meant by "one step" was that I can't use a subsequent replace operation as some might have suggested (indeed, the answer would have been trivial). I will test to see if a multiple-match solution works in my tool. – Dan Nestor Dec 15 '15 at 12:15
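As one concrete data point for the "which engine?" question, Python's standard `re` module (used here purely as an example engine, since the question's engine is unknown) rejects `\G` at compile time: since Python 3.6, an unknown backslash-plus-ASCII-letter escape in a pattern is an error. The helper name `supports_G_anchor` is hypothetical.

```python
import re


def supports_G_anchor() -> bool:
    """Return True if the stdlib `re` module accepts a \\G anchor."""
    try:
        re.compile(r"\G\w+")
    except re.error:
        # Unknown escapes such as \G are rejected by `re`.
        return False
    return True


print(supports_G_anchor())  # False on Python 3.6+
```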

2 Answers


This is a possible theoretical pure-regex implementation using the end-of-previous-match \G anchor:

/(?:\G(\w+)\h*(?:=\h*(\w+))?)+/g

Online demo

Legend

(?:           # Non capturing group 1
  \G          # Matches where the regex engine stops in the previous step
  (\w+)       # capture group 1: a regex word of 1+ chars
  \h*         # zero or more horizontal spaces (space, tabs)
  (?:         # Non capturing group 2
    =\h*      # literal '=' followed by zero or more hspaces
    (\w+)     # capture group 2: a regex word of 1+ chars
  )?          # make the non capturing group 2 optional
)+            # repeat the non capturing group 1, one or more times

In the substitution section of the demo:

  • \1 actually contains Keyname (the 2 terms are separated by a fake space)
  • \2 is value
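A minimal Python sketch of what the `\G` + `/g` combination does, under stated assumptions: Python's stdlib `re` has neither `\G` nor `\h`, so `\h` is swapped for `[ \t]` and `\G` is emulated by anchoring each attempt at the end of the previous match with `Pattern.match(text, pos)`. The names `STEP` and `contiguous_matches` are illustrative only. Note that joining group 1 across the successive matches is done by the calling code, outside the regex.

```python
import re

# \h replaced by [ \t]; \G emulated below by anchored matching at `pos`.
STEP = re.compile(r"(\w+)[ \t]*(?:=[ \t]*(\w+))?")


def contiguous_matches(text: str):
    """Yield successive matches, each starting where the previous one ended."""
    pos = 0
    while pos < len(text):
        m = STEP.match(text, pos)  # match() only tries at `pos`, like \G
        if m is None or m.end() == pos:
            break  # the chain is broken, just as \G would stop it
        yield m
        pos = m.end()


matches = list(contiguous_matches("Key name = value"))
key = "".join(m.group(1) for m in matches)  # "Key" + "name"
value = next(m.group(2) for m in matches if m.group(2))
print(key, value)  # Keyname value
```

Here the first match covers `Key ` and the second covers `name = value`, so group 1 is `Key` and then `name`.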

NOTE: I don't recommend using this unless it is actually needed (why?).

There are multiple possible two-step approaches: as has surely been stated already, simply strip the spaces from the first capturing group of the OP's regex.
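A minimal sketch of that two-step route in Python (the language is an assumption; the question does not name one): match with the original `(.*) = (.*)` pattern, then remove the spaces from group 1 in ordinary code.

```python
import re

# Step 1: match with the question's original pattern.
m = re.match(r"(.*) = (.*)", "Key name = value")

# Step 2: strip the spaces from the first capture after matching.
key = m.group(1).replace(" ", "")
value = m.group(2)
print(key, value)  # Keyname value
```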

Giuseppe Ricupero
  • The question states clearly that a one-step solution is needed. Is there any particular reason behind you saying that you don't recommend your solution? – Dan Nestor Dec 15 '15 at 11:45
  • Your solution doesn't produce the expected result, and I don't understand it enough to modify it myself. Could you edit it (if it's possible) to satisfy the requirements in the question? – Dan Nestor Dec 15 '15 at 11:53
  • @DanNestor: explain to me why a one-step solution is required. I found it interesting to try to solve your problem, but I think your question will be better received with a complete explanation of the details. In any case, `\G` is not available in some regex engines. As far as I know, it is not possible to satisfy all your constraints without `\G`. – Giuseppe Ricupero Dec 15 '15 at 12:01
    `\G` works fine by itself, there's no need to wrap it in a lookbehind. – Alan Moore Dec 15 '15 at 12:02
    @GsusRecovery I'm not sure why the motives have to be explained for a question to be better received, but since you ask, it's because I do not have the opportunity to run a second step. As stated in the question, this regex will be used in a software product, and the product takes a regex as an input, and then uses the matched values further. – Dan Nestor Dec 15 '15 at 12:10
  • @DanNestor: if it's not clear, explain it, if only to spare your own time. You avoid answers from people who tell you "why don't you do it this way instead?", not to mention the people who ask you "why do you want to do it this way?" – Giuseppe Ricupero Dec 15 '15 at 12:34
    @GsusRecovery yeah, I hoped I would get this result by stating that I need a one-step solution. :) – Dan Nestor Dec 15 '15 at 12:36

I would come up with something like:

(?<key>[\w]+)\s*=\s*(?<value>.+)
# match one or more word characters and capture them in a group called "key"
# followed by zero or more whitespace characters (\s)
# followed by an equals sign
# followed by zero or more whitespace characters (\s)
# capture the rest in a group called "value"

... and process the captured output afterwards. But with the \w character class, no whitespace will be matched (do you have keys with whitespace in them?).
See a working demo here. But as mentioned in the comments, it depends on your programming language.
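For illustration in Python (an assumption, since the answer notes the language matters): Python's `re` writes named groups as `(?P<name>...)`. The sketch shows the `\w` caveat on the question's input, where the key comes out as `name`, followed by a hypothetical variant, not part of the answer, that admits spaces in the key with a lazy `[\w ]+?` and strips them afterwards.

```python
import re

# Python's `re` spells named groups (?P<name>...).
PAIR = re.compile(r"(?P<key>\w+)\s*=\s*(?P<value>.+)")

m = PAIR.search("Key name = value")
# \w+ cannot cross the space, so the match starts at "name".
print(m.group("key"), m.group("value"))  # name value

# Hypothetical variant (not from the answer): allow spaces in the key
# with a lazy [\w ]+?, then strip them in a second step.
WIDE = re.compile(r"(?P<key>[\w ]+?)\s*=\s*(?P<value>.+)")
m2 = WIDE.search("Key name = value")
key = m2.group("key").replace(" ", "")
print(key, m2.group("value"))  # Keyname value
```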

Jan