3

I have a string that can be either:

X=Y 

or

X

Here X and Y can each be any word consisting of letters.

When the string is X=Y, I want X to be in group 1 and Y in group 2,

but when the string is just X, X should be in group 2.

So far I have only been able to come up with this:

(\w+)=(\w+)

What should be the right regex for it?

petezurich
Anudocs

2 Answers

2

To match letters only, use [a-zA-Z] (any ASCII letter) or [^\W\d_] (any Unicode letter), not \w, which by default in Python 3.x matches letters, digits, underscores and some other characters.

You need

^(?:([A-Za-z]+)=)?([A-Za-z]+)$

Or

^(?:([A-Za-z]+)=)?([A-Za-z]+)\Z

See the regex demo

Details

  • ^ - start of string
  • (?:([A-Za-z]+)=)? - an optional non-capturing group matching 1 or 0 occurrences of:
    • ([A-Za-z]+) - Group 1: one or more letters
    • = - a = char
  • ([A-Za-z]+) - Group 2: one or more letters
  • \Z - the very end of string ($ matches at the end of the string and also just before a single trailing newline).
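
As a quick check of the breakdown above, here is a minimal Python sketch that applies the `\Z` variant of the pattern to both input shapes. The helper name `split_pair` is only an illustration, not part of the original answer.

```python
import re

# Pattern from the answer: an optional "X=" prefix, then the mandatory word.
PATTERN = re.compile(r'^(?:([A-Za-z]+)=)?([A-Za-z]+)\Z')

def split_pair(text):
    """Return (group 1, group 2), or None if the string does not match.

    split_pair is a hypothetical helper name used for this sketch.
    """
    m = PATTERN.match(text)
    return m.groups() if m else None

print(split_pair("X=Y"))  # ('X', 'Y')
print(split_pair("X"))    # (None, 'X')
print(split_pair("X=1"))  # None, because digits are not letters
```

When the `X=` part is absent, Group 1 did not participate in the match, so Python reports it as `None` while the lone word still lands in Group 2.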
Wiktor Stribiżew
  • The most complete explanation. However, I would say that `$` matches the very end of string *and* any newline that appears immediately before it: `re.findall('$', 'abc\n')` -> `['', '']` – Booboo Feb 27 '20 at 12:21
  • @Booboo You are using `re.findall` that searches for multiple match occurrences, that is why you have two matches :). Using `re.search` will yield the first match only. – Wiktor Stribiżew Feb 27 '20 at 12:22
  • I am using `findall` just to demonstrate what `$` matches. From the manual concerning `$`: **Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.** – Booboo Feb 27 '20 at 12:25
  • @Booboo That is why I posted two options and explained them in my answer. Also, mentioned in [Checking whole string with a regex](https://stackoverflow.com/questions/3994493/checking-whole-string-with-a-regex/44980529#44980529) – Wiktor Stribiżew Feb 27 '20 at 12:26
  • Nowhere in your distinction was the word *newline* mentioned. That's all I am saying. – Booboo Feb 27 '20 at 12:30
  • @Booboo It is a known fact. That is the only distinction between "the end of string" and "the very end of string". – Wiktor Stribiżew Feb 27 '20 at 12:30
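
The `$` versus `\Z` point raised in these comments can be checked directly. The sketch below assumes Python 3's `re` module with default flags; the variable names are illustrative only.

```python
import re

# Same pattern, once anchored with $ and once with \Z.
with_dollar = re.compile(r'^(?:([A-Za-z]+)=)?([A-Za-z]+)$')
with_z = re.compile(r'^(?:([A-Za-z]+)=)?([A-Za-z]+)\Z')

text = "X=Y\n"  # note the trailing newline

# $ also matches just before a newline at the end of the string.
print(bool(with_dollar.match(text)))  # True
# \Z matches only at the very end, so the trailing newline blocks the match.
print(bool(with_z.match(text)))       # False

# The example from the comment: $ matches at two positions here.
dollar_positions = re.findall(r'$', 'abc\n')
print(dollar_positions)               # ['', '']
```

So if input may carry a trailing newline that should be rejected, `\Z` is the stricter choice of the two.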
0

You almost had it with your original regex.

It just needs a couple of tweaks:

^(\w+)(=(\w+))?$
  • ^ = start of string
  • (\w+) = 1st capture group, matching one or more word characters (letters, digits, or underscores), as many as possible.
  • (=...)? = 2nd capture group; everything inside it (starting with "=") is optional
  • 2nd (\w+) = 3rd capture group, matching the same kind of characters as the first one
  • $ = end of string
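
To see where each part ends up with this pattern, here is a small Python sketch. The helper name `parse` is hypothetical. Note that a lone `X` lands in group 1 here, and `Y` is in group 3 rather than group 2.

```python
import re

# Pattern from this answer: a word, then an optional "=word" part.
pattern = re.compile(r'^(\w+)(=(\w+))?$')

def parse(text):
    """Return (group 1, group 2, group 3), or None on no match.

    parse is a hypothetical helper name used for this sketch.
    """
    m = pattern.match(text)
    return m.groups() if m else None

print(parse("X=Y"))  # ('X', '=Y', 'Y')
print(parse("X"))    # ('X', None, None)
```

Calling code would therefore read the left-hand word from group 1 and the optional right-hand word from group 3.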

update

My answer does not actually answer the original question, because for the string "X" it does not put "X" in the second capture group.

I considered deleting the answer, but I'm going to keep it up for the sake of other visitors on the site who are looking for a simpler answer that does not require "X" to specifically be in the second capture group.

Also, maybe the original asker would rather have a simpler regex and modify their code to work with the regex instead of making a regex to work with the code.

Daniel Tonon
  • The answer is wrong because the `X` in `X` string [will land in Group 1](https://regex101.com/r/1cToxY/1), while OP wants it in Group 2. – Wiktor Stribiżew Feb 27 '20 at 12:23
  • I've updated the answer with an explanation of why I think it is still useful even though it doesn't fulfill 100% of the original asker's requirements. – Daniel Tonon Feb 27 '20 at 12:37
  • Other users will probably find another answer already useful enough at [How do I make part of a regex match optional?](https://stackoverflow.com/a/12451749/3832970). If it was just about that, I would have closed the question as its duplicate. – Wiktor Stribiżew Feb 27 '20 at 12:40