1

I am using regex to process some files. For example, given the following lines, I need to capture the Example number and whether the line contains ERROR.

Example 1: bla bla bla
Example 2: bla bla ERROR
Example 3: bla bla

I'm using 'Example\s+(\d+):.*(?:ERROR)?'. It gives me the Example number, but how can I tell whether ERROR is present?


Update:

I changed the non-capturing group to a capturing group, but it still doesn't work.

In [77]: line = 'Example 5: abv ERROR zyx'

In [78]: re.search(r'Example\s+(\d+).+(ERROR)?', line).group(2)

In [79]: re.search(r'Example\s+(\d+).+(ERROR)', line).group(2)
Out[79]: 'ERROR'

I am totally confused. The word is there, so why isn't the optional capturing group capturing it?

LWZ

2 Answers

0

If ERROR is always at the end of the line, you can do the following:

  • Convert the non-capturing group (?:ERROR) to a capturing group.
  • Replace your greedy match for .* with a lazy match .*?.
  • Add the end of line assertion $ at the end.

So, your regex would look something like this:

Example\s+(\d+):.*?(ERROR)?$

Try it online.

Then, you can check whether the second group is empty or not.
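In Python, an optional group that did not take part in the match is reported as None rather than as an empty string, so the check could be sketched roughly as below. This assumes re.search as used in the question; the helper name parse_line is made up for illustration.

```python
import re

# First pattern from this answer: ERROR is only detected at the end of the line.
PATTERN = re.compile(r'Example\s+(\d+):.*?(ERROR)?$')

def parse_line(line):
    """Return (example_number, has_error), or None if the line does not match.

    parse_line is a hypothetical helper name, not part of the re module.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    # group(2) is None when the optional (ERROR) group did not participate.
    return int(match.group(1)), match.group(2) is not None

for line in ['Example 1: bla bla bla', 'Example 2: bla bla ERROR', 'Example 3: bla bla']:
    print(parse_line(line))
```

With the three sample lines from the question this prints (1, False), (2, True) and (3, False). An ERROR in the middle of a line is not reported by this pattern.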


If ERROR doesn't have to be at the end of the line, you can adjust the above regex to look something like this:

Example\s+(\d+):(?:.*?(ERROR)|.*)

This part (?:.*?(ERROR)|.*) of the regex works like this:

(?:       # Start of a non-capturing group.
.*?       # Lazy match for zero or more characters (same as in the above solution).
(ERROR)   # Matches the characters `ERROR` literally, in a capturing group so you can check whether it matched (same as in the above solution).
|         # Alternation: match either what's before the `|` or what's after it _inside the non-capturing group_.
.*        # Greedy match for zero or more characters (same as in your original regex).
)         # End of the non-capturing group.

So, this first tries to match any number of characters (lazily) followed by ERROR. If that fails, it falls back to matching any number of characters (greedily), and the second group is left unmatched.

Here's a demo.
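To see why the greedy attempt from the updated question returns nothing while the alternation does, here is a small Python comparison. It is only a sketch that assumes re.search and the sample line from the question.

```python
import re

line = 'Example 5: abv ERROR zyx'

# Greedy .+ first consumes the rest of the line. (ERROR)? is allowed to match
# nothing, so the engine never has to backtrack and the group stays unset.
greedy = re.search(r'Example\s+(\d+).+(ERROR)?', line)
print(greedy.group(2))   # None

# With the alternation, the first branch requires ERROR, so the lazy .*? stops
# right before it. The second branch is only tried when ERROR is absent.
pattern = re.compile(r'Example\s+(\d+):(?:.*?(ERROR)|.*)')
print(pattern.search(line).group(2))                  # ERROR
print(pattern.search('Example 3: bla bla').group(2))  # None
```

The optional group in the greedy version is satisfied by an empty match, which is why making it capturing alone does not help.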

Hope that helps.

Community
  • I guess my question is how do I check whether the second group is empty or not? – LWZ May 15 '18 at 05:04
  • Well, I don't really know python. I was just answering the regex question :D. Anyhow, with a quick search, I found [this answer](https://stackoverflow.com/a/1327389/4934172) which explains how to access the group. And [this question](https://stackoverflow.com/q/9573244/4934172) about how to check if the string is empty. – 41686d6564 stands w. Palestine May 15 '18 at 05:08
  • Oh! I got it! A capturing group can be empty but still exist. I made a non-capturing group, so the 2nd group doesn't exist at all. Thanks! – LWZ May 15 '18 at 05:12
  • @LWZ yes, the capturing group can be empty. You only use a non-capturing group if you don't care about what is captured by it, but you need to use a group to complete the regex (for example, in order to use "or" `|` like what I did in my second regex). Please let me know if there's anything else not clear. – 41686d6564 stands w. Palestine May 15 '18 at 05:14
  • It's still not working after I use the capturing group, please see the updated question. – LWZ May 15 '18 at 16:51
  • @LWZ Based on your edit, it looks like you didn't use either of the two regex patterns I suggested! Since you said `ERROR` doesn't have to be at the end, please use the second regex in my answer and let me know if it doesn't work. – 41686d6564 stands w. Palestine May 15 '18 at 16:56
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171108/discussion-between-lwz-and-ahmed-abdelhameed). – LWZ May 15 '18 at 17:16
-1

What do you want to do? Your current solution should work with a small modification:

re.findall(r'^Example\s+(\d+):|(ERROR)', line)

If the returned list has two elements, ERROR was found (assuming the line starts with Example and ERROR appears at most once).

Example if ERROR exists:

>>> import re
>>> line = 'Example 5: abv ERROR zyx'
>>> re.findall(r'^Example\s+(\d+):|(ERROR)', line)
[('5', ''), ('', 'ERROR')]

Example if ERROR does not exist:

>>> line = 'Example 5: abv zyx'
>>> re.findall(r'^Example\s+(\d+):|(ERROR)', line)
[('5', '')]
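
Checking the list length breaks down if ERROR appears more than once on a line, or if the line does not start with Example. A slightly more defensive reading of the same findall result might look like this sketch; summarize is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r'^Example\s+(\d+):|(ERROR)')

def summarize(line):
    """Return (example_number or None, has_error) from the findall tuples.

    summarize is a made-up name used only for this illustration.
    """
    matches = PATTERN.findall(line)
    # Each tuple is (number, error); exactly one of the two is non-empty.
    number = next((num for num, _ in matches if num), None)
    has_error = any(err for _, err in matches)
    return number, has_error

print(summarize('Example 5: abv ERROR zyx'))  # ('5', True)
print(summarize('Example 5: abv zyx'))        # ('5', False)
```
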
Hegel F.
  • I'm not sure I understand the answer. It seems like you just copied and pasted the poster's question. Perhaps an example in Python showing the behavior you are talking about would be helpful. – Clarus May 15 '18 at 17:10