2

I searched and found that [^?] excludes a given character, a question mark in this case, but it seems to pull the following space into the match instead, which is not what I want. This pattern:

\((.*?)\)[^?]

matches anything in brackets unless there is a question mark right after the last bracket.

(need to capture including brackets) ignore this
(ignore this completely)?

This pattern captures the top line in brackets correctly without including the space, but also captures the line below which I want to ignore:

\((.*?)\)

What pattern can I use to capture the top line only without the trailing space but ignore the line below?

You can see that neither of these patterns works correctly:

https://regex101.com/r/fHXJ8x/1

https://regex101.com/r/fHXJ8x/2
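
For anyone who wants to reproduce this outside regex101, here is a minimal sketch. It assumes Python's `re` module (the question does not name a language), and the names `text`, `with_class` and `plain` are made up for illustration.

```python
import re

# The two sample lines from the question, joined with a newline.
text = "(need to capture including brackets) ignore this\n(ignore this completely)?"

# First pattern: [^?] consumes the character after ")", so the space ends up in the match.
with_class = [m.group(0) for m in re.finditer(r"\((.*?)\)[^?]", text)]

# Second pattern: no trailing space, but the bracketed text on the second line is matched too.
plain = [m.group(0) for m in re.finditer(r"\((.*?)\)", text)]

print(with_class)
print(plain)
```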

Hasen
  • `[^?]` captures any character other than a question mark here; try `(?<=\?)` – Ghost Ops Aug 27 '21 at 05:23
  • @Ghost Ops You mean `\((.*?)\)(?<=\?)` ? If so, that doesn't work. – Hasen Aug 27 '21 at 05:24
  • why can't you just strip the result? it'll be easy though... – Ghost Ops Aug 27 '21 at 05:29
  • @Ghost Ops I think you have some kind of idea in mind but it's not clear what it is. – Hasen Aug 27 '21 at 05:31
  • I mean, apply the `strip()` method to the result to exclude spaces at the end of the string. I got it, did you get what I mean? – Ghost Ops Aug 27 '21 at 05:33
  • @Ghost Ops Ok you mean process it after the regex capture? So that would mean it's not possible to do this in regex? I thought it would be possible to have characters which are not included in a pattern. – Hasen Aug 27 '21 at 05:36
  • Do you need to match `(...)? and ()` in a string like `... (...)? and () ...`? If not see [my answer](https://stackoverflow.com/a/68951147/3832970). – Wiktor Stribiżew Aug 27 '21 at 11:06

4 Answers

7

Try this regex...

It skips any bracketed text that is immediately followed by a question mark.

It also leaves the unwanted trailing space out of the match.

\((.*?)\)(?!\?)
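
A quick way to check this pattern against the two sample lines from the question is the sketch below. It assumes Python's `re` module; the names `text`, `pattern` and `matches` are only illustrative.

```python
import re

# The two sample lines from the question, joined with a newline.
text = "(need to capture including brackets) ignore this\n(ignore this completely)?"

# (?!\?) only checks that the next character is not "?"; it consumes nothing.
pattern = re.compile(r"\((.*?)\)(?!\?)")

# Pair each full match (with brackets) with group 1 (without brackets).
matches = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
print(matches)
```

The full match keeps the brackets and stops right after the closing one, while group 1 holds the text between them.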


Hasen
Ghost Ops
  • Hey that works! Looks like you found it in the end. – Hasen Aug 27 '21 at 06:31
  • It should actually be this `\((.*?)\)(?!\?)` though otherwise it loses the functionality of my original pattern. I've edited your answer anyway so just approve it and it's fine. – Hasen Aug 27 '21 at 06:39
  • Yeah, I just put the whole regex into a class for no reason, which makes absolutely no change in functionality, but it's OK, you're right @Hasen. Anyway, thanks for the edit – Ghost Ops Aug 27 '21 at 06:41
  • Np, thanks for your answer, that's exactly what I was looking for. – Hasen Aug 27 '21 at 06:43
1

First of all, you cannot use a negated character class ([^?]) because it is a consuming pattern, i.e. the regex engine puts the matched text into the match memory buffer and advances the regex index to the match end position. That is why it matches that whitespace. You need to use a negative lookahead that is a non-consuming pattern, (?!\?), that won't add the text matched into the match.
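
The difference between consuming and non-consuming can be seen from where each match ends. This is a small sketch assuming Python's `re` module; the input `(a) b` and all variable names are invented for illustration.

```python
import re

sample = "(a) b"

consuming = re.search(r"\(a\)[^?]", sample)    # negated class: consumes one more character
lookahead = re.search(r"\(a\)(?!\?)", sample)  # lookahead: only inspects the next character

# The negated class advances past the space; the lookahead stops right after ")".
print(repr(consuming.group(0)), consuming.end())
print(repr(lookahead.group(0)), lookahead.end())

# At the very end of the input there is no character left for [^?] to consume,
# so only the lookahead version still matches.
at_end_class = re.search(r"\(a\)[^?]", "(a)")
at_end_lookahead = re.search(r"\(a\)(?!\?)", "(a)")
print(at_end_class, at_end_lookahead.group(0))
```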

Second, you should not rely on .*? when you restrict the context of the subsequent pattern because this pattern can match any amount of any text (other than line break chars by default). If you have ... (...)? and () ..., the \(.*?\)(?!\?) will match the leftmost ( until the leftmost ) that is not immediately followed with a ? char, i.e. the match will be (...)? and (), see this regex demo.
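
The over-match described here can be reproduced with a few lines. This sketch assumes Python's `re` module, and the names `s` and `m` are only for illustration.

```python
import re

s = "... (...)? and () ..."

# The lazy .*? keeps expanding past ")?" until it reaches a ")" not followed by "?".
m = re.search(r"\(.*?\)(?!\?)", s)
print(m.group(0))
```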

The solution is to avoid matching ( and ) in between parentheses:

\(([^()]*)\)(?!\?)

See the regex demo. Details:

  • \( - a ( char
  • ([^()]*) - Group 1: zero or more chars other than ( and )
  • \) - a ) char
  • (?!\?) - a negative lookahead that fails the match if there is a ? char immediately to the right of the current location ("fails" here means that the regex engine will backtrack to see if it can match the string in another way).
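
As a rough check of the pattern explained above, the following sketch runs it on both the `... (...)? and () ...` string and the two sample lines from the question. It assumes Python's `re` module, and the names `pattern`, `s`, `q`, `in_s` and `in_q` are hypothetical.

```python
import re

pattern = re.compile(r"\(([^()]*)\)(?!\?)")

# Because [^()] cannot cross a parenthesis, the match can no longer run past ")?".
s = "... (...)? and () ..."
in_s = [m.group(0) for m in pattern.finditer(s)]
print(in_s)

# The two sample lines from the question, joined with a newline.
q = "(need to capture including brackets) ignore this\n(ignore this completely)?"
in_q = [m.group(0) for m in pattern.finditer(q)]
print(in_q)
```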
Wiktor Stribiżew
0

Well, you can use something like this:

(\(.+\)\?)|(\(.*\))

You can't simply ignore the second string, because by your requirements the two are the same: each of them contains brackets.

But you can define two groups in the regex and use only the second one:

(need to capture including brackets) ignore this $2

(ignore this completely)? $1
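
A sketch of how the two groups could be used in code, assuming Python's `re` module (the names `text`, `pattern` and `wanted` are only illustrative). Group 1 collects the bracketed text followed by a question mark, and only group 2 is kept.

```python
import re

# The two sample lines from the question, joined with a newline.
text = "(need to capture including brackets) ignore this\n(ignore this completely)?"
pattern = re.compile(r"(\(.+\)\?)|(\(.*\))")

# Keep only matches where group 2 took part; group 1 holds the "(...)?" cases to discard.
wanted = [m.group(2) for m in pattern.finditer(text) if m.group(2) is not None]
print(wanted)
```

Note that `.+` and `.*` are greedy here, so a line containing several bracketed groups would be matched as one span.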

-1

Here is an example program written in C#. The comments describe what changed in response to feedback in the comments section, and the regexes are in the order they appeared in this post.

// Program has been modified in accordance with the debate in the comments section
using System;
using System.Text.RegularExpressions;

namespace CS_Regex
{
    class Program
    {
        // Match parenthesized texts that aren't followed by a question mark.
        static void Main(string[] args)
        {
            string[] tests =
            {
                "(match this text) ignore this (ignore this)? and (match this) (and this)"
            };
            // The first three patterns match only if the closing parenthesis is followed by another character that is not '?'.
            // The last pattern matches every parenthesized text that is not followed by '?', including one at the end of the string.
            string[] patterns = {
                @"\((.*?)\)[^?]", // Original regex
                @"\((.*)\)[^?]", // Regex that matches greedily, which was my first example, that caused the discussion in the comments.
                                 // I asked "Why do you have a question mark after matching zero or more characters?"
                @"(\([^)]*\))[^?]", // Regex that only matches if the closing parenthesis is followed by another character, avoiding the lazy '?' quantifier.
                @"(\([^)]*\))(?!\?)", // Regex that matches every parenthesized text not followed by '?'
            };
            foreach (string pattern in patterns) {
                Regex rx = new Regex(pattern, RegexOptions.Compiled);
                Console.WriteLine($"Regex: {pattern}");
                foreach (string data in tests)
                {
                    MatchCollection matches = rx.Matches(data);
                    Console.WriteLine($"{matches.Count} matches found in: {data}");
                    foreach (Match match in matches)
                        Console.WriteLine($"   matched value and group: '{match.Value}' and '{match.Groups[1]}'");
                }
            }
            Console.ReadKey();
        }
    }
}

The program produces the following output:

Regex: \((.*?)\)[^?]
2 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text) ' and 'match this text'
   matched value and group: '(ignore this)? and (match this) ' and 'ignore this)? and (match this'
Regex: \((.*)\)[^?]
1 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text) ignore this (ignore this)? and (match this) ' and 'match this text) ignore this (ignore this)? and (match this'
Regex: (\([^)]*\))[^?]
2 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text) ' and '(match this text)'
   matched value and group: '(match this) ' and '(match this)'
Regex: (\([^)]*\))(?!\?)
3 matches found in: (match this text) ignore this (ignore this)? and (match this) (and this)
   matched value and group: '(match this text)' and '(match this text)'
   matched value and group: '(match this)' and '(match this)'
   matched value and group: '(and this)' and '(and this)'

The example has been edited, to reflect the discussion in the comments.
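
For readers without a C# toolchain, the same comparison can be sketched in Python. This is an assumption-laden port rather than part of the original program: it uses Python's `re` module, which treats these four patterns the same way, and the names `data`, `patterns` and `results` are invented.

```python
import re

data = "(match this text) ignore this (ignore this)? and (match this) (and this)"

# Same four patterns, in the same order as in the C# program.
patterns = [
    r"\((.*?)\)[^?]",
    r"\((.*)\)[^?]",
    r"(\([^)]*\))[^?]",
    r"(\([^)]*\))(?!\?)",
]

# Map each pattern to the list of full matches it finds in the test string.
results = {p: [m.group(0) for m in re.finditer(p, data)] for p in patterns}
for p, found in results.items():
    print(p, len(found), found)
```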

GoWiser
  • "*Why do you have a question mark after matching zero or more characters?*" it is [a lazy quantifier](https://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions) to prevent the regex from over-matching. E.g., with pattern `\((.*)\)` and input `(a) (b)` the captured group would be `a) (b`, while if the quantifier is made lazy, then it will only capture `a`. see also [What is the difference between .*? and .* regular expressions?](https://stackoverflow.com/q/3075130) – VLAZ Aug 27 '21 at 05:54
  • I think you are referring to the original post, and not mine. I have used regular expressions on a regular basis since 1988 and am pointing out exactly what you write. – GoWiser Aug 27 '21 at 06:01
  • You asked about the question mark. And didn't seem like a rhetorical question because there is no answer to it in your post. Surely if you're familiar, then you know then that you've *changed* how the pattern matches and now the result is not the same as what OP had. – VLAZ Aug 27 '21 at 06:05
  • I used his original regexp AND wrote the question mark is unnecessary. Lol. – GoWiser Aug 27 '21 at 06:08
  • Yes, [and it behaves differently](https://regex101.com/r/T4t6zA/1/) to [what OP's regex would match lazily](https://regex101.com/r/UlEqcM/1). – VLAZ Aug 27 '21 at 06:09
  • I do not understand what your purpose is. Can you please clarify why you are doing this? I copy pasted the regexp from the original post AND deleted the question mark AND wrote that it is not needed BEFORE you posted your first comment. This "\\((.*?)\\)[^?]" is the same as this "\\((.*)\\)[^?]". STa – GoWiser Aug 27 '21 at 06:12
  • You asked a question, I answered it to explain *why* OP would be using a lazy quantifier because it seemed you weren't aware. I then gave you an example of why a lazy quantifier would make a difference because you seemed to ignore the difference. You keep ignoring it. You've *changed* the semantics of the match. If there are multiple things in brackets, your regex will eagerly match all, instead of each. You've neither described that in your answer, nor acknowledged OP's approach. If you don't *know* whether the new regex is valid, you should wait for clarification from OP, not change it – VLAZ Aug 27 '21 at 06:20
  • On a separate note, OP's regex *doesn't* work for them. That's why they asked here. They *don't* want to match the following space. Yet you've ignored that, too. – VLAZ Aug 27 '21 at 06:21
  • I always change my post when I find I can formulate it better or if some characters are removed or malformed by stackoverflow's formatting (as it did). Are you telling me I am not allowed to fix typos? – GoWiser Aug 27 '21 at 06:24
  • I don't see how you take any of what I said as what you claim I said. I'll be clear once again - the pattern you suggest does ***not*** match the same as OP's pattern. You can see the examples I gave you. There is nothing about fixing typos in your post. I'm saying that if you want to make such radical changes, you should either have waited for OP to clarify that it's correct or at the very least pre-emptively explained what the difference would be in your answer to better inform readers. I repeatedly explained there would be a difference and you haven't acted on this information. – VLAZ Aug 27 '21 at 06:31
  • The regexp is a copy paste from the post of the original pattern. Why do you keep saying it is not? As I see it, you are in violation of the stackoverflow guidelines for comments. – GoWiser Aug 27 '21 at 06:31
  • I'm saying that ***it matches different things as I've repeatedly shown you***. – VLAZ Aug 27 '21 at 06:32
  • If you test the pattern without that first question mark you'll find that the sentence `(need to capture including brackets) ignore this (also capture this)` would be matched very differently. That wasn't part of my example of course but it should be assumed that I need it that way. – Hasen Aug 27 '21 at 06:34
  • The original poster, didn't specify if he matches single or multiple brackets. If he matches multiple, he needs "[^)]*" instead of ".*" – GoWiser Aug 27 '21 at 06:35
  • @Gowiser I'm the OP and you'll notice my example above is not multiple lines. I didn't go into why I need that question mark but it does match different things as VLAZ explained. – Hasen Aug 27 '21 at 06:36
  • You should check out this link https://stackoverflow.com/questions/31201690/find-word-not-followed-by-a-certain-character which makes your post a duplicate post. – GoWiser Aug 27 '21 at 06:50
  • Your answer is a 'question' and should have been a comment asking why I needed that question mark rather than this. – Hasen Aug 27 '21 at 07:01
  • Sorry for trying to help you and making assumptions about what you meant. – GoWiser Aug 27 '21 at 07:11