22

Is there a way to debug a regular expression in Python? And I'm not referring to the process of trying and trying till it works :)

Here is how regexes can be debugged in Perl:


use re 'debug';

my $str = "GET http://some-site.com HTTP/1.1";
if($str =~/get\s+(\S+)/i) {
    print "MATCH:$1\n";
}

The code above produces the following output on my computer when run:


Compiling REx "get\s+(\S+)"
Final program:
   1: EXACTF  (3)
   3: PLUS (5)
   4:   SPACE (0)
   5: OPEN1 (7)
   7:   PLUS (9)
   8:     NSPACE (0)
   9: CLOSE1 (11)
  11: END (0)
stclass EXACTF  minlen 5
Matching REx "get\s+(\S+)" against "GET http://some-site.com HTTP/1.1"
Matching stclass EXACTF  against "GET http://some-site.com HTTP/1.1" (33 chars)
   0           |  1:EXACTF (3)
   3        |  3:PLUS(5)
                                  SPACE can match 1 times out of 2147483647...
   4       |  5:  OPEN1(7)
   4       |  7:  PLUS(9)
                                    NSPACE can match 20 times out of 2147483647...
  24       |  9:    CLOSE1(11)
  24       | 11:    END(0)
Match successful!
MATCH:http://some-site.com
Freeing REx: "get\s+(\S+)"

Peter Mortensen
Geo

9 Answers

22

>>> import re
>>> p = re.compile('.*', re.DEBUG)
max_repeat 0 65535
  any None
>>>

regex '|' operator vs separate runs for each sub-expression
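As a rough sketch, the same flag can be applied to the pattern from the question. The snippet below captures the debug dump in a string (the name `debug_dump` is just an illustration) so it can be logged or inspected. The layout of the dump differs between Python versions, and it describes only how the pattern was compiled, not how it is executed against a string.

```python
import contextlib
import io
import re

# re.DEBUG makes the re module print the parsed pattern to stdout at
# compile time. Redirecting stdout lets us keep that dump as a string.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    pattern = re.compile(r"get\s+(\S+)", re.IGNORECASE | re.DEBUG)
debug_dump = buffer.getvalue()

print(debug_dump)  # exact format depends on the Python version

# Matching itself prints no trace; only the result is available.
match = pattern.search("GET http://some-site.com HTTP/1.1")
if match:
    print("MATCH:" + match.group(1))
```

The dump should list the three literal characters, the repeat over the whitespace category, and the capturing group, which roughly corresponds to the "Final program" part of the Perl output.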

Community
Mykola Kharechko
  • 15
    That's only half of the answer, it shows what the regexp compiles to, but doesn't show how it's executed against a given string. If anyone knows the other half, please share! – Nickolay Dec 30 '13 at 02:58
  • 1
    It looks like `re.search('look for: ".*"', 'look for: "this"', re.DEBUG)` gives a bunch more info. – user2561747 Oct 27 '20 at 21:21
  • @user2561747 That is also a regex compilation debug, not a regex runtime debug. It doesn't even work with an already compiled regex. – Bulletmagnet Jan 25 '23 at 18:07
9

https://www.debuggex.com is also pretty good. It's an online regex debugger for Python (and a couple of other languages), with a pretty neat visualization of what does and what doesn't match. A pretty good resource if you need to draft a regexp quickly.

Nikita R.
1

Why don't you use a regex tool (I usually use Regulator), test the regular expression there, and, once you are satisfied, copy/paste it into your code?

sabiland
  • 1
    Because using a regex tool won't tell me why my regex isn't working. – Geo Mar 03 '09 at 13:18
  • @Geo - what exactly do you mean by "isn't working". Isn't working at all, isn't matching the things you want to match or ... ? – Rook Mar 03 '09 at 13:24
  • 2
    At the risk of stating the obvious, a regex tool can't tell you why it isn't giving you the right matches. A regex is going to do exactly what you tell it, and the best any tool can do is step you through so that you can figure out yourself which bit is wrong. – Noldorin Mar 03 '09 at 13:27
  • @Noldorin - in which case I'd recommend a book, "Learning ..." by O'Reilly, wonderful for this kinda stuff. – Rook Mar 03 '09 at 13:30
  • @Idigas: Not quite sure what you mean. There's a "Mastering Regular Expressions" book by O'Reilly... are you suggesting the OP reads this to understand RegEx better? – Noldorin Mar 03 '09 at 13:46
  • @Noldorin - yes, it's a nice book, very user friendly. Once he grasps the basics he'll have an easier time later on. – Rook Mar 03 '09 at 17:46
  • @Noldorin - "Mastering" is also a nice book, but learning is better for starters (it's not just basics) – Rook Mar 03 '09 at 17:47
1

I quite often use RegexPal for quick checks (an online regular expression prototyper). It has a lot of the common expressions listed along with a simple expression. Very handy when you don't have a dedicated tool and just need a quick way to work out a simple regex.

Jon Cage
0

Not sure about doing such a thing directly in Python, but I could definitely suggest using a RegEx editor tool. That's likely to be your best bet anyway. Personally, I've used The Regulator and found it to be very helpful. Some others are listed in this SO thread.

Community
Noldorin
0

Similar to the tools already mentioned, there is also RegexBuddy.

Rook
0

What RegexBuddy has that the other tools don't have is a built-in debugger that shows you the entire matching process of both successful and failed match attempts. The other tools only show the final result (which RegexBuddy can show too).

Jan Goyvaerts
0

I'm 100% with Geo re the need for on-board regex debugging in vanilla Python3, as it is in vanilla Perl5.

As good as Python3 is, and at 3.11.4 it is very good indeed, it still isn't as good as Perl5. Close, but no cigar. And unfortunately, the Python community over-reacts to any criticism. The usual answer goes something like: "Why do you want to do it that way? Just do it the Python way." That's not very helpful unless you're a newbie. If you're an experienced Perl programmer you may be studying Python because your boss ordered you to. Or, again, if you're an experienced Perl head, you may be doing it voluntarily just to learn what all the hype is about, or to decide whether it's time to leave Perl behind. (Spoiler alert: it's not.)

`use re 'debug'` and `use re 'debugcolor'` are incredibly useful. Why? Because some problems for which a regex is the solution are not trivial. I can think of problems I've solved with a regex that would give Jeffrey Friedl pause. Those two use cases come to the rescue. And not only do they help you debug a regex that is failing to match even when you are convinced it should, they can help you optimize a working regex to make it better, for example by shaving off unnecessary matching steps.

And all this 'stuff' (can't use bad language on stackoverflow) about there being only one right way to do something in Python is nonsense. As long as Python makes it easy ('easy' is a relative word) to extend Python with C, Fortran and, ugh!, C++, Python supports TMTOWTDI too.

One last point: I believe RegexBuddy only runs on Windoze. I'm not going to run a Windoze instance in VirtualBox (I'm a macOS and Linux guy) just to do regex debugging. Better, for me at least, is to develop a complex regex in Perl5 and then port it to Python3 after it is proven to work, obviously making use of those two use cases. For that I put the Python3 code after __END__ (ahem, another useful construct missing from Python), comment out the Perl code with a vim block command and shuffle the shebangs to switch programs. I guess that's another example of TMTOWTDI that would be frowned upon.

perlboy
0

Can someone explain to me why everybody says it is impossible? In my opinion, it is very, very simple:


import re

regex_string = r"Hello \w\w name .[a-z] funny\d\d[a-z]3"
text = "Hello my name is funny0123"

# Try ever-shorter prefixes of the pattern, starting with the full pattern,
# until one both compiles and matches at the start of the text.
for ind_regex in range(len(regex_string), 0, -1):
    try:
        match = re.match(regex_string[:ind_regex], text)
    except re.error:
        # This prefix is not a valid regex on its own; try a shorter one.
        continue
    if match:
        print(regex_string[:ind_regex])  # longest pattern prefix that matches
        print(match.group())             # the part of the text it matched
        break

Of course, it doesn't work well with groups (a prefix that cuts through an open group doesn't compile and is skipped), but it's better than nothing.
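The prefix idea can also be wrapped in a small helper that reports which part of the pattern was never reached. This is only a sketch: `longest_matching_prefix` is a made-up name, and a truncated pattern does not always mean the same thing as it does inside the full pattern (cutting right before a quantifier changes what the preceding token matches), so the result is a hint rather than a real execution trace.

```python
import re


def longest_matching_prefix(pattern, text):
    """Return (pattern_prefix, matched_text, pattern_rest) for the longest
    prefix of `pattern` that compiles and matches at the start of `text`,
    or None if no prefix matches. Hypothetical helper for illustration."""
    for end in range(len(pattern), 0, -1):
        try:
            match = re.match(pattern[:end], text)
        except re.error:
            # The prefix is not a valid regex on its own (for example an
            # unterminated character set or an open group); skip it.
            continue
        if match:
            return pattern[:end], match.group(), pattern[end:]
    return None


result = longest_matching_prefix(
    r"Hello \w\w name .[a-z] funny\d\d[a-z]3",
    "Hello my name is funny0123",
)
if result is not None:
    prefix, matched, rest = result
    print("pattern matched up to:", prefix)
    print("text consumed:", matched)
    print("pattern left over:", rest)
```

For the example above, the leftover part is `[a-z]3`, which points at the character class that meets the digit `2` in the text.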

GuiTaek