3

I am trying to exclude words that merely contain a match for my regular expression, rather than being an exact match, when using the search method of a Text widget. For example, with the regular expression "(if)|(def)", words like define, definition or elif are all found by re.search, but I want a regular expression that matches exactly if and def and nothing else.

This is the code I am using:

import keyword

PY_KEYS = keyword.kwlist
PY_PATTERN = "^(" + ")|(".join(PY_KEYS) + ")$"

But it still matches words like define. I want only words like def, even though define contains def.

I need this to highlight words in a tkinter.Text widget. The function responsible for highlighting the code is:

def highlight(self, event, pattern='', tag=KW, start=1.0, end="end", regexp=True):
    """Apply the given tag to all text that matches the given pattern
    If 'regexp' is set to True, pattern will be treated as a regular
    expression.
    """

    if not isinstance(pattern, str) or pattern == '':
        pattern = self.syntax_pattern # PY_PATTERN
    # print(pattern)

    start = self.index(start)
    end = self.index(end)

    self.mark_set("matchStart", start)
    self.mark_set("matchEnd", start)
    self.mark_set("searchLimit", end)

    count = tkinter.IntVar()
    while pattern != '':
        index = self.search(pattern, "matchEnd", "searchLimit", 
                            count=count, regexp=regexp)
        # prints nothing
        print(self.search(pattern, "matchEnd", "searchLimit", 
                         count=count, regexp=regexp))
        if index == "":
            break
        self.mark_set("matchStart", index)
        self.mark_set("matchEnd", "%s+%sc" % (index, count.get()))
        self.tag_add(tag, "matchStart", "matchEnd")

On the other hand, if PY_PATTERN = "\\b(" + "|".join(PY_KEYS) + ")\\b", then nothing is highlighted, and if you put a print inside the function you can see that search returns an empty string.

Bryan Oakley
nbro
  • Try printing out py_pattern. –  Dec 20 '14 at 17:59
  • It prints this: `^(False)|(None)|(True)|(and)|(as)|(assert)|(break)|(class)|(continue)|(def)|(del)|(elif)|(else)|(except)|(finally)|(for)|(from)|(global)|(if)|(import)|(in)|(is)|(lambda)|(nonlocal)|(not)|(or)|(pass)|(raise)|(return)|(try)|(while)|(with)|(yield)$`, which is correct, but maybe the problem is caused by something else... – nbro Dec 20 '14 at 18:01
  • In `^(False)|`, the anchor `^` applies only to `False`. Do this `PY_PATTERN = "^((" + ")|(".join(PY_KEYS) + "))$"` or do this `PY_PATTERN = "^(" + "|".join(PY_KEYS) + ")$"` –  Dec 20 '14 at 18:02
  • @sln Actually, if I search in a "normal" string `s = "define whatever you want, for example def"`, with this regular expression: `PY_PATTERN = "^(" + ")|(".join(PY_KEYS) + ")$"`, it returns this `<_sre.SRE_Match object; span=(0, 3), match='def'>`, which should NOT be correct. If I use this `PY_PATTERN = "^((" + ")|(".join(PY_KEYS) + "))$"`, it returns `None`. I think the problem is that with the `tags` I am applying to the text of a `tkinter.Text` widget I am using to highlight the text that exactly matches what I am requiring... – nbro Dec 20 '14 at 18:07
  • If you are searching for keywords in strings, regex-escape the keywords (if they may contain metacharacters) and use something like this (if Python supports conditionals): `PY_PATTERN = "(?(?=\w)\b|\B)(" + "|".join(PY_KEYS) + ")(?(?<=\w)\b|\B)"`. If the keywords consist only of word characters, use this: `PY_PATTERN = "\\b(" + "|".join(PY_KEYS) + ")\\b"` –  Dec 20 '14 at 18:17
  • @sln The second one seems to work with a "normal" unlogical string like this: `s = "else if whatever you want if"` (it returns `<_sre.SRE_Match object; span=(25, 27), match='if'>`), but in the text widget (of my tkinter application) no word is highlighted, so the problem is also due to the function that highlights the text? I will try to discover it... – nbro Dec 20 '14 at 18:25
  • @sln In particular, I am using this code: `index = self.search(pattern, "matchEnd", "searchLimit", count=count, regexp=regexp)`, where `pattern` is what you suggested. `search` is a method of a `tkinter.Text` widget (http://effbot.org/tkinterbook/text.htm), and `regexp` is set to `True`. – nbro Dec 20 '14 at 18:37
  • Please include example inputs and desired outputs. Also, the code that performs the match. – OrangeDog Dec 20 '14 at 18:47
  • @OrangeDog Have you seen the discussion? There are a lot of examples already!!! I don't understand why you downvoted... – nbro Dec 20 '14 at 18:48
  • After a second scan of the mass of comments, I still can't see any input examples. Regardless, they're necessary for the question so should be part of the question. – OrangeDog Dec 20 '14 at 18:52
  • @OrangeDog Input examples are, for instance, `define` or `def`. With the last code I am using (from the discussion), no words are highlighted with the `search` method of a `tkinter.Text` widget. What more do you need? – nbro Dec 20 '14 at 18:54
  • For that to be in the question, along with the code that performs the match. Various of the answers you said don't work should do, so there must be something you haven't revealed. – OrangeDog Dec 20 '14 at 18:58
  • @OrangeDog Ok, edited, if you need more information, just ask. – nbro Dec 20 '14 at 19:03
  • Bad docs for tkinter. It says it uses the Tcl regex engine. I would try to hard-code a pattern like `define`. If it highlights, try `\bdefine\b`. Work out from there. –  Dec 20 '14 at 19:49
  • @sln Unfortunately, it returns nothing except an empty string. – nbro Dec 20 '14 at 20:23
  • @nbro - Then that suggests to me it's something else, not the regex. –  Dec 22 '14 at 18:05

3 Answers

4

You can use anchors:

"^(?:if|def)$"

^ asserts position at the start of the string, and $ asserts position at the end of the string, so the pattern matches only when the entire string is exactly if or def.

>>> import re
>>> for foo in ["if", "elif", "define", "def", "in"]:
...     bar = re.search("^(?:if|def)$", foo)
...     print(foo, ' ', bar)
...
if   <_sre.SRE_Match object at 0x934daa0>
elif   None
define   None
def   <_sre.SRE_Match object at 0x934daa0>
in   None
Unihedron
  • Hey dude, your regex unfortunately is not working. This is exactly the code I have: `import keyword; PY_KEYS = keyword.kwlist; PY_PATTERN = "^(" + ")|(".join(PY_KEYS) + ")$" ` – nbro Dec 20 '14 at 17:51
  • It is still matching words like `define` – nbro Dec 20 '14 at 17:52
3

You could use word boundaries:

"\b(if|def)\b"
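
As a rough check with Python's own `re` module (which is not the engine that `tkinter.Text.search` uses), the sketch below compares two ways of building the keyword pattern. The names `loose`, `strict` and `sample` are made up for this illustration. Joining on `")|("` leaves `\b` attached only to the first and last alternatives, while joining on `"|"` inside a single group applies it to every keyword. Raw strings keep `\b` from being read as a backspace character.

```python
import keyword
import re

PY_KEYS = keyword.kwlist

# Joining on ")|(" yields \b(False)|(None)|...|(yield)\b, so the word
# boundaries only constrain the first and the last alternative.
loose = re.compile(r"\b(" + ")|(".join(PY_KEYS) + r")\b")

# Joining on "|" inside one non-capturing group yields \b(?:False|...|yield)\b,
# so every keyword must sit between word boundaries.
strict = re.compile(r"\b(?:" + "|".join(PY_KEYS) + r")\b")

sample = "define definition if def"

loose_hit = loose.search("define")
print(loose_hit.group() if loose_hit else None)  # what the loose pattern finds in "define"
print(strict.search("define"))                   # the strict pattern on the same word
print(strict.findall(sample))                    # whole-word keywords in the sample
```
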
OrangeDog
Simon Farshid
  • Unfortunately, it's not working. I am using this code: `import keyword; PY_KEYS = keyword.kwlist; PY_PATTERN = r"\b(" + ")|(".join(PY_KEYS) + ")\\b"`, but it also matches words like `define`, because it contains both `def` and `in` – nbro Dec 20 '14 at 17:43
  • Doesn't it have to be `PY_PATTERN = r"\\b("`...? – Simon Farshid Dec 20 '14 at 17:44
  • Since I have `r`, I should not even need `\\\` – nbro Dec 20 '14 at 17:45
  • Then why do you have "\\b" at the end? – Simon Farshid Dec 20 '14 at 17:45
  • Because otherwise, if I print the regular expression, no `\b` is seen at the end... – nbro Dec 20 '14 at 17:46
  • Oh right, totally missed that. Try putting it all in parenthesis, like `"\b((if)|(def))\b"` – Simon Farshid Dec 20 '14 at 17:48
  • @nbro: This answer is correct (except for not using a raw string). The code in your first comment is producing `\b(if)|(def)\b`, which is **not** the same as `\b(if|def)\b`. You need to join on `|`, not `)|(`. – Alan Moore Dec 21 '14 at 05:55
  • @AlanMoore With the code I added above in my modified question, this code does not work anyway... – nbro Dec 21 '14 at 12:46
  • The question is in regards to doing a regular expression search in a tkinter text widget. This takes a different flavor of regular expression, where \b is _not_ a word boundary. – Bryan Oakley Feb 18 '15 at 16:23
2

The answers given are fine for Python's regular expressions, but I have since found that the search method of a tkinter Text widget actually uses Tcl's regular expression syntax.

In this case, instead of wrapping the word or the regular expression with \b (or \\b if we are not using a raw string), we can simply use the corresponding Tcl word-boundary escape, \y (or \\y), which did the job in my case.
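
A minimal sketch of how that could look, assuming the keywords contain only word characters (true for `keyword.kwlist`). The helper names `build_tcl_keyword_pattern` and `highlight_keywords` and the tag name `"keyword"` are hypothetical, not part of tkinter. The loop follows the same search-and-tag approach as the `highlight` method in the question.

```python
import keyword


def build_tcl_keyword_pattern(words=keyword.kwlist):
    """Return a Tcl-flavoured regex matching any of `words` as a whole word.

    Assumes the words contain no regex metacharacters.
    """
    # \y is Tcl's "beginning or end of a word" assertion;
    # (?:...) groups the alternatives without capturing.
    return r"\y(?:" + "|".join(words) + r")\y"


def highlight_keywords(text_widget, tag="keyword"):
    """Add `tag` to every whole-word keyword in a tkinter.Text widget."""
    import tkinter  # imported here so the pattern helper works without Tk

    pattern = build_tcl_keyword_pattern()
    count = tkinter.IntVar(master=text_widget)
    position = "1.0"
    while True:
        index = text_widget.search(pattern, position, stopindex="end",
                                   count=count, regexp=True)
        # Stop when nothing is found, or on a zero-length match,
        # which would otherwise loop forever.
        if index == "" or count.get() == 0:
            break
        position = "%s+%sc" % (index, count.get())
        text_widget.tag_add(tag, index, position)
```

With an existing `Text` widget named, say, `text`, one would call `highlight_keywords(text)` after configuring the tag, for example with `text.tag_configure("keyword", foreground="blue")`.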

See my other question for more information.

Community
nbro