
I have a regex pattern that returns a list of the start and stop indices of every occurrence of a string, and I want to highlight each occurrence in a Tkinter Text widget. My current setup is extremely slow: on a 133,000-line file it takes about 8 minutes to highlight all occurrences.

Here's my current solution:

if IPv == 4:
    v4FoundUnique = v4FoundUnique + 1
    # highlight all regions found
    for j in range(qty):
        v4Found = v4Found + 1
        # don't highlight if they set the checkbox not to
        if highlightText:
            # get row.column coordinates of the start of the match
            # very slow
            startIndex = textField.index('1.0 + {} chars'.format(starts[j]))
            # compute the end from the start, assuming an IP address never
            # spans lines; drastically faster than computing it from the raw index
            row, col = startIndex.split(".")
            endIndex = "{}.{}".format(row, int(col) + stops[j] - starts[j])
            # apply tag
            textField.tag_add("{}v4".format("public" if isPublic else "private"),
                              startIndex, endIndex)
martineau
Trevor Hurst
  • See what's taking so long and work on that. [How do I profile a Python script?](https://stackoverflow.com/questions/582336/how-do-i-profile-a-python-script) You can also use the [`timeit`](https://docs.python.org/3/library/timeit.html#module-timeit) module. – martineau Jul 07 '22 at 16:50
  • @martineau it's marked in my comment, the one that says "# very slow" – Trevor Hurst Jul 07 '22 at 18:19
  • OK. How about a [mre] someone could use for testing and verification? BTW, I mentioned profiling because using regexes can often be rather slow. – martineau Jul 07 '22 at 18:49

1 Answer


So, Tkinter turns out to be quite slow at converting an absolute character offset into its row.column index format:

startIndex = textField.index('1.0 + {} chars'.format(starts[j]))

It's actually faster to compute the row and column directly from the file text, like this:

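for address in v4check.finditer(filetxt):
    # address.span() returns the match's (start, stop) offsets into filetxt
    # address.group() returns the matching text
    start, stop = address.span()
    ip = address.group()
    # row: 1-based line number = newlines before the match, plus one
    srow = filetxt.count("\n", 0, start) + 1
    # column: distance from the previous newline (rfind returns -1 on line 1)
    scol = start - filetxt.rfind("\n", 0, start) - 1
    start = "{}.{}".format(srow, scol)
    stop = "{}.{}".format(srow, scol + len(ip))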

This uses the regex match spans and the input file text to compute the indices the Text widget needs (row.column), without asking the widget to do the conversion.
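To make the arithmetic easier to check, here is a minimal standalone sketch of the same conversion. The helper name `offset_to_index` and the sample text are made up for illustration, and it assumes the widget holds exactly the same text as the string being searched.

```python
def offset_to_index(text, offset):
    """Convert a 0-based character offset into a Tk-style 'row.col' index.

    Hypothetical helper for illustration; assumes the Text widget contains
    exactly `text`, so offsets in the string line up with the widget.
    """
    # Tk rows are 1-based: count the newlines before the offset, then add one.
    row = text.count("\n", 0, offset) + 1
    # Tk columns are 0-based: distance from the character after the previous
    # newline. rfind returns -1 when there is none, which gives col == offset.
    col = offset - text.rfind("\n", 0, offset) - 1
    return "{}.{}".format(row, col)


# Made-up sample text with one address per line.
sample = "host 10.0.0.1\nnext 192.168.1.20 end\n"
start = sample.index("192.168.1.20")
print(offset_to_index(sample, start))
```

The resulting string can be passed to `tag_add` in place of the value returned by `textField.index(...)`.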

There could be a faster way of doing this, but this is the solution I found that works!
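One possible faster variant, which I have not timed against the real file: `count` and `rfind` rescan the text from the beginning for every match, so the work grows with both the file size and the number of matches. A sketch that records the line start offsets once and then looks each match up with `bisect` avoids the rescans. The names `line_starts`, `span_to_indices`, and `ip_pattern` are hypothetical; `ip_pattern` is only a simplified stand-in for `v4check`, and matches are assumed not to span lines.

```python
import bisect
import re


def line_starts(text):
    """Return the character offset at which each line of `text` begins."""
    starts = [0]
    for newline in re.finditer("\n", text):
        starts.append(newline.end())
    return starts


def span_to_indices(starts, start, stop):
    """Convert a (start, stop) offset pair into Tk 'row.col' index strings.

    Assumes the match does not span lines, as in the original code.
    """
    # bisect_right gives the number of line starts <= start, i.e. the 1-based row.
    row = bisect.bisect_right(starts, start)
    col = start - starts[row - 1]
    return "{}.{}".format(row, col), "{}.{}".format(row, col + stop - start)


# Simplified stand-in for the real v4check pattern (an assumption).
ip_pattern = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

sample = "host 10.0.0.1\nnext 192.168.1.20 end\n"
starts = line_starts(sample)  # computed once per file
indices = [span_to_indices(starts, *m.span()) for m in ip_pattern.finditer(sample)]
print(indices)
```

Each pair in `indices` can then be passed to `textField.tag_add(tag, start, stop)` as before.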

Trevor Hurst