This problem might be very simple, but I find it a bit confusing, which is why I need help.

In connection with this question I posted, which has been solved, I just noticed a new issue.


Source code:

from PyQt5 import QtCore,QtWidgets

app=QtWidgets.QApplication([])

def scroll():
    #QtCore.QRegularExpression(r'\b'+'cat'+'\b')
    item = listWidget.findItems(r'\bcat\b', QtCore.Qt.MatchRegularExpression)
    for d in item:
        print(d.text())

window = QtWidgets.QDialog()
window.setLayout(QtWidgets.QVBoxLayout())
listWidget = QtWidgets.QListWidget()
window.layout().addWidget(listWidget)


cats = ["love my cat","catirization","cat in the clouds","catść"]

for i,cat in enumerate(cats):
    QtWidgets.QListWidgetItem(f"{i}  {cat}", listWidget)

btn = QtWidgets.QPushButton('Scroll')
btn.clicked.connect(scroll)
window.layout().addWidget(btn)
window.show()
app.exec_()

Output GUI:

[Screenshot: PyQt5 output GUI]


As you can see, I am just trying to print the text of every item that matches the regex r"\bcat\b" when I press the "Scroll" button, and it works.

Output:

0  love my cat
2  cat in the clouds
3  catść

However, item #3 should not be printed, because it does not match the regular expression r"\bcat\b". It is printed anyway, and I think the special foreign characters ść are what make it match (which they shouldn't, right?).

I'm expecting an output like:

0  love my cat
2  cat in the clouds

What I have tried

I found this question, which mentions \p{L}. According to the answer:

If all you want to match is letters (including "international" letters) you can use \p{L}.

To be honest, I'm not sure how to apply that with PyQt5. I still made some attempts and tried changing the regex to r'\b'+r'\p{cat}'+r'\b'. However, I got this error:

QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object

Obviously the error says it's not a valid regex. Can someone educate me on how to solve this issue? Thank you!

Ice Bear
  • You should pass the `QRegularExpression::UseUnicodePropertiesOption` option. Or, use `r'(*UCP)\bcat\b'` – Wiktor Stribiżew Jan 06 '21 at 17:09
  • oh my! Amazing! @WiktorStribiżew it's so unbelievable how people here can solve an issue in just seconds while I'm here figuring it out for hours... this regex you provided worked! but may I ask what does `UCP` stand for? Also I'd love to learn how to do this that you mentioned **pass the QRegularExpression::UseUnicodePropertiesOption option**. Do you happen to know how to do it with `PYQT5?` Thank you – Ice Bear Jan 06 '21 at 17:13
  • I'd really love to learn how to do it since it's also a good option to do as a solution to this issue. If yes you can kindly post it as an answer along with the regex you provided. – Ice Bear Jan 06 '21 at 17:14
  • 1) `r'\b'+r'\p{cat}'+r'\b'` is **exactly** the same as `r'\b\p{cat}\b'`; do some research on string concatenation, as it's directly related to the previous problem you had (for which you did *not* add the prefix instead). 2) Don't ask "what does UCP" stand for, look up for it (it's pretty clear that it stands for the same thing as the `UseUnicodePropertiesOption`, as UCP is its acronym). – musicamante Jan 06 '21 at 17:20
  • 1) I know that, I just wanted the first one cause I'm using my text editor and the color differs from the other one, I just like it, **my eyes likes it**. 2.) I got hyped by @Wiktor and googled UCP but found a different results, now I got it. – Ice Bear Jan 06 '21 at 17:22
  • What if you specify a QRegularExpression and pass it to the findItems method? `rx = Qt.QRegularExpression(r'\bcat\b', Qt.UseUnicodePropertiesOption)` and then use `item = listWidget.findItems(rx, Qt.MatchRegularExpression)` – Wiktor Stribiżew Jan 06 '21 at 17:22
  • Ah, I see, that probably should be `rx = QtCore.QRegularExpression(r'\bcat\b', QtCore.QRegularExpression.UseUnicodePropertiesOption)` – Wiktor Stribiżew Jan 06 '21 at 17:31
  • I tried this one `rx = QtCore.Qt.QRegularExpression(r'\bcat\b', QtCore.Qt.UseUnicodePropertiesOption)` I just based it on my imports. `item = listWidget.findItems(rx, QtCore.Qt.MatchRegularExpression)` but I got this error: `AttributeError: type object 'Qt' has no attribute 'QRegularExpression'` – Ice Bear Jan 06 '21 at 17:32
  • I think it's `from PyQt5.QtCore import QRegularExpression` so it would be `QtCore.QRegularExpression`. I'll try – Ice Bear Jan 06 '21 at 17:35
  • I got this error tho :/ `TypeError: findItems(self, str, Union[Qt.MatchFlags, Qt.MatchFlag]): argument 1 has unexpected type 'QRegularExpression'` but I can see where this is going and the implementation of this `rx = QtCore.QRegularExpression(r'\bcat\b', QtCore.QRegularExpression.UseUnicodePropertiesOption)` is **CORRECT**. It's just the `findItems` function first param expecting a string `str`. – Ice Bear Jan 06 '21 at 17:39
  • This means, you might have to only rely on the `(*UCP)` PCRE verb if all you can do is use a *string* regex pattern (and not a regex object). – Wiktor Stribiżew Jan 06 '21 at 17:42
  • hmmm... I see I guess that's fine! Thanks a lot! I think you can post the answer please cause I do sometimes make this (my questions) as notes. Also other people might find it useful! Thanks a lot! – Ice Bear Jan 06 '21 at 17:44

1 Answer

In general, when you need to make your shorthand character classes and word boundaries Unicode-aware, you need to pass the QRegularExpression.UseUnicodePropertiesOption option to the regex compiler. See the QRegularExpression.UseUnicodePropertiesOption reference:

The meaning of the \w, \d, etc., character classes, as well as the meaning of their counterparts (\W, \D, etc.), is changed from matching ASCII characters only to matching any character with the corresponding Unicode property. For instance, \d is changed to match any character with the Unicode Nd (decimal digit) property; \w to match any character with either the Unicode L (letter) or N (digit) property, plus underscore, and so on. This option corresponds to the /u modifier in Perl regular expressions.
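The same ASCII-versus-Unicode split can be reproduced without Qt. The sketch below uses Python's standard `re` module purely as an analogy (it is not the PCRE engine that Qt uses): with `re.ASCII`, `ś` does not count as a word character, so `\b` finds a boundary right after `cat`; with the default Unicode behaviour it does not. The `matching` helper and the `items` list are illustrative names.

```python
import re

# Same sample data as in the question; "catść" is written with escapes
# so the example does not depend on the file encoding.
items = [
    "0  love my cat",
    "1  catirization",
    "2  cat in the clouds",
    "3  cat\u015b\u0107",
]

def matching(pattern, flags=0):
    """Return the items in which the pattern matches somewhere."""
    rx = re.compile(pattern, flags)
    return [s for s in items if rx.search(s)]

# ASCII-only \w and \b: the non-ASCII letter after "cat" is treated
# as a non-word character, so a word boundary is found there.
ascii_hits = matching(r"\bcat\b", re.ASCII)

# Unicode-aware \w and \b (Python 3 default for str patterns).
unicode_hits = matching(r"\bcat\b")

print(ascii_hits)
print(unicode_hits)
```

In ASCII mode the last item is included in the result; in Unicode mode it is not, which is the behaviour the question expects.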

In Python, you could declare it as

rx = QtCore.QRegularExpression(r'\bcat\b', QtCore.QRegularExpression.UseUnicodePropertiesOption)

However, since QListWidget.findItems does not accept a QRegularExpression as its argument and only takes the regex as a string, you have to use the (*UCP) PCRE verb instead:

r'(*UCP)\bcat\b'

Make sure you place it at the very beginning of the pattern.
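One way to check both forms without building a GUI is to run the patterns through QRegularExpression directly. This is a minimal sketch, assuming PyQt5 is installed; the `has_match` helper and the sample string are illustrative, not part of the original code. Based on the behaviour described above, the plain pattern should report a match on the last list item, while the `(*UCP)` pattern and the option-based object should not.

```python
from PyQt5 import QtCore

# The problematic list item from the question ("3  catść"),
# written with escapes to avoid encoding issues.
text = "3  cat\u015b\u0107"

def has_match(rx, s):
    """Hypothetical helper: True if the QRegularExpression matches s."""
    return rx.match(s).hasMatch()

# Default behaviour: \b and \w only know about ASCII word characters.
plain = QtCore.QRegularExpression(r"\bcat\b")

# Unicode-aware via the inline PCRE verb (usable wherever only a
# pattern string is accepted, e.g. QListWidget.findItems).
with_verb = QtCore.QRegularExpression(r"(*UCP)\bcat\b")

# Unicode-aware via the pattern option on the regex object.
with_option = QtCore.QRegularExpression(
    r"\bcat\b", QtCore.QRegularExpression.UseUnicodePropertiesOption
)

results = {
    "plain": has_match(plain, text),
    "verb": has_match(with_verb, text),
    "option": has_match(with_option, text),
}
print(results)
```

The verb and the option are two spellings of the same setting, so the choice mostly depends on whether the API you call accepts a regex object or only a string.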

Wiktor Stribiżew