14

How can I search for, say, a sequence of 10 isprint characters in a given string in Python?

With GNU grep, I would simply run grep -E '[[:print:]]{10}'

nodakai

1 Answer

12

Since Python's re module does not support POSIX character classes, you have to emulate [[:print:]] with an explicit character class.

You can use the printable-ASCII range given on regular-expressions.info and add the limiting quantifier {10}:

[\x20-\x7E]{10}

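As a minimal sketch of how the pattern above could be used with the standard re module (the helper name `find_printable_run` and the sample string are made up for illustration):

```python
import re

# ASCII-only stand-in for POSIX [[:print:]]: space (0x20) through tilde (0x7E).
PRINTABLE_RUN = re.compile(r'[\x20-\x7E]{10}')


def find_printable_run(text):
    """Return the first run of 10 printable ASCII characters, or None."""
    match = PRINTABLE_RUN.search(text)
    return match.group(0) if match else None


# Hypothetical sample: control characters around two printable runs.
sample = "\x00\x01short\x02Hello, world!\x03"
print(find_printable_run(sample))  # prints: Hello, wor
```

The run "short" is skipped because it is only five characters long; the match is the first ten characters of the next run.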

Alternatively, you can use Matthew Barnett's regex module from PyPI, whose documentation states that "POSIX character classes are supported".
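If installing a third-party module is not an option and the text may contain non-ASCII characters, a rough standard-library sketch can lean on str.isprintable() instead of a regular expression. This is a swapped-in technique, not an equivalent of POSIX [[:print:]]: Python's notion of "printable" follows Unicode categories and may differ from what grep matches in a given locale. The helper name `find_unicode_printable_run` is hypothetical.

```python
def find_unicode_printable_run(text, length=10):
    """Return the first run of `length` consecutive printable characters, or None.

    "Printable" here means str.isprintable(), which is an assumption:
    it is Unicode-aware but not identical to POSIX [[:print:]].
    """
    run_start = 0
    for i, ch in enumerate(text):
        if not ch.isprintable():
            # A non-printable character breaks the run; restart after it.
            run_start = i + 1
        elif i + 1 - run_start == length:
            return text[run_start:i + 1]
    return None


print(find_unicode_printable_run("\x00Grüße, Jürgen!\n"))  # prints: Grüße, Jür
```

Scanning character by character keeps the logic explicit; for large inputs a compiled pattern is likely to be faster.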

Wiktor Stribiżew
  • This character class worked for me in Python 3 `[\`~!@#$%^&*()_=+\[\]{}\\\|;:\"\'<>.,/?]` when using inside the `re.sub()` method – Allan Tsai Feb 02 '20 at 13:20
  • 1
    @Iota, that ``[`~!@#$%^&*()_=+\[\]{}\\\|;:\"\'<>.,/?]`` only matches ASCII punctuation, it has nothing to do with the concept of "printable chars". So, if you were to use a POSIX character class, it would be `[[:punct:]]`. To match punctuation in Python, you can use `[^\w\s]`, although there are better and more precise patterns. – Wiktor Stribiżew Feb 02 '20 at 13:25
  • 1
    My mistake! I misread the `[[:print:]]` class as `[[:punct:]]`. Appreciate your correction. – Allan Tsai Feb 04 '20 at 06:19
  • The regex in the answer will not match Unicode non-ASCII characters like grep (GNU grep) does. – pabouk - Ukraine stay strong Apr 05 '22 at 13:41
  • 2
    @pabouk-Ukrainestaystrong Then see the bottom of the answer. Just install the PyPi regex module (`pip install regex` in the console/terminal) and then use `import regex` and `pattern = regex.compile(r'[[:print:]]{10}')`. – Wiktor Stribiżew Apr 05 '22 at 13:49