33

I have some code that pulls data from a COM port, and I want to make sure that what I got really is a printable string (i.e. ASCII, maybe UTF-8) before printing it. Is there a function for doing this? The first half dozen places I looked didn't have anything that looks like what I want. (string has printable, but I didn't see anything there, or in the string methods, to check whether every char in one string is in another.)

I am looking for a single function, not a roll-your-own solution.

Note: control characters are not printable for my purposes.

Neuron
BCS
  • If there's no ready-made solution, you can DIY with `string.printable`: `printables = set(string.printable); if all(char in printables for char in your_string): ...` –  Sep 03 '10 at 15:07

10 Answers

53

As you've said, the string module has printable, so it's just a case of checking whether all the characters in your string are in printable:

>>> hello = 'Hello World!'
>>> bell = chr(7)
>>> import string
>>> all(c in string.printable for c in hello)
True
>>> all(c in string.printable for c in bell)
False

You could convert both strings to sets - so the set would contain each character in the string once - and check if the set created by your string is a subset of the printable characters:

>>> printset = set(string.printable)
>>> helloset = set(hello)
>>> bellset = set(bell)
>>> helloset
set(['!', ' ', 'e', 'd', 'H', 'l', 'o', 'r', 'W'])
>>> helloset.issubset(printset)
True
>>> bellset.issubset(printset)
False

So, in summary, you would probably want to do this:

import string
printset = set(string.printable)
isprintable = set(yourstring).issubset(printset)
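
For Python 3, the summary above can be wrapped in a small helper. This is only a minimal sketch: the name `is_ascii_printable` is made up for illustration, and it keeps the definition used by `string.printable`, which includes whitespace such as tab and newline but excludes everything outside ASCII.

```python
import string

# Build the lookup set once; string.printable is digits, letters,
# punctuation and whitespace (ASCII only).
_PRINTABLE = frozenset(string.printable)


def is_ascii_printable(text):
    """Return True if every character of text is in string.printable."""
    return _PRINTABLE.issuperset(text)


print(is_ascii_printable("Hello World!"))  # True
print(is_ascii_printable(chr(7)))          # False (BEL control character)
print(is_ascii_printable("Déjà vu"))       # False (non-ASCII letters)
```

Building the set once avoids rescanning `string.printable` for every character of a long input.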
rvb
David Webb
  • 4
    I was kinda hoping for a non-roll your own solution. Why the heck doesn't python have this as a function? – BCS Sep 03 '10 at 16:58
  • 8
    "Why the heck doesn't python have this as a function?": this solution, and others like it, are trivial compositions of builtin python facilities. if this was given a special name, and every other useful but trivial feature was also blessed with a name, then the python namespace would be abysmally cluttered. this short composition is every bit as readable as some hypothetical `stringutil.stringisprintable(myvar)`, except that there's no need to maintain that extra module. – SingleNegationElimination Jul 29 '11 at 15:50
  • 5
    Does this handle anything beyond ASCII? – jpmc26 Jan 27 '15 at 02:24
  • 4
    Well, Python does have isalpha, isdigit, isspace, isalnum, islower, isupper and istitle. The ones it's missing (compared to C) are iscntrl, isgraph, isprint, ispunct and isxdigit. Given that the C library implements them already, it's not entirely strange to assume Python would have them too. – kleptog Apr 15 '16 at 08:48
  • 1
    since this post is old, python 2 does not have a `str.isprint` or `str.isprintable` builtin method or function. python 3 does. ...it's a minor annoyance that instead of following convention and style, they called it `isprintable` instead of `isprint`. py2 -> https://docs.python.org/2.7/library/stdtypes.html#str.isalnum ; py3 -> https://docs.python.org/3.6/library/stdtypes.html#str.isalnum –  Aug 22 '17 at 21:38
  • 1
    This is NOT a good solution in general because `string.printable` does not include accented characters or diacritics; for instance, é and à in "Déjà vu" are rejected, so it is a limited solution. – alemol Jul 16 '21 at 16:24
8

try/except seems the best way:

def isprintable(s, codec='utf8'):
    # Python 2: s is a byte string; True if it decodes cleanly with codec
    try:
        s.decode(codec)
    except UnicodeDecodeError:
        return False
    return True

I would not rely on string.printable, which might deem "non-printable" control characters that can commonly be "printed" for terminal control purposes (e.g., in "colorization" ANSI escape sequences, if your terminal is ANSI-compliant). But that, of course, depends on your exact purposes for wanting to check this!-)
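
In Python 3, data read from a serial port is typically a `bytes` object, and only `bytes` has a `decode` method. A rough Python 3 counterpart of the function above might look like the sketch below; `decodes_cleanly` is a hypothetical name. Like the original, it only checks that the bytes are valid in the given codec, so control characters still pass.

```python
def decodes_cleanly(data, codec="utf-8"):
    """Return True if the bytes object data is valid in the given codec."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


print(decodes_cleanly(b"Hello"))             # True
print(decodes_cleanly(b"\xff\xfe"))          # False: not valid UTF-8
print(decodes_cleanly(b"\x00\x01\x02\x03"))  # True: control bytes are valid UTF-8
```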

Alex Martelli
  • 2
    string.printable is well defined: "a combination of digits, letters, punctuation, and whitespace." Whitespace, OTOH, is a little less so: "On most systems this includes the characters space, tab, linefeed, return, formfeed, and vertical tab." – BCS Sep 03 '10 at 15:36
  • 2
    @BCS, it's basically the same concept as C's bad old `isprint` macro, and exhibits exactly the same failings (no control sequences / escape sequences -- but many terminals and printers can accept some control / escape sequences for cosmetic purposes such as colorization, and, depending on the app's purposes, forbidding such characters from the output may therefore prove unwise). – Alex Martelli Sep 03 '10 at 15:54
  • My concern is that whitespace could include *more* than those 6 chars. I know that if my data source ever contains "control chars", that I can assume they are junk. – BCS Sep 03 '10 at 17:01
  • 4
    Alex, your suggested function fails for even trivial unprintable input; for example: `isprintable('\00\01\02\03')` → `True` — unless I am misunderstanding your intent? – Brandon Rhodes May 11 '11 at 13:12
  • 1
    Alex's function might mean "submittable to print() and to other unspecified streams (like the console and many print devices) without raising an exception" whereas string.printable loosely means "has a glyph". See Unicode category. The streams you submit a string.printable char to must agree with your definition. For example, a browser displaying SVG text may raise an exception over non-printable characters (in the Unicode category 'control'). That's what Alex means by "exact purposes": it's about printable's ensure assertion and the downstream require assertion. – bootchk Mar 25 '14 at 09:20
6

This Python 3 string contains all kinds of special characters:

s = 'abcd\x65\x66 äüöë\xf1 \u00a0\u00a1\u00a2 漢字 \a\b\r\t\n\v\\ \231\x9a \u2640\u2642\uffff'

If you try to show it in the console (or use repr), it does a pretty good job of escaping all non-printable characters in that string:

>>> s
'abcdef äüöëñ \xa0¡¢ 漢字 \x07\x08\r\t\n\x0b\\ \x99\x9a ♀♂\uffff'

It escapes horizontal tab (\t) with its short form, which the test below treats as printable, while vertical tab (\v) shows up as \x0b rather than \v and is treated as non-printable.

Every other non-printable character also shows up as \xNN, \uNNNN or \UNNNNNNNN in the repr. Therefore, we can use that as the test:

def is_printable(s):
    # A character is non-printable if its repr is a \x, \u or \U escape
    return not any(repr(ch).startswith(("'\\x", "'\\u", "'\\U")) for ch in s)

There may be some borderline characters, for example non-breaking white space (\xa0) is treated as non-printable here. Maybe it shouldn't be, but those special ones could then be hard-coded.


P.S.

You could do this to extract only printable characters from a string:

>>> ''.join(ch for ch in s if is_printable(ch))
'abcdef äüöëñ ¡¢ 漢字 \r\t\n\\  ♀♂'
zvone
6

In Python 3, strings have an isprintable() method:

>>> 'a, '.isprintable()
True

For Python 2.7, see David Webb's answer.
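
A short Python 3 sketch of how `str.isprintable()` behaves on a few inputs, plus a hypothetical helper (`is_printable_text`, not a standard function) for the case where some whitespace such as tab or newline should be tolerated:

```python
def is_printable_text(text, allowed="\t\n\r"):
    """Return True if every character is printable or explicitly allowed.

    `allowed` is an assumption for illustration: characters that
    str.isprintable() rejects but the caller wants to accept.
    """
    return all(ch.isprintable() or ch in allowed for ch in text)


print("Hello World!".isprintable())   # True
print("Déjà vu".isprintable())        # True: non-ASCII letters count as printable
print("line\n".isprintable())         # False: newline is not printable
print("bell\x07".isprintable())       # False: control character
print(is_printable_text("line\n"))    # True: newline is in `allowed`
print(is_printable_text("bell\x07"))  # False
```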

Neuron
thakis
  • 2
    Confusingly, `str.isprintable()` has a different notion of "printable" than `string.printable` (for example, the former does not consider `\n` and `\t` to be printable). – jamesdlin Oct 13 '20 at 09:32
  • 1
    This function considers a string using Cyrillic characters as not printable. "Човек" returns false. Totally useless for my needs. – Nicolay77 May 03 '21 at 23:03
4
>>> # Printable
>>> s = 'test'
>>> len(s)+2 == len(repr(s))
True

>>> # Unprintable
>>> s = 'test\x00'
>>> len(s)+2 == len(repr(s))
False
JohnMudd
1

The category function from the unicodedata module might suit your needs. For instance, you can use this to check whether there are any control characters in a string while still allowing non-ASCII characters.

>>> import unicodedata

>>> def has_control_chars(s):
...     return any(unicodedata.category(c) == 'Cc' for c in s)

>>> has_control_chars('Hello 世界')
False

>>> has_control_chars('Hello \x1f 世界')
True
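
Building on the same idea, a stricter check could reject every character whose Unicode general category starts with 'C' (control, format, surrogate, private use, unassigned). This is only a sketch; the name `is_displayable` and the choice to reject the whole 'C' group are assumptions, not part of the answer above.

```python
import unicodedata


def is_displayable(text):
    """Return True if no character falls in a Unicode 'C*' category."""
    return not any(unicodedata.category(ch).startswith("C") for ch in text)


print(is_displayable("Hello 世界"))        # True
print(is_displayable("Hello \x1f 世界"))   # False: U+001F is category Cc
print(is_displayable("zero\u200bwidth"))  # False: U+200B is category Cf
```

Note that tab and newline are also in category Cc, so this sketch rejects them as well.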
gatkin
1
# Here is the full routine to display an arbitrary binary string
# Python 2

import string

ctrlchar = "\n\r| "

# ------------------------------------------------------------------------

def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False


# ------------------------------------------------------------------------
# Return a hex dump formatted string

def hexdump(strx, llen=16):
    """Return strx as a hex dump string, llen characters per line."""
    outx = ""
    for start in range(0, len(strx), llen):
        chunk = strx[start:start + llen]
        # Hex column, padded so a short last line still lines up
        hexpart = "".join("%02x " % ord(chh) for chh in chunk)
        hexpart += " " * ((llen - len(chunk)) * 3)
        # Text column: printable characters as-is, everything else as "."
        txtpart = "".join(chh if isprint(chh) else "." for chh in chunk)
        txtpart += " " * (llen - len(chunk))
        outx += " " + hexpart + " | " + txtpart + " | \n"
    return outx
Peter Glen
1

In the ASCII table, the characters in [\x20-\x7e] are printable.
Use a regular expression to check whether the string contains any character outside that range.
If it does not, the string is printable.

>>> import re

>>> # Printable
>>> print re.search(r'[^\x20-\x7e]', 'test')
None

>>> # Unprintable
>>> re.search(r'[^\x20-\x7e]', 'test\x00') is not None
True

>>> # Optional: also allow the whitespace characters \t through \r (\x09-\x0d)
>>> pattern = r'[^\t-\r\x20-\x7e]'
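
In Python 3 the same check could be written with a precompiled pattern and `fullmatch`, which succeeds only if the whole string consists of allowed characters. A minimal sketch; `matches_printable_ascii` is a hypothetical helper name:

```python
import re

# Zero or more characters, all within the printable ASCII range
_PRINTABLE_ASCII = re.compile(r"[\x20-\x7e]*")


def matches_printable_ascii(text):
    """Return True if text contains only characters in \\x20-\\x7e."""
    return _PRINTABLE_ASCII.fullmatch(text) is not None


print(matches_printable_ascii("test"))      # True
print(matches_printable_ascii("test\x00"))  # False
print(matches_printable_ascii("tab\t"))     # False: \t is outside \x20-\x7e
```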
Neuron
yunqimg
  • This would be a better answer if you explained how the code you provided answers the question. – pppery May 02 '20 at 23:35
0

Mine is a solution to get rid of any known set of characters. It might help.

non_printable_chars = set("\n\t\r ")     # Space included intentionally
# True if the string contains at least one character outside non_printable_chars
is_printable = lambda string: bool(set(string) - non_printable_chars)
...
...
if is_printable(string):
    print("""do something""")

...

jerinisready
0
import string

ctrlchar = "\n\r| "

# ------------------------------------------------------------------------
# This will let you control what you deem 'printable'
# Clean enough to display any binary 

def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False
Peter Glen