Sometimes I have strings with strange characters. They are not visible in the browser, but they are part of the string and are counted by len(). How can I get rid of them? strip() removes normal spaces but not these characters.
4 Answers
Use the character categories from the string module. If you want to allow all printable characters, you can do
from string import printable
new_string = ''.join(char for char in the_string if char in printable)
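Building on YOU's answer, you can do this with re.sub too (re.escape keeps characters such as "\" and "]" in printable from being read as regex syntax inside the character class):
import re
new_string = re.sub("[^{}]+".format(re.escape(printable)), "", the_string)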
Also, if you want to see all the characters in a string, even the unprintable ones, you can always do
print(repr(the_string))
which will show escapes like \x00 for unprintable characters.
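As a rough, self-contained sketch of the two approaches above (the function names and the sample string are made up for illustration), note that string.printable covers ASCII only, so accented letters and other non-ASCII text are removed as well:

```python
import re
from string import printable

def keep_printable(text):
    """Drop every character that is not in string.printable (ASCII only)."""
    return "".join(ch for ch in text if ch in printable)

# Same filter as a regex. re.escape keeps "]", "\" and "^" literal
# inside the character class.
_NOT_PRINTABLE = re.compile("[^{}]+".format(re.escape(printable)))

def keep_printable_re(text):
    """Regex variant of keep_printable."""
    return _NOT_PRINTABLE.sub("", text)

# Hypothetical input: contains a NUL character and a zero-width space.
sample = "abc\x00def\u200b"
print(repr(sample))                   # shows the hidden characters as escapes
print(repr(keep_printable(sample)))
print(repr(keep_printable_re(sample)))
```

If non-ASCII text has to survive, this filter is probably too strict and a Unicode-aware check is the safer choice.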

In Python 3, you can filter your string using str.isprintable() (from PEP 3138):
output_str = ''.join(c for c in input_str if c.isprintable())
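A small sketch of how this behaves, assuming Python 3 (split_printable and the sample string are hypothetical names for illustration). Keep in mind that str.isprintable() also reports tabs and newlines as non-printable, so they are removed along with the invisible characters:

```python
def split_printable(text):
    """Return (printable part, list of characters that were dropped)."""
    kept = "".join(ch for ch in text if ch.isprintable())
    dropped = [ch for ch in text if not ch.isprintable()]
    return kept, dropped

# Hypothetical input: accented letter, zero-width space, tab and newline.
sample = "caf\u00e9\u200b\tbar\n"
kept, dropped = split_printable(sample)
print(repr(kept))   # the accented letter survives, unlike with string.printable
print(dropped)      # the characters that were filtered out
```

Inspecting the dropped list first is a cheap way to find out which characters are actually causing trouble before deciding what to strip.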

Very easy to implement, even some years later. If you want to collect the invisible characters in the string instead, just use `[c for c in _x if not c.isprintable()]`. In my case this returns only the invisible ones, so you can then handle them in code however you want. – Bitart Dec 09 '21 at 17:56
Collect the set of characters that you want to allow and remove the rest, like this:
import re
text = re.sub("[^a-z0-9]+","", text, flags=re.IGNORECASE)
This removes every character other than a to z, A to Z and 0 to 9.
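The same idea can be wrapped in a reusable helper. This is only a sketch: the allowed set below (letters, digits, space, underscore and hyphen) is an arbitrary example, and keep_allowed is a made-up name:

```python
import re

# Characters to keep, written as the inside of a regex character class.
# The trailing "-" is literal because it is the last item in the class.
ALLOWED = "a-z0-9 _-"
_DISALLOWED = re.compile("[^{}]+".format(ALLOWED), flags=re.IGNORECASE)

def keep_allowed(text):
    """Remove every character that is not in the ALLOWED class."""
    return _DISALLOWED.sub("", text)

print(keep_allowed("Hello, World_1-2!\x00"))
```

Compiling the pattern once avoids repeating the pattern lookup when many strings are cleaned in a loop.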

@robos85, you need some rule for what to strip and what to keep. Can I assume you want to strip all characters that are invalid in UTF-8? There is a solution for that, but the result might still include invisible/non-printable characters. – YOU Aug 22 '11 at 12:45
Regular expressions are a good and very general tool for all kinds of string analysis. If speed is an issue, the string "translate" method can help you too.
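First you define an identity mapping, which will not change anything (the code in this answer is Python 2, where map returns a list and str.translate takes a 256-character table):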
mapping = map(chr, range(256))
If you want to replace each "a" with a "b", you modify your mapping:
mapping[ord('a')] = 'b'
Now you build the table for the "translate" method:
table = "".join(mapping)
and
print "abc".translate(table)
prints "bbc".
If you really want to delete the "a", you do not modify the mapping above, build the table and then call translate as follows:
print "abc".translate(table, "a")
gives you "bc".
Once the table is built, the translate method is very fast.
So in your case you can build the table such that all your unwanted characters are mapped to a space
table = "".join(" " if c in unwanted_chars else c for c in map(chr, range(256)))
and use len("my string".translate(table).strip())
which ignores the unwanted characters at the beginning and the end of the string.
Or you use len("my string".translate(table, unwanted_chars))
which will ignore all your unwanted characters.
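The two-argument form of translate shown above exists only for Python 2 str objects. A rough Python 3 equivalent, assuming the built-in str.maketrans is acceptable, might look like this (unwanted_chars here is a hypothetical example set):

```python
# Hypothetical set of characters to remove: NUL, zero-width space, BOM.
unwanted_chars = "\x00\u200b\ufeff"

# str.maketrans(x, y, z): map x[i] to y[i] and delete every character in z.
replace_table = str.maketrans("a", "b")
delete_table = str.maketrans("", "", unwanted_chars)

print("abc".translate(replace_table))

# Delete the unwanted characters wherever they occur, then measure the rest.
cleaned = "\ufeffmy\x00 string\u200b".translate(delete_table)
print(repr(cleaned), len(cleaned))
```

As with the Python 2 version, the table is built once and can be reused for every string that needs cleaning.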

Nice. +1 tomorrow when I have votes again. I thought about translate but was too lazy to look up the syntax. – agf Aug 22 '11 at 15:02