Python - how to delete hidden signs from string?

Question

Sometimes I have strings with strange characters. They are not visible in the browser, but they are part of the string and are counted by len(). How can I get rid of them? strip() removes ordinary spaces but not these characters.

See this solution: http://stackoverflow.com/questions/92438/stripping-non-printable-characters-from-a-string-in-python — JJ., Aug 22 '11 at 12:28

score 17 · Answer 1 · answered Aug 22 '11 at 12:27

Use the character categories from the string module. If you want to allow all printable characters, you can do

from string import printable
new_string = ''.join(char for char in the_string if char in printable)

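Building on YOU's answer, you can do this with re.sub too. Pass printable through re.escape so that characters such as \ and ] stay literal inside the character class:

import re
new_string = re.sub("[^{}]+".format(re.escape(printable)), "", the_string)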

Also, if you want to see all the characters in a string, even the unprintable ones, you can always do

print(repr(the_string))

which will show things like \x00 for unprintable characters.
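
To go one step further than repr(), here is a minimal Python 3 sketch that lists each hidden character together with its code point and Unicode name. The helper name describe_chars and the sample string are made up for illustration, and "hidden" is taken here to mean "str.isprintable() returns False".

```python
import unicodedata

def describe_chars(text):
    """Return (repr, code point, Unicode name) for every non-printable character."""
    report = []
    for char in text:
        if not char.isprintable():
            # Control characters have no Unicode name, so fall back to a placeholder.
            name = unicodedata.name(char, "<unnamed>")
            report.append((repr(char), "U+{:04X}".format(ord(char)), name))
    return report

# Hypothetical input: a zero-width space in the middle and a NUL at the end.
sample = "abc\u200bdef\x00"

print(repr(sample))           # escapes the hidden characters
for entry in describe_chars(sample):
    print(entry)
```

Knowing exactly which characters are present usually makes it easier to decide whether to delete them or replace them with a space.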

score 15 · Answer 2 · answered May 03 '18 at 10:01

You can filter your string using str.isprintable() (introduced in Python 3 by PEP 3138):

output_str = ''.join(c for c in input_str if c.isprintable())
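
One thing to watch for: str.isprintable() also returns False for newlines, tabs and the no-break space, so the one-liner removes those too. The sketch below adds an optional whitelist. The function name strip_nonprintable, its keep parameter and the sample text are assumptions for illustration only.

```python
def strip_nonprintable(text, keep="\n\t"):
    """Drop characters that are not printable, except those listed in keep."""
    return "".join(c for c in text if c.isprintable() or c in keep)

# Hypothetical input: accented letter, zero-width space, no-break space,
# a newline and a trailing tab.
sample = "caf\u00e9\u200b \u00a0line1\nline2\t"

print(repr(strip_nonprintable(sample)))           # newline and tab survive
print(repr(strip_nonprintable(sample, keep="")))  # same as the plain one-liner
```

With keep="" the two lines are joined, because the newline between them is removed along with the invisible characters.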

Mikhail Shcheglov

Very easy to implement, even some years later. If you want to collect the invisible characters in the string instead, just use `[c for c in _x if not c.isprintable()]`. In my case that gives me only the invisible ones, so I can handle them in code however I want. – Bitart Dec 09 '21 at 17:56

score 6 · Answer 3 · answered Aug 22 '11 at 12:26

Collect the set of characters that you want to allow and remove the rest, like this:

import re
text = re.sub("[^a-z0-9]+","", text, flags=re.IGNORECASE)

This removes every character other than a to z, A to Z and 0 to 9.
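
If the whitelist should also cover non-ASCII letters, one possible variation in Python 3 is to use \w and \s, which match Unicode word characters and whitespace for str patterns. This is only a sketch: the function name keep_word_chars and the sample string are invented here. Like the pattern above, it also strips punctuation.

```python
import re

def keep_word_chars(text):
    """Remove everything except Unicode word characters and whitespace."""
    return re.sub(r"[^\w\s]+", "", text)

# Hypothetical input with accented letters, a zero-width space,
# punctuation and a NUL character.
sample = "h\u00e9llo\u200b w\u00f6rld!\x00"

print(repr(keep_word_chars(sample)))
```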

YOU

I need the full UTF-8 character set :/ – robos85 Aug 22 '11 at 12:35
@robos85, you need some rule for what to strip and what to keep. So can I assume you need to strip all characters that are invalid in UTF-8? There is a solution for that, but the result might still include invisible/non-printable characters. – YOU Aug 22 '11 at 12:45
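
One way to keep the full Unicode range while still dropping hidden characters is to filter on Unicode general categories. The Python 3 sketch below assumes that "hidden" means any category starting with "C" (control, format, surrogate, private use, unassigned). The function name remove_invisible, the keep parameter and the sample text are made up for illustration.

```python
import unicodedata

def remove_invisible(text, keep="\n\t"):
    """Drop characters in the Unicode 'Other' categories (Cc, Cf, Cs, Co, Cn)."""
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )

# Hypothetical input: non-ASCII letters mixed with a zero-width space,
# a byte order mark and a BEL control character.
sample = "Za\u017c\u00f3\u0142\u0107\u200b g\u0119\u015bl\u0105\ufeff ja\u017a\u0144\x07"

print(remove_invisible(sample))
```

Note that the Cf category also contains characters that some scripts and emoji sequences rely on, such as the zero-width joiner, so this rule may be too aggressive for some text.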

rocksportrocker · Answer 4 · 2011-08-22T13:10:57.513

Regular expressions are a good and very universal tool for all kinds of string analysis. If speed is an issue, the "translate" method from the string class can help you too.

First, define an identity mapping, which changes nothing:

mapping = list(map(chr, range(256)))

If you want to replace each "a" with a "b", modify the mapping:

mapping[ord('a')] = 'b'

Now you build the table for the "translate" method:

table = "".join(mapping)

and

print("abc".translate(table))

prints "bbc".

If you really want to delete the "a", leave the mapping unmodified, build the table, and pass the characters to delete as the second argument (a Python 2 feature; Python 3's str.translate takes only the table):

print("abc".translate(table, "a"))

gives you "bc".

Once the table is built, the translate method is very fast.

So in your case you can build the table so that all your unwanted characters are mapped to a space:

# unwanted_chars is a string holding the characters you want to get rid of
table = "".join(" " if c in unwanted_chars else c for c in map(chr, range(256)))

and use len("my string".translate(table).strip()), which ignores the unwanted characters at the beginning and the end of the string.

Or use len("my string".translate(table, unwanted_chars)), which ignores all your unwanted characters.
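
The calls above use the Python 2 form of translate. A rough Python 3 equivalent can be built with str.maketrans, which also handles characters outside the 0-255 range. In this sketch the contents of unwanted_chars and the sample text are assumptions chosen for illustration.

```python
# Hypothetical set of characters to get rid of:
# zero-width space, byte order mark, NUL.
unwanted_chars = "\u200b\ufeff\x00"

# Table that deletes the unwanted characters (third argument of maketrans).
delete_table = str.maketrans("", "", unwanted_chars)

# Table that maps each unwanted character to a space instead.
blank_table = str.maketrans({c: " " for c in unwanted_chars})

text = "\ufeffmy\u200b string\x00"

deleted = text.translate(delete_table)
blanked = text.translate(blank_table).strip()

print(repr(deleted), len(deleted))
print(repr(blanked), len(blanked))
```

As in the original, the blanking variant only ignores unwanted characters at the start and end once strip() is applied. Those in the middle remain as spaces and still count toward len().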

Nice. +1 tomorrow when I have votes again. I thought about translate but was too lazy to look up the syntax. — agf, Aug 22 '11 at 15:02
