Remove ASCII control characters from text file Python

Question

I have a text file from which I have to read a lot of numbers (doubles). It contains ASCII control characters such as DLE and NUL, which are visible in the text file. So when I read a line to extract only the doubles/ints, I get errors like "invalid literals \x10". The first two lines of my file are shown below.

```
DLE NUL NUL NUL [1, 167, 133, 6]DLE NUL NUL
YS FS NUL[0.0, 4.3025989e-07, 1.5446712e-06, 3.1393029e-06, 5.0430463e-06, 7.1382601e-06
```

How can I remove all of these control characters from the text file at once using Python? I want to do this before I parse the file into numbers.

Any help is appreciated!

Perhaps you should consider parsing them instead so that you know how to parse the rest of the file. — Ignacio Vazquez-Abrams, Jul 05 '13 at 03:34
However, I still really need to remove these characters before I do any sort of reading with them. — atmaere, Jul 05 '13 at 03:40

score 3 · Accepted Answer · answered Jul 05 '13 at 03:39

Use string.printable.

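```python
>>> import string
>>> ''.join(filter(string.printable.__contains__, '\x00\x01XYZ\x00\x10'))
'XYZ'
```

In Python 3, `filter` returns an iterator rather than a string, hence the `''.join(...)`.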
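To apply the same idea to a whole file, the filter can be run line by line. This is a minimal Python 3 sketch, not part of the original answer: `strip_unprintable` and `read_clean_lines` are hypothetical names, and opening the file as `latin-1` is an assumption so that any stray byte decodes without error. Keep in mind that `string.printable` also contains tab, newline, carriage return, vertical tab and form feed, so those are kept, while every non-ASCII character is dropped.

```python
import string

# A frozenset gives O(1) membership tests; `in` on the string.printable str scans it.
PRINTABLE = frozenset(string.printable)


def strip_unprintable(text):
    """Keep only characters that appear in string.printable.

    Whitespace controls (tab, newline, CR, VT, FF) survive; other control
    characters (NUL, DLE, FS, ...) and all non-ASCII characters are removed.
    """
    return ''.join(ch for ch in text if ch in PRINTABLE)


def read_clean_lines(path, encoding='latin-1'):
    """Yield each line of `path` with unprintable characters removed.

    Hypothetical helper; 'latin-1' is an assumed encoding that maps every
    byte to a character, so binary junk cannot raise UnicodeDecodeError.
    """
    with open(path, encoding=encoding) as f:
        for line in f:
            yield strip_unprintable(line).rstrip('\n')
```

Usage would be something like `for line in read_clean_lines('data.txt'): ...`, where `data.txt` is a placeholder for the real file.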

falsetru

Using regex (see [this answer](http://stackoverflow.com/a/93029/1988505)) is an order of magnitude faster. – Wesley Baugh Nov 07 '14 at 20:31
@WesleyBaugh, If speed matters, you can use [`str.translate`](https://docs.python.org/2/library/stdtypes.html#str.translate). – falsetru Nov 08 '14 at 00:21
@alvas, How about using `unicode(string.printable)` if you want to use exactly same characters? – falsetru Mar 18 '15 at 12:21
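Following up on the `str.translate` suggestion, here is a Python 3 sketch (an illustration, not code from the thread). In Python 3, `str.translate` takes a mapping from code points to replacements, and mapping a code point to `None` deletes it; `bytes.translate(None, delete)` does the same on raw bytes, which avoids decoding altogether. Keeping tab, newline and carriage return while also deleting DEL (0x7F) is an assumption about what this file needs, and the function and constant names are hypothetical.

```python
# Code points to delete: the C0 controls 0x00-0x1F plus DEL (0x7F),
# except tab, line feed and carriage return (assumed: keep line structure).
KEEP = {ord('\t'), ord('\n'), ord('\r')}
CONTROL_CODES = [cp for cp in (*range(0x20), 0x7F) if cp not in KEEP]

# str.translate: a code point mapped to None is deleted.
STR_TABLE = dict.fromkeys(CONTROL_CODES)

# bytes.translate(None, delete): a None table means "map nothing", only delete.
DELETE_BYTES = bytes(CONTROL_CODES)


def remove_controls(text):
    """Delete control characters from a str."""
    return text.translate(STR_TABLE)


def remove_controls_bytes(data):
    """Delete control bytes from a bytes object, with no decoding step."""
    return data.translate(None, DELETE_BYTES)
```

For a file, the bytes version can be applied to `open(path, 'rb').read()` before decoding, so the control bytes never reach the text layer.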

score 2 · Answer 2 · answered Apr 20 '17 at 13:54

I know this is a very old post, but I am answering because I think it could help others.

I did the following. It replaces every character in the range `\x00` to `\x1F` with an empty string, which covers all ASCII control characters except DEL (`\x7F`).

```python
import re
line = re.sub(r'[\x00-\x1F]+', '', line)  # delete runs of control characters
```

Ref: ASCII (American Standard Code for Information Interchange) Code

Ref: Python re.sub()
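Because `[\x00-\x1F]` also matches tab (`\x09`), newline (`\x0A`) and carriage return (`\x0D`), running it over the whole file at once would join every line together. The Python 3 sketch below is an assumption-laden variant, not the answer's own code: it uses a narrower character class that leaves those three alone and also removes DEL, then pulls the numbers out of each cleaned line with `re.findall`. The helper names, the number pattern (it does not match forms like `.5`) and the `latin-1` encoding are all hypothetical choices.

```python
import re

# Control characters except \t (0x09), \n (0x0A) and \r (0x0D), plus DEL (0x7F).
CONTROL_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]+')

# Ints and floats such as 6, 0.0 and 4.3025989e-07.
NUMBER_RE = re.compile(r'[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?')


def strip_controls(text):
    """Delete runs of control characters, keeping tabs and line breaks."""
    return CONTROL_RE.sub('', text)


def parse_numbers(line):
    """Return every number found in a cleaned line, as floats."""
    return [float(tok) for tok in NUMBER_RE.findall(line)]


def read_numbers(path, encoding='latin-1'):
    """Hypothetical helper: one list of floats per line of `path`.

    'latin-1' is an assumed encoding that decodes every byte, so stray
    binary bytes cannot raise UnicodeDecodeError.
    """
    with open(path, encoding=encoding) as f:
        return [parse_numbers(strip_controls(line)) for line in f]
```

Strictly speaking, `NUMBER_RE.findall` would skip the control characters even without cleaning, since it only matches digits, signs, dots and exponents; stripping first simply keeps the cleaned text available for other processing.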
