Strip or remove all special characters from a list of strings in Python

Question

I have a list of strings and I need to remove all special characters (, - ' " .).

My code is

import glob
import re

files = []
for text in glob.glob("*.txt.txt"):
    with open(text) as f:
        fileRead = [line.lower() for line in f]
    files.append(fileRead)

files1 = []

for item in files:
    files1.append(''.join(item))

I have tried many options, including "replace", "strip" and "re".

When I use strip (shown below), the code runs but the output is unchanged.

files1 = [line.strip("'") for line in files1]

When I use re, I get TypeError: expected string or bytes-like object. I converted the list of lists into a list of strings so that I could use re. This approach is suggested in many places, but it did not solve the problem for me.

files1 = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", files1)

I cannot use replace either, because it raises an AttributeError: lists have no replace method.

Please suggest how I can get rid of all special characters.

`files1` is a list, not string. You need to pass a string to `re.sub`. So try element-wise. — Kota Mori, Sep 11 '18 at 16:16
@KotaMori I have tried that too - is there anything in this? files1 = [re.sub('[-()\"#/@;:<>{}`+=~|.!?,]', '', files1) for y in files1] — AST, Sep 11 '18 at 16:20
Pass `y`, not `files1`? If you still get an error, provide the result of `type(files1[0])` — Kota Mori, Sep 11 '18 at 16:22
Possible duplicate of [Remove all special characters, punctuation and spaces from string](https://stackoverflow.com/questions/5843518/remove-all-special-characters-punctuation-and-spaces-from-string) — Raman Mishra, Sep 11 '18 at 16:40

score 4 · Answer 1 · answered Sep 11 '18 at 16:28

You should apply re.sub to each string individually, not to the list as a whole. Note that the question's list of strings is files1 (files is a list of lists), so iterate over files1:

files_cleaned = [re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", file) for file in files1]

If you only want to keep alphanumeric characters, you can do this instead:

files_cleaned = [re.sub(r"[^a-zA-Z0-9]", "", file) for file in files1]
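As a minimal, self-contained sketch of the element-wise approach: the sample strings below are made up and only stand in for the question's `files1`, and the name `SPECIAL` is hypothetical. The character class also includes the apostrophe, which the question asks to remove but the pattern above leaves out.

```python
import re

# Hypothetical sample data standing in for the question's files1 (a list of strings).
files1 = ["hello, world.\n", "it's a 'test' - ok?"]

# Compile the pattern once and reuse it for every string.
# The apostrophe is added to the character class here (an assumption about the
# desired result, based on the characters listed in the question).
SPECIAL = re.compile(r"[-()\"'#/@;:<>{}`+=~|.!?,]")

# re.sub works on one string at a time, so apply it inside a comprehension.
files_cleaned = [SPECIAL.sub("", text) for text in files1]

print(files_cleaned)
```

Whitespace and newlines are left untouched by this pattern, so removing a dash between two spaces leaves both spaces behind.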

Chandella07 · Answer 2 · 2018-09-11T16:45:31.720

Try the example below:

files = ["Hello%", "&*hhf", "ddh", "GTD@JJ"]  # input data as a list

# Go through each element of the list,
# keep only the alphanumeric characters of each string,
# then join the characters back together and collect the results in a list.
result = ["".join(filter(str.isalnum, line)) for line in files]

print(result)  # print the result

Output:

['Hello', 'hhf', 'ddh', 'GTDJJ']
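Note that filtering with str.isalnum also drops spaces and newlines. If whitespace should survive, one alternative (a different technique from the filter above, shown only as a sketch) is `str.translate` with a deletion table built from `string.punctuation`. The extra entry `"two words!"` and the name `delete_punct` are made up for illustration.

```python
import string

# Same sample data as above, plus one made-up entry containing a space.
files = ["Hello%", "&*hhf", "ddh", "GTD@JJ", "two words!"]

# str.maketrans with a third argument maps each of those characters to None,
# so translate() deletes every ASCII punctuation character.
delete_punct = str.maketrans("", "", string.punctuation)

result = [line.translate(delete_punct) for line in files]

print(result)  # ['Hello', 'hhf', 'ddh', 'GTDJJ', 'two words']
```

This only covers the ASCII punctuation listed in `string.punctuation`; other symbols would need to be added to the table.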

score 0 · Answer 3 · answered Sep 11 '18 at 16:42

You can use str.isalnum, which returns True if the string is non-empty and all of its characters are alphanumeric.
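A short sketch of how this might be applied, with a hypothetical helper `keep_alnum` and made-up sample strings: called on a whole string, isalnum only reports whether cleaning is needed, so the actual removal has to test each character.

```python
# On a whole string, isalnum() is a yes/no check.
print("GTDJJ".isalnum())   # True
print("GTD@JJ".isalnum())  # False
print("".isalnum())        # False, because the string is empty


def keep_alnum(text):
    """Return text with every non-alphanumeric character removed (hypothetical helper)."""
    return "".join(ch for ch in text if ch.isalnum())


# Apply the helper to each string in a list.
cleaned = [keep_alnum(s) for s in ["Hello%", "GTD@JJ"]]
print(cleaned)  # ['Hello', 'GTDJJ']
```

Because the test is per character, spaces are removed as well; isalnum also accepts non-ASCII letters and digits.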

Raman Mishra