I have a list of strings and I need to remove all special characters (, - ' " .).

My code is:

import glob
import re

files = []
for text in glob.glob("*.txt.txt"):
    with open(text) as f:
        fileRead = [line.lower() for line in f]
    files.append(fileRead)

files1 = []

for item in files:
    files1.append(''.join(item))

I have tried several options, including "replace", "strip" and "re".

When I use strip (shown below), the code runs but the output is unchanged.

files1 = [line.strip("'") for line in files1]

When I use re, I get TypeError: expected string or bytes-like object. I converted the list of lists into a list of strings so that I could use re. This approach is suggested in many places, but it did not solve the problem for me.

files1 = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", files1)

I cannot use replace either, because it raises an AttributeError: lists have no replace method.

Please suggest how I can get rid of all special characters.

AST

3 Answers

You should apply re.sub to individual strings, not to a list.

files_cleaned = [re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", file) for file in files1]

If you only want to keep alphanumeric characters, you can do this instead:

files_cleaned = [re.sub(r"[^a-zA-Z0-9]", "", file) for file in files1]
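As a minimal sketch of the per-string approach, the snippet below compiles the pattern once and applies it to each string in turn. The sample strings are hypothetical stand-ins for the contents of the question's files1, and the apostrophe has been added to the character class on the assumption that it should be removed too, since the question lists it.

```python
import re

# Hypothetical sample data standing in for the question's files1 (a list of strings)
files1 = ["hello, world!\nit's a test-case.", "no (special) chars?"]

# Character class from the answer, with the apostrophe added (assumption)
pattern = re.compile(r"[-()\"'#/@;:<>{}`+=~|.!?,]")

# re.sub works on one string at a time, so apply it to each element
files_cleaned = [pattern.sub("", text) for text in files1]

print(files_cleaned)
```

Unlike the `[^a-zA-Z0-9]` variant, this keeps spaces and newlines, because only the listed characters are removed.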
Peter

Try the example below:

files = ["Hello%","&*hhf","ddh","GTD@JJ"]    # input data in a list

# go through each element of the list,
# keep only the alphabetic or numeric characters of each string (dropping special symbols),
# then join the characters back together and collect the results in a list
result = ["".join(filter(str.isalnum, line)) for line in files]

print(result)    # print the result

Output:

['Hello', 'hhf', 'ddh', 'GTDJJ']
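Note that filtering on str.isalnum alone also drops spaces, tabs and newlines, which may matter when each string holds the text of a whole file. A possible variant, sketched here with hypothetical sample data, keeps whitespace as well:

```python
# Hypothetical sample data containing whitespace
files = ["Hello% world", "&*hhf\tddh", "GTD@JJ\n"]

# Keep a character if it is alphanumeric or whitespace; drop everything else
result = [
    "".join(ch for ch in line if ch.isalnum() or ch.isspace())
    for line in files
]

print(result)
```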
Chandella07

You can use str.isalnum().

It returns True if the string is non-empty and all of its characters are alphanumeric.
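Because str.isalnum() tests a whole string, it has to be applied character by character to remove special characters. The sketch below illustrates both points; the helper name strip_non_alnum is hypothetical, introduced only for this example.

```python
# isalnum() checks the whole string at once
whole_ok = "abc123".isalnum()       # every character is a letter or digit
with_space = "abc 123".isalnum()    # the space is not alphanumeric
empty = "".isalnum()                # an empty string is never alphanumeric


def strip_non_alnum(text):
    """Hypothetical helper: keep only the alphanumeric characters of text."""
    return "".join(ch for ch in text if ch.isalnum())


cleaned = strip_non_alnum("GTD@JJ-2'")
print(whole_ok, with_space, empty, cleaned)
```

str.isalnum() follows Unicode rules, so accented letters such as "é" also count as alphanumeric; use a regular expression like `[^a-zA-Z0-9]` if only ASCII letters and digits should survive.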

Raman Mishra