
I want to remove certain punctuation characters from a text. I was able to remove the characters I wanted, but a space keeps being left where each character used to be.

In { ) other news tonight,
a Constitutional { | / !! amendment

I have text like the above, and when I process it, it becomes

In   other news tonight,
a Constitutional    !! amendment

Instead of

In other news tonight,
a Constitutional !! amendment

Below is the code I have:

for line in lines:
    exclude = set('"#$%&\\()*+-/:<=>@[\\]^_`{|}')
    line = ''.join(ch for ch in line if ch not in exclude)

How do I get rid of the extra spaces that are left behind?

John Sean

2 Answers


No spaces are being created. Your string already contains spaces between those characters, and removing the characters does not remove the spaces between them. Assuming you want to collapse any run of more than one consecutive space into a single space, replace your code with:

exclude = set('"#$%&\\()*+-/:<=>@[\\]^_`{|}')
cleaned = []
for line in lines:
    line = ''.join(ch for ch in line if ch not in exclude)
    line = ' '.join(line.split())  # collapse runs of whitespace
    cleaned.append(line)

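This collapses every run of consecutive whitespace into a single space (and, because str.split() is called without arguments, also strips leading and trailing whitespace).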
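A minimal, self-contained sketch of this answer, applied to the two example lines from the question. The function name clean_line and the EXCLUDE constant are hypothetical names chosen for illustration. The first print shows that deleting the characters alone keeps the spaces that surrounded them; the split/join step then collapses those gaps.

```python
# Characters to strip; "\\" is a literal backslash, so no invalid-escape warning.
EXCLUDE = set('"#$%&\\()*+-/:<=>@[\\]^_`{|}')

def clean_line(line):
    """Drop excluded characters, then collapse whitespace runs to one space."""
    removed = ''.join(ch for ch in line if ch not in EXCLUDE)
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so joining with ' ' collapses gaps.
    return ' '.join(removed.split())

lines = ["In { ) other news tonight,", "a Constitutional { | / !! amendment"]

# Removing the characters alone keeps the spaces that were around them.
print(repr(''.join(ch for ch in lines[0] if ch not in EXCLUDE)))
# prints 'In   other news tonight,'

cleaned = [clean_line(line) for line in lines]
print(cleaned)
# prints ['In other news tonight,', 'a Constitutional !! amendment']
```

Collecting the results in a new list matters: reassigning the loop variable line on its own does not change the original lines list.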

Theo
  • One problem with that is that "18-year-old" becomes "18yearold", but I want it to be "18 year old". – John Sean Apr 02 '20 at 19:51
  • @JohnSean Then remove the dash from your excluded characters and make a second set of characters that should be replaced by spaces. Then, between the two joins, add line = ''.join(' ' if ch in spacereplace else ch for ch in line) – Theo Apr 02 '20 at 19:52
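The two-set idea suggested in the comments can be sketched as follows. EXCLUDE and SPACE_REPLACE are hypothetical names, and putting only the dash in the "replace with a space" set is an assumption based on the "18-year-old" example; other separators could be moved there the same way.

```python
# Hypothetical names: EXCLUDE holds characters to delete, SPACE_REPLACE holds
# characters to turn into spaces. Which character goes where is an assumption.
EXCLUDE = set('"#$%&\\()*+/:<=>@[\\]^_`{|}')  # original set minus the dash
SPACE_REPLACE = set('-')

def clean_line(line):
    # Step 1: delete unwanted characters (the first join).
    line = ''.join(ch for ch in line if ch not in EXCLUDE)
    # Step 2: replace separator characters with a space.
    line = ''.join(' ' if ch in SPACE_REPLACE else ch for ch in line)
    # Step 3: collapse whitespace runs into single spaces (the second join).
    return ' '.join(line.split())

print(clean_line("an 18-year-old {voter}"))
# prints: an 18 year old voter
```

Because the two sets do not overlap, steps 1 and 2 could run in either order with the same result.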

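You can replace each excluded character with a space, then split the string with the str.split method (with no arguments it treats any run of whitespace as a single separator) and join the resulting list back into a string with single spaces. Because excluded characters become spaces rather than disappearing, "18-year-old" turns into "18 year old":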

exclude = set('"#$%&\\()*+-/:<=>@[\\]^_`{|}')
cleaned = []
for line in lines:
    # Turn each excluded character into a space, then collapse whitespace runs.
    cleaned.append(' '.join(''.join(' ' if ch in exclude else ch for ch in line).split()))
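The same replace-then-collapse idea can also be written with str.maketrans/str.translate (to map every excluded character to a space in one pass) and re.sub (to collapse whitespace). This is an alternative technique, not what this answer uses; the names TO_SPACE and clean_line are hypothetical.

```python
import re

# Map each excluded character to a single space.
EXCLUDE = '"#$%&\\()*+-/:<=>@[\\]^_`{|}'
TO_SPACE = str.maketrans(EXCLUDE, ' ' * len(EXCLUDE))

def clean_line(line):
    replaced = line.translate(TO_SPACE)
    # \s+ matches any run of whitespace; strip() trims the ends, which
    # roughly mirrors ' '.join(s.split()) for ordinary text.
    return re.sub(r'\s+', ' ', replaced).strip()

lines = ["In { ) other news tonight,",
         "a Constitutional { | / !! amendment",
         "an 18-year-old voter"]
print([clean_line(line) for line in lines])
# prints ['In other news tonight,', 'a Constitutional !! amendment', 'an 18 year old voter']
```

The translation table is built once, so it is not rebuilt for every line.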
blhsing