
I am trying to remove all punctuation marks from a string except (.) and (:). This is what I have implemented:

```python
import string
import re
remove = string.punctuation
remove = remove.replace(".", "")
pattern = r"[{}]".format(remove)
line = "NETWORK  [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
re.sub(pattern, "", line)
```

Current output: NETWORK listener connection accepted from 12700159926 4785 3 connections now open

Desired output: NETWORK listener connection accepted from 127.0.0.1:59926 4785 3 connections now open

What am I doing wrong? Thanks for the help!

Saranya Gupta
  • You may use `pattern = r"[{}]".format(re.escape(remove))` or `pattern = r"[{}]".format(remove.replace("\\",r"\\").replace("^",r"\^").replace("]",r"\]").replace("-",r"\-"))`. – Wiktor Stribiżew Jul 28 '20 at 08:13
  • **Duplicate of [Escaping regex string](https://stackoverflow.com/questions/280435/escaping-regex-string)** and **[Python Password Validation: Unable to use constants from string library in regex](https://stackoverflow.com/questions/59140483)**, etc. – Wiktor Stribiżew Jul 28 '20 at 08:24

3 Answers


Apart from the fact that you don't take `:` out of the set of characters to remove, the pattern you end up with is:

```
[!"#$%&'()*+,-/:;<=>?@[\]^_`{|}~]
            ^^^
```

Note the `,-/` part. Inside a character class, that means every character from `,` to `/` inclusive, which includes `-` and `.`, so the period you wanted to keep gets stripped too.
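A small illustrative check makes the range visible. It is not part of the original answer; the sample string and variable names are made up for the demonstration. It compares `,-/` as written with the same three characters when the hyphen is moved to the end of the class:

```python
import re

sample = "a,b-c.d/e"

# Inside [...], ",-/" is a range from "," (0x2C) to "/" (0x2F),
# so it also matches "-" (0x2D) and "." (0x2E).
as_range = re.findall(r"[,-/]", sample)

# With the hyphen moved to the end of the class it is a literal "-",
# so "." is no longer matched.
as_literal = re.findall(r"[,/-]", sample)

print(as_range)    # [',', '-', '.', '/']
print(as_literal)  # [',', '-', '/']
```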

You would possibly be better off constructing it manually, so you avoid any tricky escaping beyond what you actually need; something like this (untested, so I'm not sure whether more escaping is required):

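```python
# Raw string; "-", "[", "\" and "]" are escaped, "." and ":" are deliberately left out
pattern = r"[!\"#$%&'()*+,\-/;<=>?@\[\\\]^_`{|}~]"
```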

Alternatively, I'd rather specify which characters are allowed to survive than which ones to remove (the regex is a lot simpler):

```python
# Keep letters, digits, spaces, ":" and "."; "." needs no escape inside [...]
re.sub(r"[^a-zA-Z0-9 :.]", "", line)
```

This keeps only ASCII letters and digits, spaces, the colon and the period; everything else is stripped.
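A self-contained sketch of the allowlist approach, applied to the log line from the question. It assumes ASCII input; note that it also strips tabs, underscores and any non-ASCII letters, which may or may not be what you want:

```python
import re

line = "NETWORK  [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"

# Keep ASCII letters, digits, spaces, ":" and "."; delete everything else.
cleaned = re.sub(r"[^a-zA-Z0-9 :.]", "", line)
print(cleaned)
# NETWORK  listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
```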

paxdiablo

This should work for you:

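```python
import string
import re

remove = string.punctuation
# Drop ".", ":" and "-" from the set of characters to delete
remove = re.sub(r"[.:-]+", "", remove)
# Re-add "-" at the end of the class, where it is a literal hyphen
pattern = r"[{}]".format(remove + '-')
line = "NETWORK  [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
print(re.sub(pattern, "", line))
```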

Output:

NETWORK  listener connection accepted from 127.0.0.1:59926 4785 3 connections now open

Details:

  • `remove = re.sub(r"[.:-]+", "", remove)` deletes `.`, `:` and `-` from the characters to remove: `.` and `:` because they should survive, and `-` because an unescaped hyphen in the middle of a character class acts as a range operator rather than a literal `-`.
  • `r"[{}]".format(remove + '-')` appends `-` back at the very end of the character class, where an unescaped hyphen is treated as a literal.
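To double-check a pattern built this way, you can test each character of `string.punctuation` against it (a quick verification sketch, not part of the answer; `survivors` is just an illustrative name). Besides `.` and `:`, the backslash also survives: in the unescaped class, the `\` from `string.punctuation` is consumed as the escape for the following `]`, so it never ends up in the set.

```python
import re
import string

# Rebuild the pattern exactly as in the answer above
remove = re.sub(r"[.:-]+", "", string.punctuation)
pattern = r"[{}]".format(remove + "-")

# Characters from string.punctuation that the pattern does NOT match
survivors = [ch for ch in string.punctuation if re.fullmatch(pattern, ch) is None]
print(survivors)  # ['.', ':', '\\']
```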
anubhava

You don't escape the regex special characters in `string.punctuation`. Also, you forgot to remove `:`!

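Use `re.escape` to escape the regex special characters in the punctuation string. On Python 3.7+ your final pattern will be ``[!"\#\$%\&'\(\)\*\+,\-/;<=>\?@\[\\\]\^_`\{\|\}\~]`` (older versions escape every non-alphanumeric character except `_`, which matches the same set of characters).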

```python
import string
import re

# Take "." and ":" out of the characters to strip
remove = string.punctuation
remove = remove.replace(".", "")
remove = remove.replace(":", "")

# re.escape makes every remaining character literal inside [...]
pattern = r"[{}]".format(re.escape(remove))

line = "NETWORK  [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
line = re.sub(pattern, "", line)
print(line)
```

Output:

NETWORK  listener connection accepted from 127.0.0.1:59926 4785 3 connections now open
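One way to confirm that the `re.escape` version strips exactly what it should is to test every character of `string.punctuation` against the pattern (a verification sketch; the variable names are illustrative):

```python
import re
import string

remove = string.punctuation.replace(".", "").replace(":", "")
# re.escape backslash-escapes regex metacharacters such as "-", "]", "\" and "^",
# so every character inside [...] is matched literally
pattern = r"[{}]".format(re.escape(remove))

# Characters from string.punctuation that the pattern does NOT match
survivors = [ch for ch in string.punctuation if re.fullmatch(pattern, ch) is None]
print(survivors)  # ['.', ':']

line = "NETWORK  [listener] connection accepted from 127.0.0.1:59926 #4785 (3 connections now open)"
print(re.sub(pattern, "", line))
```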
mjrezaee