
For example, I have a list of strings:

sentence = ['cracked $300 million','she\'s resolutely, smitten ', 'that\'s creative [r]', 'the market ( knowledge check : prices up!']

I want to remove the punctuation and replace numbers with the '£' symbol. I have tried the following, but when I run both substitutions, only one of them takes effect. My code is below:

import re
s =([re.sub(r'[!":$()[]\',]',' ', word) for word in sentence]) 

s = [re.sub(r'\d+', '£', word) for word in s]
print(s)

I think the problem might be in the square brackets? Thank you!

Bluetail
  • [These answers](https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string) might also be relevant. – Anderson Green Mar 31 '22 at 14:28
  • Yes, I have fixed the regex, but the problem is how to combine these two list comprehensions. – Bluetail Mar 31 '22 at 15:17

3 Answers


Sorry, I didn't see the second part of your request, but you can do this for the numbers and the punctuation:

import string

sentence = ['cracked $300 million', 'she\'s resolutely, smitten ', 'that\'s creative [r]',
            'the market ( knowledge check : prices up!']

def replaceDigitAndPunctuation(newSentence):
    # Every single digit becomes "£" (so "300" turns into "£££");
    # characters in string.punctuation (including $) are dropped.
    new_word = ""
    for char in newSentence:
        if char in string.digits:
            new_word += "£"
        elif char in string.punctuation:
            pass  # skip punctuation
        else:
            new_word += char
    return new_word


for i in range(len(sentence)):
    sentence[i] = replaceDigitAndPunctuation(sentence[i])
print(sentence)
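The loop above turns each digit into its own £, so "300" becomes "£££". If one £ per number is wanted instead, a minimal sketch (assuming only ASCII punctuation needs removing; the helper name replace_numbers_strip_punct is hypothetical) deletes punctuation with str.translate and then collapses each run of digits with re.sub:

```python
import re
import string

# Translation table that deletes every character in string.punctuation.
# "£" is not ASCII punctuation, so it is never deleted.
DELETE_PUNCT = str.maketrans("", "", string.punctuation)

def replace_numbers_strip_punct(text):
    """Drop ASCII punctuation, then collapse each run of digits into one '£'."""
    without_punct = text.translate(DELETE_PUNCT)
    return re.sub(r"\d+", "£", without_punct)

sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]",
            'the market ( knowledge check : prices up!']
print([replace_numbers_strip_punct(s) for s in sentence])
```

Unlike the regex answers, this deletes punctuation rather than replacing it with a space, so "she's" becomes "shes".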
Kal-1

If you want to replace some specific punctuation symbols with a space and each run of digits with a £ sign, you can use:

import re
rx = re.compile(r'''[][!":$()',]|(\d+)''')
sentence = ['cracked $300 million','she\'s resolutely, smitten ', 'that\'s creative [r]', 'the market ( knowledge check : prices up!']
s = [rx.sub(lambda x: '£' if x.group(1) else ' ', word) for word in sentence] 
print(s) # => ['cracked  £ million', 'she s resolutely  smitten ', 'that s creative  r ', 'the market   knowledge check   prices up ']


Note where [ and ] are placed inside the character class: when ] comes first, it does not need to be escaped, and [ does not need escaping inside a character class. I also used a triple-quoted string literal, so you can use " and ' as they are without extra escaping.

So here, [][!":$()',]|(\d+) matches one of ], [, !, ", :, $, (, ), ' or , or else matches one or more digits and captures them into Group 1. If Group 1 matched, the replacement is the pound sign (£); otherwise, it is a space.
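If the symbols to replace are kept in a Python string, a sketch like the following builds the same character class with re.escape, so the position of [ and ] inside the class no longer matters, and uses a named function in place of the lambda. The SYMBOLS string and the replace function name are illustrative, not part of the answer:

```python
import re

# Hypothetical string holding the symbols that should become a space.
SYMBOLS = '[]!":$(),\''

# re.escape() backslash-escapes regex metacharacters such as [ ] ( ) $,
# so the class is valid no matter where the brackets appear in SYMBOLS.
rx = re.compile("[" + re.escape(SYMBOLS) + r"]|(\d+)")

def replace(match):
    # Group 1 only participates when the digit branch matched.
    return "£" if match.group(1) else " "

sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]",
            'the market ( knowledge check : prices up!']
print([rx.sub(replace, word) for word in sentence])
```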

Wiktor Stribiżew
  • do you also know how to remove the " 's " from the word endings? for example, if I have another string containing "character's", how can I remove 's so it is "character"? – Bluetail Apr 01 '22 at 19:11
  • @Bluetail See https://ideone.com/4XDxDQ, `rx = re.compile(r"""\b's\b|[][!":$()',]|(\d+)""")` might help. – Wiktor Stribiżew Apr 01 '22 at 19:17
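Building on that comment, a sketch that deletes a trailing 's outright (rather than turning it into a space, as the single lambda would) can use named groups and an empty replacement for that branch. The group names poss and num are made up for this example, and only the straight apostrophe ' is handled. The possessive branch is listed first, so the apostrophe in "that's" is consumed together with the s instead of being matched by the character class:

```python
import re

rx = re.compile(r"""(?P<poss>\b's\b)|(?P<num>\d+)|[][!":$()',]""")

def replace(m):
    if m.group("poss"):
        return ""      # drop the trailing 's entirely
    if m.group("num"):
        return "£"     # one pound sign per run of digits
    return " "         # any other listed symbol becomes a space

print(rx.sub(replace, "character's"))
print(rx.sub(replace, "that's creative [r]"))
```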

Using your input and pattern:

>>> ([re.sub(r'[!":$()[]\',]',' ', word) for word in sentence]) 
['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]", 'the market ( knowledge check : prices up!']
>>> 

The reason is that [!":$()[] is treated as a character class, and \',] is matched literally, i.e. the engine looks for a character from that class immediately followed by the exact text ',].
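To see this concretely, a quick check with re.findall on a made-up test string shows that the original pattern only matches a class character followed by the literal text ',] while the escaped version matches each symbol on its own:

```python
import re

broken = r'[!":$()[]\',]'   # class [!":$()[] followed by the literal text ',]
fixed = r'[!":$()[\]\',]'   # ] escaped, so the class now ends at the final ]

text = "a (',] b (c)"
print(re.findall(broken, text))  # ["(',]"]
print(re.findall(fixed, text))   # ['(', "'", ',', ']', '(', ')']
```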

With the closing bracket inside the class escaped as \]:

>>> ([re.sub(r'[!":$()[\]\',]',' ', word) for word in sentence]) 
['cracked  300 million', 'she s resolutely  smitten ', 'that s creative  r ', 'the market   knowledge check   prices up ']
>>> 

Edit: If you're trying to combine multiple actions in a single list comprehension, put the actions in a function and call that function:

import re

def process_word(word):
    word = re.sub(r'[!":$()[\]\',]', ' ', word)  # listed symbols -> space
    word = re.sub(r'\d+', '£', word)              # each run of digits -> £
    return word

Results in:

>>> [process_word(word) for word in sentence]
['cracked  £ million', 'she s resolutely  smitten ', 'that s creative  r ', 'the market   knowledge check   prices up ']
  • Yes, thank you! I have also tried to remove " 's " at the end of words (for example, "that's" should become "that"), but it did not quite work with the escape "\s' " symbol. – Bluetail Apr 01 '22 at 19:04
  • it gives an error, "TypeError: expected string or bytes-like object" when I try process_word(sentence). – Bluetail Apr 01 '22 at 19:13
  • if `sentence` is a list of strings as in your example above, then you'll need to do the list comprehension to run the function on each string within the list. –  Apr 01 '22 at 19:19
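As a sketch of that last point (process_word is from the answer; results and same_results are illustrative names), the function can be applied to each string with either a list comprehension or map(), while passing the list itself reproduces the TypeError because re.sub() needs a string to search:

```python
import re

def process_word(word):
    # Same two substitutions as in the answer; expects a single str.
    word = re.sub(r'[!":$()[\]\',]', ' ', word)
    word = re.sub(r'\d+', '£', word)
    return word

sentence = ['cracked $300 million', "she's resolutely, smitten ", "that's creative [r]",
            'the market ( knowledge check : prices up!']

# Apply the function to each string, either with a comprehension or map().
results = [process_word(word) for word in sentence]
same_results = list(map(process_word, sentence))
print(results)

# Passing the whole list fails, because re.sub() needs a str (or bytes) to search.
try:
    process_word(sentence)
except TypeError as exc:
    print("TypeError:", exc)
```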