2

I found this thread: Best way to strip punctuation from a string in Python

But I was hoping to find a way to do this without stripping out the periods in links. So if the string is

I love using stackoverflow.com on Fridays, Saturdays and Mondays!

It would return

I love using stackoverflow.com on Fridays Saturdays and Mondays

Ideally, I would be able to pass in a list of common link endings such as .com, .net, .ly, etc.

JiminyCricket

3 Answers

5

You can use negative look-aheads:

[,!?]|\.(?!(com|org|ly))
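
As a rough sketch of how this pattern could be built from a list of link endings, as the question asks: the function name `strip_punctuation` and its `endings` parameter are made up for illustration. The trailing `\b` is also an addition, not part of the regex above. It is there so that a period in front of a longer word such as "company" is still stripped.

```python
import re

def strip_punctuation(text, endings=("com", "org", "ly")):
    """Remove , ! ? and every period that does not start a listed link ending."""
    if not endings:
        # No endings to protect: strip every period as well.
        return re.sub(r"[,!?.]", "", text)
    # Accept "com" or ".com"; escape each ending and join into com|org|ly.
    alternation = "|".join(re.escape(e.lstrip(".")) for e in endings)
    # The negative look-ahead keeps a period only when a listed ending
    # followed by a word boundary comes right after it.
    pattern = re.compile(r"[,!?]|\.(?!(?:%s)\b)" % alternation)
    return pattern.sub("", text)

print(strip_punctuation(
    "I love using stackoverflow.com on Fridays, Saturdays and Mondays!"))
# I love using stackoverflow.com on Fridays Saturdays and Mondays
```

Note that a sentence ending right before one of the listed endings (for example "Done.com ...") would still keep its period, so this only works as well as the list of endings does.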
Jacob
3

Convention says that punctuation such as . , ! is followed by a space. If you can count on correct typing, you can create a regex that strips these characters only when they are followed by whitespace (or at least apply that rule to the full stop).

The following regex will identify these:

[.,!?-](\s|$)
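
A minimal sketch of using this for the actual stripping, with one change that is my own assumption rather than part of the regex above: the group `(\s|$)` is turned into a look-ahead `(?=\s|$)`. Substituting the original pattern with an empty string would also delete the matched space and glue neighbouring words together. The name `strip_trailing_punctuation` is hypothetical.

```python
import re

# Match punctuation only when whitespace or the end of the string follows.
# The look-ahead checks for the whitespace without consuming it,
# so the spaces between words survive the substitution.
TRAILING_PUNCTUATION = re.compile(r"[.,!?-](?=\s|$)")

def strip_trailing_punctuation(text):
    return TRAILING_PUNCTUATION.sub("", text)

print(strip_trailing_punctuation(
    "I love using stackoverflow.com on Fridays, Saturdays and Mondays!"))
# I love using stackoverflow.com on Fridays Saturdays and Mondays
```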

Another possibility is to use a list of legal TLD names, prefixes like www., or other patterns like @, and keep the original punctuation around them.

vbence
1

How about this (which is pretty much what Felix Kling already suggested):

original = 'I love using stackoverflow.com on Fridays, Saturdays and Mondays!'
unwanted_chars = ',.!?;:'

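# Split on whitespace, then strip unwanted characters from the ends of each word only,
# so punctuation inside a word (such as the period in a link) is left alone
bits = original.split()
cleaned_up = ' '.join(bit.strip(unwanted_chars) for bit in bits)
print(cleaned_up)
# I love using stackoverflow.com on Fridays Saturdays and Mondays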

Edit: 'cleaned_up' then holds the depunctuated string.

HumanCatfood