I would like to strip all punctuation (except the dot) from the beginning and end of a string, but not from the middle of it.

For instance, for the original string:

@#%%.Hol$a.A.$%

I would like to get .Hol$a.A., with the punctuation removed from the beginning and end but not from the middle of the word.

Another example could be for the string:

@#%%...&Hol$a.A....$%

In this case the returned string should be ...&Hol$a.A.... because we do not care if the allowed characters are repeated.

The idea is to remove all punctuation (except the dot) only at the beginning and end of the word. A word is defined as \w and/or a .

A practical example is the string 'Barnes&Nobles'. For text analysis it is important to recognize Barnes&Nobles as a single entity, but without the '

How can I accomplish this using a regex?

user2288043
  • Possible duplicate of [Remove all special characters, punctuation and spaces from string](http://stackoverflow.com/questions/5843518/remove-all-special-characters-punctuation-and-spaces-from-string) – Laxmikant Ratnaparkhi May 02 '16 at 05:25
  • @LaxmikantGurnalkar: It is not a duplicate. – AKS May 02 '16 at 05:28
  • @user2288043: Could you post more examples? Only one is not very useful for covering other cases there could be. – AKS May 02 '16 at 05:30
  • what if there are multiple dots like :- `@#%%....Hol$a.A....$%`? – rock321987 May 02 '16 at 05:36
  • I added more details about the problem so it could be useful for another person too. However, it is solved with the reply of @ByteCommander Thanks! – user2288043 May 02 '16 at 14:54

2 Answers

Use this simple and easily adaptable regex:

[\w.].*[\w.]

It will match exactly your desired result, nothing more.

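  • [\w.] matches any word character (letter, digit or underscore) or a dot
  • .* greedily matches any characters (except newlines by default), so the match extends as far as possible
  • [\w.] matches a word character or a dot again; because .* is greedy, this is the last one on the line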

To change the delimiters, simply change the set of allowed characters inside the [] brackets.
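
As a rough sketch of that adaptation, the allowed set can be turned into a parameter. The helper name `trim_to_allowed` and its `allowed` argument are hypothetical, introduced only for illustration; `allowed` is assumed to already be valid content for a regex character class.

```python
import re

def trim_to_allowed(text, allowed=r"\w."):
    """Return the span from the first to the last allowed character.

    `allowed` (hypothetical parameter) is placed inside [...] unchanged,
    so it must already be valid character-class content.
    """
    pattern = rf"[{allowed}].*[{allowed}]"
    match = re.search(pattern, text)
    # re.search returns None when no such span is found
    return match.group(0) if match else None

print(trim_to_allowed("@#%%.Hol$a.A.$%"))             # .Hol$a.A.
print(trim_to_allowed("--@Barnes&Nobles@--", r"\w"))  # Barnes&Nobles
```

Passing `r"\w"` drops the dot from the allowed set, so outer dots would be trimmed as well.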

Check this regex out on regex101.com

import re
data = '@#%%.Hol$a.A.$%'
pattern = r'[\w.].*[\w.]'
print(re.search(pattern, data).group(0))
# Output: .Hol$a.A.
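
One limitation of this pattern is that it needs at least two allowed characters, so a string such as `#a#` produces no match. A possible alternative, sketched here under the assumption that an empty string is an acceptable result when nothing is allowed, removes the unwanted outer runs instead of matching the wanted span. The names `OUTER_JUNK` and `strip_outer_punct` are made up for this example.

```python
import re

# Runs of characters that are neither \w nor a dot, anchored either at
# the very start (\A) or the very end (\Z) of the string.
# Note: [^\w.] also covers whitespace, so outer spaces are removed too.
OUTER_JUNK = re.compile(r"\A[^\w.]+|[^\w.]+\Z")

def strip_outer_punct(text):
    # Inner punctuation is untouched because neither anchor can match there.
    return OUTER_JUNK.sub("", text)

print(strip_outer_punct("@#%%.Hol$a.A.$%"))        # .Hol$a.A.
print(strip_outer_punct("@#%%...&Hol$a.A....$%"))  # ...&Hol$a.A....
print(strip_outer_punct("#a#"))                    # a
```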
Byte Commander
  • This is a generic one considering the user hasn't provided much details or more examples. – AKS May 02 '16 at 06:05
  • This works like a charm. The number of dots was not important. Another example could be the string @#%%....Hol$a.A....$%; in this case the returned value should be ....Hol$a.A.... – user2288043 May 02 '16 at 14:38
  • @user2288043 If this answer solved your problem, please accept it by clicking the check button on its left. – Byte Commander May 02 '16 at 17:55

Depending on what you mean by stripping the punctuation, you can adapt the following code:

import re
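# The dots are escaped (\.) so they match literal dots rather than any character.
res = re.search(r"^[^.]*(\.[^.]*\.([^.]*\.)*?)[^.]*$", "@#%%.Hol$a.A.$%")
mystr = res.group(1) if res else None  # res is None if the string has fewer than two dots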

This strips everything before the first dot and after the last dot. Be aware that re.search returns None when the string doesn't match, so check the result before using it.

Richard
  • This only works if the result string is delimited by dots, right? There's a more general (and more simple) expression for this job. – Byte Commander May 02 '16 at 05:54
  • If there is a letter just before the first dot, this wouldn't work: `@#%%a.Hol$a.A.$%`. – AKS May 02 '16 at 05:55
  • Yes, but the question lacked details. I assumed he wanted to strip anything before the first dot and after the last one. If you wanna strip exclusively certain characters, then regex is not the way to go. – Richard May 02 '16 at 06:01
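
For the case raised in the last comment, stripping only a fixed set of characters, `str.strip` accepts that set directly. A minimal sketch, assuming "punctuation" means the ASCII characters in `string.punctuation`, with the dot and the underscore excluded so the result agrees with the `\w`-or-dot definition in the question. The names `STRIP_CHARS` and `strip_ascii_punct` are illustrative only.

```python
import string

# ASCII punctuation minus '.' and '_' (assumption: '_' stays because it
# belongs to \w in the question's definition of a word).
STRIP_CHARS = "".join(c for c in string.punctuation if c not in "._")

def strip_ascii_punct(text):
    # str.strip removes any of the given characters from both ends only.
    return text.strip(STRIP_CHARS)

print(strip_ascii_punct("@#%%.Hol$a.A.$%"))  # .Hol$a.A.
print(strip_ascii_punct("'Barnes&Nobles'"))  # Barnes&Nobles
```

Unlike the regex versions, this leaves outer whitespace and non-ASCII punctuation in place.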