-1
def pre_process(t):
    """ (str) -> str
    returns a copy of the string with all punctuation removed, and all letters set to lowercase. The only characters in the output will be lowercase letters, numbers, and whitespace.

    """
Wolph
Eric Choi

3 Answers

1

Try the following code.

import re

string = 'This is an example sentence.'
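string = re.sub(r'[^a-zA-Z\d]', '', string)

print(string)

The output is Thisisanexamplesentence, because the pattern removes everything except letters and digits, including the spaces and the final period.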

Zak
0

Just rebuild your string, keeping only the alphanumeric and whitespace characters:

''.join(_char for _char in _str.lower() if _char.isalnum() or _char.isspace())
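As a rough sketch, the one-liner could be wrapped in a function like the one the question asks for. The name `pre_process_join` is made up for illustration, and the `str.isspace()` check is an assumption added so that whitespace survives, as the question's docstring requires. Note that `str.isalnum()` also accepts non-ASCII letters and digits, so accented characters would be kept.

```python
def pre_process_join(text):
    """Return a lowercase copy of text with punctuation removed."""
    # Keep a character only if it is a letter, a digit, or whitespace.
    return ''.join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())


print(pre_process_join("Hello, World! It's 2024."))  # hello world its 2024
```

This avoids importing `re`, at the cost of a per-character Python loop.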
midori
0

This is the simplest function using regex I could put together to achieve your requirement.

import re
def pre_process(t):
    return re.sub(r'[^a-z\d\s]', '', t.lower())

It will return the input string in lower case, and omit any characters that are not letters, numbers or whitespace.
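For reference, here is a minimal self-contained sketch of the regex approach with a quick check. It assumes the input is a `str` and that "letters" means ASCII `a-z` only, so accented letters are dropped. The precompiled pattern name `_NOT_ALLOWED` is just an illustrative choice.

```python
import re

# Matches anything that is not a lowercase ASCII letter, a digit, or whitespace.
_NOT_ALLOWED = re.compile(r'[^a-z0-9\s]')


def pre_process(t):
    """(str) -> str  Return a lowercase copy of t with punctuation removed."""
    # Lowercase first so that uppercase letters are kept rather than stripped.
    return _NOT_ALLOWED.sub('', t.lower())


print(pre_process("This is an example sentence."))  # this is an example sentence
```

Lowercasing before substituting matters here: with the pattern restricted to `a-z`, applying it to the original string would delete every capital letter.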

maze88