2

Basically, my question is the same as this but in Python (and GAE), not C#.

Requirements:

  • Separate each word by a dash and remove all punctuation (taking into account that not all words are separated by spaces).
  • The function takes a max length and returns as many whole tokens as fit within that length. Example: ToSeoFriendly("hello world hello world", 14) returns "hello-world"
  • All words are converted to lower case.
zakdances

3 Answers

5
def ToSeoFriendly(s, maxlen):
    '''Join with dashes, eliminate punctuation, clip to maxlen, lowercase.

        >>> ToSeoFriendly("The quick. brown4 fox jumped", 14)
        'the-quick-brow'

    '''
    t = '-'.join(s.split())                                # join words with dashes
    u = ''.join([c for c in t if c.isalnum() or c=='-'])   # remove punctuation
    return u[:maxlen].rstrip('-').lower()                  # clip to maxlen

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
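
Note that the function above clips at maxlen characters, so it can cut a word in half ('the-quick-brow'), and it only splits on whitespace. The question's example ("hello world hello world" with 14 giving "hello-world") suggests whole tokens are wanted. A rough sketch of that behavior follows; it is not part of the original answer, the name to_seo_friendly_words is made up for illustration, and it assumes that only ASCII letters and digits count as word characters (everything else acts as a separator):

```python
import re

def to_seo_friendly_words(s, maxlen):
    """Hypothetical variant: keep only the whole words that fit in maxlen."""
    # Assumption: any run of characters other than a-z / 0-9 separates words,
    # so "hello,world" yields two tokens even without a space.
    words = re.findall(r'[a-z0-9]+', s.lower())
    kept = []
    length = 0
    for word in words:
        # Every word after the first costs one extra character for its dash.
        cost = len(word) + (1 if kept else 0)
        if length + cost > maxlen:
            break
        kept.append(word)
        length += cost
    return '-'.join(kept)

print(to_seo_friendly_words("hello world hello world", 14))       # hello-world
print(to_seo_friendly_words("The quick. brown4 fox jumped", 14))  # the-quick
```

If the first word alone is longer than maxlen, this sketch returns an empty string; whether to clip that word instead is a design choice the question does not settle.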
Raymond Hettinger
3

The term you're searching for is 'slugify'.

Nick Johnson
3

As an alternative (and probably a better-tested one), I suggest using the (minimally modified) slugify code from Django:

import unicodedata
import re

def slugify(value):
    """
    Normalizes string, converts to lowercase, removes characters that are
    not alphanumerics, underscores, hyphens, or whitespace, and converts
    runs of whitespace and hyphens to single hyphens.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)

See: https://github.com/django/django/blob/master/django/utils/text.py#L435
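
To make the behavior concrete, here is a small usage sketch. It repeats the function from this answer (with raw-string patterns) so that it runs on its own, and it assumes Python 3, where str is Unicode; the sample inputs are made up for illustration:

```python
import re
import unicodedata

def slugify(value):
    # Same steps as the function in this answer, written with raw strings.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)

# Illustrative inputs only: accents, an underscore, and mixed separators.
examples = ["H\u00e9llo,   W\u00f6rld!", "snake_case stays", "already-dashed -- text"]
for text in examples:
    print(repr(text), '->', slugify(text))
# 'Héllo,   Wörld!' -> hello-world
# 'snake_case stays' -> snake_case-stays
# 'already-dashed -- text' -> already-dashed-text
```

Underscores survive because \w matches them, and this version has no maximum-length handling, so the max-length requirement from the question would still need to be applied to the result.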

Martin Thurau