This is a contrived example, but it shows what I'm trying to do. Say I have a string:

from string import ascii_uppercase, ascii_lowercase, digits
s = "Testing123"

I would like to replace every character in s that appears in ascii_uppercase with "L" (capital letter), every character that appears in ascii_lowercase with "l" (lowercase letter), and every character in digits with "n" (number).

I'm currently doing:

def getpattern(data):
    pattern = ""
    for c in data:
        if c in ascii_uppercase:
            pattern += "L"
        elif c in ascii_lowercase:
            pattern += "l"
        elif c in digits:
            pattern += "n"
        else:
            pattern += "?"  # anything not in the lists above
    return pattern

However, this gets tedious once there are several more lists to replace. I'm usually good at finding map-style solutions for problems like this, but I'm stumped. Nothing that has already been replaced may be replaced again. For example, if I first replace the digits with "n", a later pass might replace that "n" with "l", because "n" is a lowercase letter.

getpattern("Testing123") == "Lllllllnnn"
Goodies
  • Can you provide an actual example of what you're trying to achieve exactly? There are numerous ways to map a simple string to another, but your real case may be more constricted. Control characters, like NULL or ACK are well within the ordinal range of 0-255, even though not printable, but if you're dealing with Unicode characters, you may need a different approach. – Reti43 Jan 28 '16 at 04:47
  • The selected answer is ideal. I do not believe I need Unicode, but if I do, I believe I can make it work. Thank you! – Goodies Jan 28 '16 at 04:52

3 Answers

You can create a translation table that maps all upper case letters to 'L', all lower case letters to 'l' and all digits to 'n'. Once you have such a map, you can pass it to str.translate().

from string import ascii_uppercase, ascii_lowercase, digits, maketrans
s = "Testing123"

intab = ascii_uppercase + ascii_lowercase + digits
outtab = ('L' * 26) + ('l' * 26) + ('n' * 10)
trantab = maketrans(intab, outtab)

print s.translate(trantab)

Note that Python 3 has no string.maketrans function. Instead, use the static method str.maketrans(); see the Python documentation for str.maketrans() and str.translate() for details.
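
For Python 3, a minimal sketch of the same idea might look like the following. It assumes the "?" fallback from the question is wanted for every other character. The names TABLE and getpattern_py3 are made up for illustration, and wrapping the table in a defaultdict is just one possible way to get that fallback, relying on str.translate() looking characters up by code point.

```python
from collections import defaultdict
from string import ascii_uppercase, ascii_lowercase, digits

intab = ascii_uppercase + ascii_lowercase + digits
outtab = "L" * 26 + "l" * 26 + "n" * 10

# str.maketrans() returns a dict that maps code points to code points.
# Wrapping it in a defaultdict (an assumption of this sketch) makes every
# character that is not in intab fall back to "?".
TABLE = defaultdict(lambda: "?", str.maketrans(intab, outtab))


def getpattern_py3(data):
    # Hypothetical helper name; translates the whole string in one pass.
    return data.translate(TABLE)


print(getpattern_py3("Testing123"))  # Lllllllnnn
print(getpattern_py3("a-B 7"))       # l?L?n
```

Because each character is looked up exactly once, a replaced "n" is never replaced again by the lowercase rule.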

I'm not certain of the internals of str.translate(), but my educated guess is that the translation table is a 256-character string with one entry per possible character value. As it passes over your string, it translates \x00 to \x00, \x01 to \x01, and so on, but A to L. That way it doesn't have to check whether each character is in your translation table. I presume that blindly translating every character with no branches results in better performance. Compare trantab with ''.join(chr(i) for i in range(256)) to see this.
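
One way to look at such a table in Python 3 is bytes.maketrans(), which returns it as a 256-byte object (Python 3's str.maketrans() returns a dict instead, so this sketch applies to byte strings only). The variable names are illustrative.

```python
from string import ascii_uppercase, ascii_lowercase, digits

intab = (ascii_uppercase + ascii_lowercase + digits).encode("ascii")
outtab = b"L" * 26 + b"l" * 26 + b"n" * 10

# bytes.maketrans() builds a table with one entry per possible byte value.
table = bytes.maketrans(intab, outtab)
identity = bytes(range(256))

print(len(table))                      # 256
print(table[0] == identity[0])         # True: unmapped bytes map to themselves
print(chr(table[ord("A")]))            # L
print(b"Testing123".translate(table))  # b'Lllllllnnn'
```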

Reti43
  • Interesting, I've never used `maketrans` before. The trantab variable is 256 bytes long whereas the intab+outtab length is only 124. Is there a method to this madness? It doesn't seem very efficient. – Goodies Jan 28 '16 at 03:58
  • Also, would I be able to have a default value like I have in the example? I don't mind typing it all out. I could just create a dict with { ascii_uppercase : 'L' .... etc} and iterate through that. But that is good to know. – Goodies Jan 28 '16 at 04:01
  • @Goodies I don't understand what you mean by default value. I see no example of this in your question. – Reti43 Jan 28 '16 at 04:07

Digits, uppercase letters, and lowercase letters fall into different 32-character blocks of ASCII (32-63, 64-95, and 96-127), so you can do this:

>>> ''.join(' nLl'[ord(c) // 32] for c in s)
'Lllllllnnn'
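
To see why that works, here is a small sketch that prints the block index of each character before looking it up in ' nLl'. The variable names are illustrative, and the last line shows a caveat: punctuation shares those blocks with letters and digits.

```python
s = "Testing123"

# Block index per character: 1 for 32-63, 2 for 64-95, 3 for 96-127.
blocks = [ord(c) // 32 for c in s]
print(blocks)  # [2, 3, 3, 3, 3, 3, 3, 1, 1, 1]

# Index into ' nLl': position 1 is 'n', 2 is 'L', 3 is 'l'.
pattern = "".join(" nLl"[b] for b in blocks)
print(pattern)  # Lllllllnnn

# Caveat: '!' (33), '@' (64) and '{' (123) land in the same blocks.
punct = "".join(" nLl"[ord(c) // 32] for c in "!@{")
print(punct)  # nLl
```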

Your example suggests that you don't have other characters, but if you do, this should work:

>>> s = "Testing123 and .?#!-+ äöüß"
>>> ''.join(' nLl'[ord(c) // 32] if c <= 'z' and c.isalnum() else '?' for c in s)
'Lllllllnnn?lll????????????'
Stefan Pochmann
  • +1 for creativity. They are strings, but I can't be certain that they are all printable characters. Also, I can't set them to an ord value because they may not all be in that range. I should have specified in OP, though. – Goodies Jan 28 '16 at 04:39
  • I just added a version that should work for other characters as well. – Stefan Pochmann Jan 28 '16 at 04:40

Just in case you need to process Unicode data:

import unicodedata

cat = {'Lu':'L', 'Ll':'l', 'Nd':'n'}

def getpattern(data):
    return ''.join(cat.get(unicodedata.category(c),c) for c in data)
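
A usage sketch for Python 3 follows. Note that the function above leaves characters outside the three categories unchanged. The default parameter and the names getpattern_unicode and CATEGORY_TO_SYMBOL are additions for illustration, in case a fixed placeholder such as "?" is preferred.

```python
import unicodedata

# General categories: Lu = uppercase letter, Ll = lowercase letter,
# Nd = decimal digit.
CATEGORY_TO_SYMBOL = {"Lu": "L", "Ll": "l", "Nd": "n"}


def getpattern_unicode(data, default=None):
    # Hypothetical variant: with default=None, other characters are kept
    # as they are; otherwise they are replaced by `default`.
    out = []
    for c in data:
        symbol = CATEGORY_TO_SYMBOL.get(unicodedata.category(c))
        if symbol is None:
            symbol = c if default is None else default
        out.append(symbol)
    return "".join(out)


print(getpattern_unicode("Testing123"))                   # Lllllllnnn
print(getpattern_unicode("\u00c4rger 42!"))               # Lllll nn!
print(getpattern_unicode("\u00c4rger 42!", default="?"))  # Lllll?nn?
```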
RootTwo