11

I've got a Perl function which takes a timestamp and returns either the unchanged timestamp (if it's never seen it before) or otherwise, it appends some letters to make it unique:

sub uniqify($) {
  my $timestamp = shift;

  state $last_ts = -1;
  state $next_letter = 'A';

  if ($timestamp == $last_ts) {
    $timestamp .= $next_letter++;
  } else {
    $last_ts = $timestamp;
    $next_letter = 'A';
  }

  return $timestamp;
}

So if you call it four times, with the values 1, 1, 1, and 2, it will return 1, then 1A, then 1B, then 2.

Note: It only ever gets called with ever-increasing timestamps, so it doesn't need to recall every one it's ever seen, just the last one.

Now I need to translate this function to Python. I've learned that I can replace the "state" variables with globals (yuck!) or perhaps attach them to the function as attributes, but neither of those is particularly elegant.

Also, Python doesn't have something like Perl's magic autoincrement, where if you "++" a variable whose value is "A", it becomes "B" -- or if it's "Z", it becomes "AA". So that's a curveball too.

I'm hacking together a solution, but it's really ugly and hard to read. Translating from Perl to Python is supposed to have the opposite effect, right? :) So I'm offering this as a challenge to SO users. Can you make it an elegant Python function?

mike
  • 46,876
  • 44
  • 102
  • 112
  • It's not really fair to bash Python for not having perl's string increment niche feature. But it's kind of disappointing that no one has yet risen to the challenge of doing it in a pythony way. Function attributes don't seem that yucky to me, if that's what Chris J was using. – ysth Mar 03 '09 at 01:47
  • what comes after 'AZ' in perl's auto increment? 'AAA'?? – hasen Mar 03 '09 at 02:40
  • I don't know what the purpose of this question is other than to bait the Python guys into bashing Perl, or vice versa. Is there an actual requirement here, other than that? – Craig S Mar 03 '09 at 03:13
  • @hasen j: BA. See: http://perldoc.perl.org/perlop.html#Auto-increment-and-Auto-decrement – ysth Mar 03 '09 at 05:24
  • 1
    I wasn't bashing. I was making a statement of fact. I have a genuine, real-world Perl script that happens to use this Perl feature. Now I'm porting it to Python. Python doesn't have this feature. That's just the way it is. If I didn't like Python, I wouldn't be porting the script in the first place. – mike Mar 03 '09 at 18:39

8 Answers8

7

Look at this answer for a robust method to convert a number to an alphanumeric id

The code I present doesn't go from 'Z' to 'AA', instead goes to 'BA', but I suppose that doesn't matter, it still produces a unique id

from string import uppercase as up
import itertools

def to_base(q, alphabet):
    if q < 0: raise ValueError( "must supply a positive integer" )
    l = len(alphabet)
    converted = []
    while q != 0:
        q, r = divmod(q, l)
        converted.insert(0, alphabet[r])
    return "".join(converted) or alphabet[0]

class TimestampUniqifier( object ):
    def __init__(self):
        self.last = ''
        self.counter = itertools.count()
    def __call__( self, str ):
        if str == self.last:
            suf = self.counter.next()
            return str + to_base( suf, up )
        else:
            self.last = str
            self.counter = itertools.count()
            return str            

timestamp_uniqify = TimestampUniqifier()

usage:

timestamp_uniqify('1')
'1'
timestamp_uniqify('1')
'1A'
timestamp_uniqify('1')
'1B'
timestamp_uniqify('1')
'1C'
timestamp_uniqify('2')
'2'
timestamp_uniqify('3')
'3'
timestamp_uniqify('3')
'3A'
timestamp_uniqify('3')
'3B'

You can call it maaaany times and it will still produce good results:

for i in range(100): print timestamp_uniqify('4')

4
4A
4B
4C
4D
4E
4F
4G
4H
4I
4J
4K
4L
4M
4N
4O
4P
4Q
4R
4S
4T
4U
4V
4W
4X
4Y
4Z
4BA
4BB
4BC
4BD
4BE
4BF
4BG
4BH
4BI
4BJ
4BK
4BL
4BM
4BN
4BO
4BP
4BQ
4BR
4BS
4BT
4BU
4BV
4BW
4BX
4BY
4BZ
4CA
4CB
4CC
4CD
4CE
4CF
4CG
4CH
4CI
4CJ
4CK
4CL
4CM
4CN
4CO
4CP
4CQ
4CR
4CS
4CT
4CU
4CV
4CW
4CX
4CY
4CZ
4DA
4DB
4DC
4DD
4DE
4DF
4DG
4DH
4DI
4DJ
4DK
4DL
4DM
4DN
4DO
4DP
4DQ
4DR
4DS
4DT
4DU
Community
  • 1
  • 1
hasen
  • 161,647
  • 65
  • 194
  • 231
  • I think you need to reset the counter when you get a new timestamp to match the original perl version. – zdan Mar 03 '09 at 02:13
  • Very nice solution. Sadly it is marred by your incorrect and dismissive comment about Mike's "little" suffixes. Perl's string autoincrement makes 'Z'++ into 'AA'--a point Mike makes in his post. See http://perldoc.perl.org/perlop.html#Auto-increment-and-Auto-decrement for more info. – daotoad Mar 03 '09 at 02:47
  • yea, I found out my mistake later .. hehe – hasen Mar 03 '09 at 03:35
5

Well, sorry to say, but you can't just do a direct translation from Perl to Python (including bit-for-bit Perlisms) and expect the outcome to be prettier. It won't be, it will be considerably uglier.

If you want the prettiness of Python you will need to use Python idioms instead.

Now for the question at hand:

from string import uppercase

class Uniquifier(object):

    def __init__(self):
        self.last_timestamp = None
        self.last_suffix = 0

    def uniquify(self, timestamp):
        if timestamp == self.last_timestamp:
            timestamp = '%s%s' % (timestamp,
                                  uppercase[self.last_suffix])
            self.last_suffix += 1
        else:
            self.last_suffix = 0
            self.timestamp = timestamp
        return timestamp

uniquifier = Uniquifier()
uniquifier.uniquify(a_timestamp)

Prettier? Maybe. More readable? Probably.

Edit (re comments): Yes this fails after Z, and I am altogether unhappy with this solution. So I won't fix it, but might offer something better, like using a number instead:

timestamp = '%s%s' % (timestamp,
                      self.last_suffix)

If it were me, I would do this:

import uuid

def uniquify(timestamp):
    return '%s-%s' % (timestamp, uuid.uuid4())

And just be happy.

Ali Afshar
  • 40,967
  • 12
  • 95
  • 109
  • I don't know python; does that string.uppercase[] produce 'A', 'B', etc? If so, what happens after 'Z'? – ysth Mar 03 '09 at 01:19
  • Judging by my test of it, it crashes with a string index out of range. Also, be careful to `from string import uppercase` since `import string.uppercase` won't work. And in the else block, `self.last_timestamp` instead of `self.timestamp`. I do lament the SO rush to first post. Fall victim myself:-) – Jarret Hardie Mar 03 '09 at 01:22
  • A brief googling says that ascii_uppercase may be a better choice. – ysth Mar 03 '09 at 01:23
  • I know it's tradition for Python militants to bash Perl's readability, but seriously, you can't call your Python more or less readable than his Perl! They're both fairly simple syntactically and are both quite easily readable. Lay off on the Pyvangelism, please. – Chris Lutz Mar 03 '09 at 01:30
  • Can't you make classes funcallable in Python? That would clean this up the interface a bit. – jrockway Mar 03 '09 at 01:35
  • You can, but it doesn't save you from having to instantiate it. Plus, It's a class. Treating it like a function when it isn't could cause confusion. After all, there is no functional dependence on arguments... – SingleNegationElimination Mar 03 '09 at 01:39
  • The class is just an implementation detail and it makes sense to abstract that away. (You could implement it like this in Perl, as well, and get the same ability to reset the function's state easily.) – jrockway Mar 03 '09 at 01:46
  • I think he's on to something using UUID rather than reinventing the unique ID wheel. – Schwern Mar 03 '09 at 04:06
3

I just tested this up to 1000 against the original perl implementation and diff returns the same results for both. The suffix code is tricky -- this is not a base 36 counter. Hasen J's solution - though it produces a unique timestamp - isn't quite the same since it goes from 'Z' to 'BA', when it should instead go to 'AA' to match the perl ++ operator.

#!/usr/bin/python

class uniqify:
    def __init__(self):
        self.last_timestamp = -1
        self.next_suffix = 'A'
        return

    def suffix(self):
        s = self.next_suffix
        letters = [l for l in self.next_suffix]
        if letters[-1] == 'Z':
            letters.reverse()
            nonz = None
            for i in range(len(letters)):
                if letters[i] != 'Z':
                    nonz = i
                    break
            if nonz is not None:
                letters[nonz] = chr(ord(letters[nonz]) + 1)
                for i in range(0, nonz):
                    letters[i] = 'A'
            else:
                letters = ['A'] * (len(letters) + 1)
            letters.reverse()
        else:
            letters[-1] = chr(ord(letters[-1]) + 1)

        self.next_suffix = ''.join(letters)
        return s

    def reset(self):
        self.next_suffix = 'A'
        return

    def __call__(self, timestamp):
        if timestamp == self.last_timestamp:
            timestamp_str = '%s%s' % (timestamp, self.suffix())
        else:
            self.last_timestamp = timestamp
            self.reset()
            timestamp_str = '%s' % timestamp

        return timestamp_str

uniqify = uniqify()

if __name__ == '__main__':
    for n in range(1000):
        print uniqify(1)
    for n in range(1000):
        print uniqify(2)
bstpierre
  • 30,042
  • 15
  • 70
  • 103
  • 1
    for the record, I didn't come up with that method :) also it does a nice job of producing "unique" timestamps, the fact that it's not *exactly* like perl doesn't matter! While your answer produces same output as perl, it's no where near short .. it's actually quite complex – hasen Mar 03 '09 at 04:26
2

The class is generic and boring, but This is my very first recursive generator. <3

def stamptag():
    yield ''
    prefixtag = stamptag()
    prefix = prefixtag.next()
    while True:
        for i in range(ord('A'),ord('Z')+1):
            yield prefix+chr(i)
        prefix = prefixtag.next()

tagger = stamptag()
for i in range(3000):
    tagger.next()
print tagger.next()

class uniquestamp:
    def __init__(self):
        self.timestamp = -1
        self.tagger = stamptag()

    def format(self,newstamp):
        if self.timestamp < newstamp:
            self.tagger = stamptag()
            self.timestamp = newstamp
        return str(newstamp)+self.tagger.next()

stamper = uniquestamp()
print map(stamper.format, [1,1,1,2,2,3,4,4])

output:

DKJ
['1', '1A', '1B', '2', '2A', '3', '4', '4A']
SingleNegationElimination
  • 151,563
  • 33
  • 264
  • 304
1

Quite similar to Ali A, but I'll post mine anyway:

class unique_timestamp:
    suffixes = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    def __init__(self):
        self.previous_timestamps = {}
        pass
    def uniquify(self, timestamp):
        times_seen_before = self.previous_timestamps.get(timestamp, 0)
        self.previous_timestamps[timestamp] = times_seen_before + 1
        if times_seen_before > 0:
            return str(timestamp) + self.suffixes[times_seen_before]
        else:
            return str(timestamp)

Usage:

>>> u = unique_timestamp()
>>> u.uniquify(1)
'1'
>>> u.uniquify(1)
'1A'
>>> u.uniquify(1)
'1B'
>>> u.uniquify(2)
'2'
S.Lott
  • 384,516
  • 81
  • 508
  • 779
kquinn
  • 10,433
  • 4
  • 35
  • 35
1

Does the suffix have to be letters like that?

from itertools import count
def unique(timestamp): 
  if timestamp in unique.ts.keys():
    return timestamp + '.' + str(unique.ts[timestamp].next())
  else:
    unique.ts[timestamp] = count()
    return timestamp
unique.ts = {}

You can define a different count if you want the letters back.

This isn't the same as your perl code, though.

  • It keeps a dict around so if you have lots of unique timestamps then you'll use lots of memory.
  • It handles out of order calls, which the original doesn't (i.e. u(1), u(2), u(1)).
ysth
  • 96,171
  • 6
  • 121
  • 214
Justus
  • 203
  • 1
  • 4
0

This is my first time answering, and I used globals, but it seemed the simplest way to me.

from string import uppercase

last_ts = None
letters = None

def increment(letters):
    if not letters:
        return "A"
    last_letter = letters[-1]
    if last_letter == "Z":
        return increment(letters[:-1])  + "A" 
    return letters[:-1] + uppercase[uppercase.index(last_letter) + 1]

def uniquify(timestamp):
    global last_ts, letters
    if timestamp == last_ts:
        letters = increment(letters)
        return timestamp + letters
    last_ts = timestamp
    letters = None
    return timestamp

print uniquify("1")
print uniquify('1')
print uniquify("1")
print uniquify("2")
for each in range(100): print uniquify("2")


1
1A
1B
2
2A
2B
2C
2D
2E
2F
2G
2H
2I
2J
2K
2L
2M
2N
2O
2P
2Q
2R
2S
2T
2U
2V
2W
2X
2Y
2Z
2AA
2AB
2AC
2AD
2AE
2AF
2AG
2AH
2AI
2AJ
2AK
2AL
2AM
2AN
2AO
2AP
2AQ
2AR
2AS
2AT
2AU
2AV
2AW
2AX
2AY
2AZ
2BA
2BB
2BC
2BD
2BE
2BF
2BG
2BH
2BI
2BJ
2BK
2BL
2BM
2BN
2BO
2BP
2BQ
2BR
2BS
2BT
2BU
2BV
2BW
2BX
2BY
2BZ
2CA
2CB
2CC
2CD
2CE
2CF
2CG
2CH
2CI
2CJ
2CK
2CL
2CM
2CN
2CO
2CP
2CQ
2CR
2CS
2CT
2CU
2CV
0

Looking at the problem it seems like a good fit for a coroutine (Python 2.5 or higher). Here's some code that will roughly produce the same result:

def uniqify():
    seen = {}
    val = (yield None)
    while True:
        if val in seen:
            idxa, idxb = seen[val]
            idxb += 1
        else:
            idxa, idxb = (len(seen)+1, ord('a'))
        seen[val] = (idxa, idxb)
        uniq = "%s%s" % (idxa, chr(idxb))
        val = (yield uniq)

And here's how you use it:

>>> u = send.uniqify()
>>> u.next() #need this to start the generator
>>> u.send(1)
'1a'
>>> u.send(1)
'1b'
>>> u.send(1)
'1c'
>>> u.send(2)
'2a'
>>> u.send(2)
'2b'
>>> u.send(1) #you can go back to previous values
'1d'
>>> u.send('stringy') #you can send it anything that can be used as a dict key
'3a'
Steven Kryskalla
  • 14,179
  • 2
  • 40
  • 42