1

I have a list of Python objects and I'd like to remove duplicates from the list based on a time value. For example:

class MyClass(object):

    identifier = models.CharField(max_length=128)
    label = models.CharField(max_length=128)
    stat_time = models.DateTimeField(auto_now_add=True)
    def __unicode__(self):
        return str(self.label)

My list may have several instances of MyClass with the same label but different stat_times. I'd like to trim the list and have only one instance of the label with the latest stat_time.

>>> my_list
[MyClass: xxx, MyClass: yyy, MyClass: yyy, MyClass: zzz]

I'd like to end up with:

>>> my_list
[MyClass: xxx, MyClass: yyy, MyClass: zzz]

Here my_list should only contain one instance of MyClass with the 'yyy' label with the latest stat_time.

I hope I have made that clear. Any suggestions much appreciated.

qwertynl
Kevin Sheahan
  • Is this a Django question? – ericmjl Jan 03 '14 at 16:13
  • I haven't used Django but I imagine it would be easier to make sure copies aren't added in the first place instead of sanitizing the list afterwards. – kylieCatt Jan 03 '14 at 16:15
  • Is the order of items important in your list? – 9000 Jan 03 '14 at 16:18
  • What attempts have you made and what's wrong with them? – martineau Jan 03 '14 at 16:21
  • Do you want to filter on `__unicode__` or `stat_time`? As your Q currently stands, it's a bit hard to understand. – Steinar Lima Jan 03 '14 at 16:34
  • BTW: You should return `unicode(self.label)`, not `str(self.label)` in the `__unicode__` method. You may run into decoding errors otherwise. – Steinar Lima Jan 03 '14 at 16:36
  • Hi. Yes, this is a Django environment (in fact my class should read `class MyClass(models.Model)`). The order of the items is not important, as long as I have a unique list with the latest stat_time. I have tried using groupby from itertools, but that only groups them by name. I then still need to filter them on the latest stat_time. – Kevin Sheahan Jan 03 '14 at 16:41
  • I need to filter on both the label name AND the stat_time, so I get a unique label with the latest stat_time. – Kevin Sheahan Jan 03 '14 at 16:42

2 Answers

1

One way to do it is to create a dict mapping each label value to a MyClass instance. Add each element of your list to this dict, but keep only the instance with the latest stat_time for each label.

aDict = dict()
for element in my_list:
    s = element.label
    if s not in aDict:  # the key is not used yet
        aDict[s] = element
    else:
        # keep whichever instance has the later stat_time
        aDict[s] = max(aDict[s], element, key=lambda x: x.stat_time)
# take the values (the instances), not items(), which would give (label, instance) pairs
my_list = list(aDict.values())

The lambda expression passed into max tells Python which value to compare when computing the max.
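As a minimal, self-contained sketch of the same idea outside Django: `Item` below is a hypothetical stand-in for `MyClass`, `latest_per_label` is a name made up for illustration, and the sample dates are invented.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for MyClass; only label and stat_time matter here.
Item = namedtuple("Item", ["label", "stat_time"])


def latest_per_label(items):
    """Return one item per label, keeping the one with the latest stat_time."""
    latest = {}
    for item in items:
        if item.label not in latest:
            latest[item.label] = item
        else:
            # max() compares the two candidates by their stat_time only
            latest[item.label] = max(
                latest[item.label], item, key=lambda x: x.stat_time
            )
    return list(latest.values())


# Invented sample data: two 'yyy' entries with different stat_times.
my_list = [
    Item("xxx", datetime(2014, 1, 1)),
    Item("yyy", datetime(2014, 1, 2)),
    Item("yyy", datetime(2014, 1, 3)),
    Item("zzz", datetime(2014, 1, 1)),
]
result = latest_per_label(my_list)
```

Here `result` holds three items, and the surviving 'yyy' item is the one dated 3 January. With real model instances the same function should work unchanged, assuming each object exposes `label` and `stat_time` attributes.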

SimonT
0

I'm not sure if you should filter your object based on __unicode__(), but here is how I would have done it.

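unique_objs = []  # (label text, stat_time) pairs already seen
new_list = []     # objects kept, in their original order

# Note: this drops only exact (label, stat_time) repeats.
for o in my_list:
    key = (o.__unicode__(), o.stat_time)
    if key in unique_objs:
        continue
    new_list.append(o)
    unique_objs.append(key)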
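The loop above treats two objects as duplicates only when both the label and the stat_time match. If the goal is one object per label with the latest stat_time, one possible variant is to sort newest-first and keep the first object seen for each label. This is a sketch under assumptions: `Record` is a hypothetical stand-in for `MyClass`, `newest_per_label` is an illustrative name, and the sample dates are invented.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for MyClass with the two attributes used here.
Record = namedtuple("Record", ["label", "stat_time"])


def newest_per_label(objs):
    """Keep the first object per label after sorting by stat_time, newest first."""
    seen_labels = set()
    kept = []
    for o in sorted(objs, key=lambda x: x.stat_time, reverse=True):
        if o.label in seen_labels:
            continue  # an object with a later (or equal) stat_time was already kept
        seen_labels.add(o.label)
        kept.append(o)
    return kept


# Invented sample data for illustration.
records = [
    Record("yyy", datetime(2014, 1, 2)),
    Record("xxx", datetime(2014, 1, 1)),
    Record("yyy", datetime(2014, 1, 5)),
]
kept = newest_per_label(records)
```

Using a set for the seen labels keeps each membership check cheap, at the cost of one sort up front. The result is ordered newest-first rather than in the original list order, which the question says is acceptable.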
Steinar Lima