How do I keep the index of the duplicate element unchanged

Question

Here is a input list:

['a', 'b', 'b', 'c', 'c', 'd']

The output I expect should be:

[[0, 'a'], [1, 'b'],  [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]

I try to use map()

>>> map(lambda (index, word): [index, word], enumerate([['a', 'b', 'b', 'c', 'c', 'd']])
[[0, 'a'], [1, 'b'], [2, 'b'], [3, 'c'], [4, 'c'], [5, 'd']]

How can I get the expected result?

EDIT: This is not a sorted list, the index of each element increase only when meet a new element

Thanks a lot, I forgot this key point – fishiwhj Jul 15 '12 at 08:22 — fishiwhj, Jul 15 '12 at 08:22

score 6 · Answer 1 · answered Jul 15 '12 at 08:10

6

>>> import itertools
>>> seq = ['a', 'b', 'b', 'c', 'c', 'd']
>>> [[i, c] for i, (k, g) in enumerate(itertools.groupby(seq)) for c in g]
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]

answered Jul 15 '12 at 08:10

jamylak

128,818
30
231
230

@IgorChubin Depends on what the OP wants, he didn't have a result for an unsorted list so I'm not sure... – jamylak Jul 15 '12 at 08:14
That's right. But when you need solution for the unsorted list and want to save its order, this will not work at all. – Igor Chubin Jul 15 '12 at 08:16

score 4 · Answer 2 · answered Jul 15 '12 at 08:14

4

[
    [i, x]
    for i, (value, group) in enumerate(itertools.groupby(['a', 'b', 'b', 'c', 'c', 'd']))
    for x in group
]

answered Jul 15 '12 at 08:14

Karl Knechtel

62,466
11
102
153

the same as the solution from jamylak; works only for storted lists. – Igor Chubin Jul 15 '12 at 08:19
Really exactly the same, just different names and with the value in-line as in the OP. And figured out marginally slower :) – Karl Knechtel Jul 15 '12 at 09:02

Marcelo Cantos · Accepted Answer · 2012-07-15T11:12:26.280

1

It sounds like you want to rank the terms based on a lexicographical ordering.

input = ['a', 'b', 'b', 'c', 'c', 'd']
mapping = { v:i for (i, v) in enumerate(sorted(set(input))) }
[ [mapping[v], v] for v in input ]

Note that this works for unsorted inputs as well.

If, as your amendment suggests, you want to number items based on order of first appearance, a different approach is in order. The following is short and sweet, albeit offensively hacky:

[ [d.setdefault(v, len(d)), v] for d in [{}] for v in input ]

edited Jul 15 '12 at 11:12

answered Jul 15 '12 at 08:12

Marcelo Cantos

181,030
38
327
365

nice solution but too much work for long lists; O(n*log(n)), but the obvious solution takes O(n) – Igor Chubin Jul 15 '12 at 08:17
@IgorChubin: You can only do better by assuming sorted input. I intentionally avoided that. – Marcelo Cantos Jul 15 '12 at 08:18
There is no need for sorted input. The O(n) solution is obvious and you need sorted input at all. – Igor Chubin Jul 15 '12 at 08:21
@IgorChubin: Your solution isn't O(n). The expression `if k not in d` is O(log(n)) and you call it from inside an O(n) loop. Ranking unsorted inputs is equivalent to sorting, for which there is no O(n) solution in the general case. – Marcelo Cantos Jul 15 '12 at 08:23
No, `k not in d` it is O(1). See http://stackoverflow.com/questions/1963507/time-complexity-of-accessing-a-python-dict for example. – Igor Chubin Jul 15 '12 at 08:27
@MarceloCantos: I thought dictionary lookups were O(1). – Joel Cornett Jul 15 '12 at 08:28
@IgorChubin: My apologies. You are correct that member-test and insertion amortize to O(n) for a `dict`, so my initial analysis was incorrect. Your solution isn't actually ranking the results, merely tracking order of first appearance. Passing `['a', 'c', 'b', 'd', 'c', 'b']` yields `[[0, 'a'], [1, 'c'], [2, 'b'], [3, 'd'], [1, 'c'], [2, 'b']]`, but `'c'` doesn't sort before `'b'`. It merely makes its first appearance earlier. Of course, it isn't obvious from the OP's question whether ranking is required, so the question of whether O(n log(n)) is the best possible answer remains an open one. – Marcelo Cantos Jul 15 '12 at 08:39
No, Marcelo, you are not 100%-right, because `fishi` added later this remark: "the index of each element increase only when meet a new element". I think that this remark is enough to understand that you must increase the counter only when you meet a new item. That means that your solution is incorrect at all, because in your own example it will provide `1` for `b`, but it must be `2`. – Igor Chubin Jul 15 '12 at 08:49
@IgorChubin: In the context of the amendment, certainly. I'm not sure why fishiwhj accepted my answer, then. – Marcelo Cantos Jul 15 '12 at 10:48
@MarceloCantos: I think that it is not so important for him + may be he didn't get the point. I really like your answer (and the answer from jamylak), I know that my solution is not so elegant; but it would be really great if one could make a solution so correct as mine and so elegant as yours. I suppose that it is possible to do with DefaultDict (may be) but don't know yet how. – Igor Chubin Jul 15 '12 at 11:04
@IgorChubin: I've amended my answer with the tersest answer I can come with at short notice. Whether it's an elegant solution or not is open to debate. – Marcelo Cantos Jul 15 '12 at 11:14

Igor Chubin · Answer 4 · 2012-07-15T18:04:02.907

When list is sorted use groupby (see jamylak answer); when not, just iterate over the list and check if you've seen this letter already:

a = ['a', 'b', 'b', 'c', 'c', 'd']
result = []
d = {}
n = 0
for k in a:
  if k not in d:
     d[k] = n
     n += 1
  result.append([d[k],k])

It is the most effective solution; it takes only O(n) time.

Example of usage for unsorted lists:

[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd'], [0, 'a']]

As you can see, you have here the same order of items as in the input list.

When you sort the list first you need O(n*log(n)) additional time.

How do I keep the index of the duplicate element unchanged

4 Answers4