Python find and replace last appearance in list

Question

In Python, I have three lists and build a list of tuples from them:

list3 = ['PA0', 'PA1']
list2 = ['PB0', 'PB1']
list1 = ['PC0', 'PC1', 'PC2']

[(list1[i], list2[j], list3[k]) for i in range(len(list1)) for j in range(len(list2)) for k in range(len(list3))]

#Result
[('PC0', 'PB0', 'PA0'), 
('PC0', 'PB0', 'PA1'), 
('PC0', 'PB1', 'PA0'), 
('PC0', 'PB1', 'PA1'), 
('PC1', 'PB0', 'PA0'), 
('PC1', 'PB0', 'PA1'), 
('PC1', 'PB1', 'PA0'), 
('PC1', 'PB1', 'PA1'), 
('PC2', 'PB0', 'PA0'), 
('PC2', 'PB0', 'PA1'), 
('PC2', 'PB1', 'PA0'), 
('PC2', 'PB1', 'PA1')]
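A sketch, not part of the original question: the nested index-based comprehension produces the same rows, in the same order, as iterating over the values directly or as itertools.product over the three lists (the last list varies fastest). Iterating over values also sidesteps xrange, which exists only in Python 2.

```python
from itertools import product

list3 = ['PA0', 'PA1']
list2 = ['PB0', 'PB1']
list1 = ['PC0', 'PC1', 'PC2']

# the question's comprehension, written with range so it also runs on Python 3
by_index = [(list1[i], list2[j], list3[k])
            for i in range(len(list1))
            for j in range(len(list2))
            for k in range(len(list3))]

# iterating over the values directly gives the same tuples
by_value = [(a, b, c) for a in list1 for b in list2 for c in list3]

# itertools.product also varies the rightmost iterable fastest
by_product = list(product(list1, list2, list3))

print(by_index == by_value == by_product)  # True
print(len(by_product))  # 12
```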

How can I find the last occurrence of each value and add E as a suffix to it, so that the result becomes:

[('PC0', 'PB0', 'PA0'), 
 ('PC0', 'PB0', 'PA1'), 
 ('PC0', 'PB1', 'PA0'), 
 ('PC0E', 'PB1', 'PA1'), 
 ('PC1', 'PB0', 'PA0'), 
 ('PC1', 'PB0', 'PA1'), 
 ('PC1', 'PB1', 'PA0'), 
 ('PC1E', 'PB1', 'PA1'), 
 ('PC2', 'PB0', 'PA0'), 
 ('PC2', 'PB0E', 'PA1'), 
 ('PC2', 'PB1', 'PA0E'), 
 ('PC2E', 'PB1E', 'PA1E')]

Also, could you put square brackets where they belong? I just want to make sure I am reading this correctly as a list of tuples. — Mad Physicist, Sep 07 '16 at 15:51
Tuples are immutable, so you'll probably want to start by converting it to a list. — Chris Mueller, Sep 07 '16 at 15:52
I updated the question. Please help me find the solution. — Quan Nguyen, Sep 07 '16 at 15:55
If it's sorted, can't you just go from the back of the list, keep track of what is already marked, and mark only the first appearance? For example, the first iteration marks PA1 with E and adds it to a "seen" list; next it sees PB1 and marks that with E since it's not in that list yet. — MooingRawr, Sep 07 '16 at 15:55
Loop over your master list, starting with the _second_ item. Compare the first string in the item to the first string in the previous item. If they differ, add an 'E' to the first string. Repeat for the second and third strings. — John Gordon, Sep 07 '16 at 16:01
@JohnGordon: that won't tell you when something is the *last* occurrence, only that the value changed. Since the 3rd item is alternating each row, your method can't find the last occurrence just by comparing to the preceding row. — Martijn Pieters, Sep 07 '16 at 16:03
Are the columns of the list distinct, or can they be mixed? If they can be mixed, do you want the absolute last occurrence of each element, or the last occurrence in each column? See the discussion under @MartijnPieters' answer for more info. We made different assumptions on the subject. — Mad Physicist, Sep 07 '16 at 16:23
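To make the two readings in the last comment concrete, here is a sketch (the function names are hypothetical, not taken from any answer) that collects the (row, column) positions to mark under each reading: per column, and across the whole list. On the question's data they agree because the PA, PB and PC values never share a column; the small `mixed` example shows where they diverge.

```python
from itertools import product

rows = list(product(['PC0', 'PC1', 'PC2'], ['PB0', 'PB1'], ['PA0', 'PA1']))

def last_per_column(rows):
    # key is (column, value): each column tracks its values separately
    last = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            last[(c, value)] = (r, c)  # later rows overwrite earlier ones
    return set(last.values())

def last_overall(rows):
    # key is the value alone: all columns share one namespace
    last = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            last[value] = (r, c)
    return set(last.values())

print(last_per_column(rows) == last_overall(rows))  # True
print(sorted(last_overall(rows)))
# [(3, 0), (7, 0), (9, 1), (10, 2), (11, 0), (11, 1), (11, 2)]

# values that move between columns make the readings differ
mixed = [('x', 'y'), ('y', 'x')]
print(len(last_per_column(mixed)), len(last_overall(mixed)))  # 4 2
```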

Martijn Pieters · Accepted Answer · 2016-09-07T16:00:38.393

score 2

Process your input list in reverse, then mark the first occurrence of any value. You can use a list of sets to track what values you've already seen. Reverse the output list you build when you are done:

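# one set per column, recording the values already seen (from the end)
seensets = [set() for _ in inputlist[0]]
outputlist = []
# walk the rows from last to first, so the first sighting is the last occurrence
for entry in reversed(inputlist):
    newentry = []
    for value, seen in zip(entry, seensets):
        # suffix 'E' only the first time this value is seen in this column
        newentry.append(value + 'E' if value not in seen else value)
        seen.add(value)
    outputlist.append(tuple(newentry))
# restore the original row order
outputlist.reverse()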

Demo:

>>> from pprint import pprint
>>> seensets = [set() for _ in inputlist[0]]
>>> outputlist = []
>>> for entry in reversed(inputlist):
...     newentry = []
...     for value, seen in zip(entry, seensets):
...         newentry.append(value + 'E' if value not in seen else value)
...         seen.add(value)
...     outputlist.append(tuple(newentry))
...
>>> outputlist.reverse()
>>> pprint(outputlist)
[('PC0', 'PB0', 'PA0'),
 ('PC0', 'PB0', 'PA1'),
 ('PC0', 'PB1', 'PA0'),
 ('PC0E', 'PB1', 'PA1'),
 ('PC1', 'PB0', 'PA0'),
 ('PC1', 'PB0', 'PA1'),
 ('PC1', 'PB1', 'PA0'),
 ('PC1E', 'PB1', 'PA1'),
 ('PC2', 'PB0', 'PA0'),
 ('PC2', 'PB0E', 'PA1'),
 ('PC2', 'PB1', 'PA0E'),
 ('PC2E', 'PB1E', 'PA1E')]
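The same reverse-and-track idea can be wrapped in a reusable function. This is a sketch, not part of the accepted answer: the name `mark_last` and its `suffix` parameter are hypothetical, and the empty-input check is added because `inputlist[0]` raises IndexError on an empty list. Like the answer, it keeps one seen-set per column and does not modify its input.

```python
from itertools import product

def mark_last(rows, suffix='E'):
    """Return a new list of tuples in which the last occurrence of each
    value in each column has `suffix` appended."""
    if not rows:
        return []
    seensets = [set() for _ in rows[0]]
    out = []
    for entry in reversed(rows):
        newentry = []
        for value, seen in zip(entry, seensets):
            # walking backwards, the first sighting is the last occurrence
            newentry.append(value if value in seen else value + suffix)
            seen.add(value)
        out.append(tuple(newentry))
    out.reverse()
    return out

rows = list(product(['PC0', 'PC1', 'PC2'], ['PB0', 'PB1'], ['PA0', 'PA1']))
result = mark_last(rows)
print(result[3])      # ('PC0E', 'PB1', 'PA1')
print(result[-1])     # ('PC2E', 'PB1E', 'PA1E')
print(mark_last([]))  # []
```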

edited Sep 07 '16 at 16:00

answered Sep 07 '16 at 15:56

Martijn Pieters


As long as you're using a `list` anyway, perhaps change the last line from `outputlist = outputlist[::-1]` to `outputlist.reverse()` to perform reversal in place instead of making a new, reversed `list` and throwing away the old one? – ShadowRanger Sep 07 '16 at 15:59
I think mine's shorter :) – Mad Physicist Sep 07 '16 at 16:12
@MadPhysicist: yours treats all values as one namespace; I used a set per column. – Martijn Pieters Sep 07 '16 at 16:14
@MadPhysicist: you also use `list.index()` for each unique value, which is rather expensive. – Martijn Pieters Sep 07 '16 at 16:14
It's not clear from OP's post whether the namespaces are separate or not. – Mad Physicist Sep 07 '16 at 16:16
@MadPhysicist: Nope, that is indeed not clear; we both made the opposite assumption. – Martijn Pieters Sep 07 '16 at 16:17

score 1 · Answer 2 · edited May 23 '17 at 11:53

If you are not looking for lightning speed here, you could do the following:

1. Flatten the list using https://stackoverflow.com/a/952952/2988730
2. Find the unique elements
3. Find the index of the last occurrence of each unique element (by reversing the list)
4. Update the element
5. Reshape the flattened list back using https://stackoverflow.com/a/10124783/2988730

Here is a sample implementation:

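# 1. flatten, then reverse so that list.index() finds the last occurrence
flat = list(reversed([x for group in mylist for x in group]))
# 2. unique values (one namespace shared by all columns)
uniq = set(flat)
# 3, 4. mark the first hit in the reversed list, i.e. the last occurrence
for x in uniq:
    flat[flat.index(x)] += 'E'
# 5. undo the reversal and regroup into tuples of 3
mylist = list(zip(*[reversed(flat)]*3))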

Result:

[('PC0', 'PB0', 'PA0'),
 ('PC0', 'PB0', 'PA1'),
 ('PC0', 'PB1', 'PA0'),
 ('PC0E', 'PB1', 'PA1'),
 ('PC1', 'PB0', 'PA0'),
 ('PC1', 'PB0', 'PA1'),
 ('PC1', 'PB1', 'PA0'),
 ('PC1E', 'PB1', 'PA1'),
 ('PC2', 'PB0', 'PA0'),
 ('PC2', 'PB0E', 'PA1'),
 ('PC2', 'PB1', 'PA0E'),
 ('PC2E', 'PB1E', 'PA1E')]
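As noted in the comments under the accepted answer, calling list.index() once per unique value is expensive, since each call scans the list. A sketch of a linear-time variant of the same flatten/mark/reshape steps (still one namespace for all columns, like this answer) records each value's last flat index in a dict instead. The name `mark_last_flat` is hypothetical, and the row width is taken from the first tuple rather than hard-coded as 3.

```python
from itertools import product

def mark_last_flat(mylist, suffix='E'):
    if not mylist:
        return []
    width = len(mylist[0])
    # 1. flatten the rows into one list
    flat = [x for group in mylist for x in group]
    # 2, 3. later positions overwrite earlier ones: value -> last flat index
    last = {x: i for i, x in enumerate(flat)}
    # 4. update only those positions
    for i in last.values():
        flat[i] += suffix
    # 5. reshape back into tuples of the original width
    return [tuple(flat[i:i + width]) for i in range(0, len(flat), width)]

mylist = list(product(['PC0', 'PC1', 'PC2'], ['PB0', 'PB1'], ['PA0', 'PA1']))
result = mark_last_flat(mylist)
print(result[9])   # ('PC2', 'PB0E', 'PA1')
print(result[-1])  # ('PC2E', 'PB1E', 'PA1E')
```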

Padraic Cunningham · Answer 3 · 2016-09-07T17:06:16.383

Another approach keeps overwriting each value's index in a dict, so you end up with the index of its last occurrence. itertools.product can also create the initial list for you:

from itertools import product

def last_inds(prod):
    # later rows overwrite earlier ones, so each value keeps the
    # (row, column) index of its last occurrence
    return {ele: (i1, i2) for i1, row in enumerate(prod) for i2, ele in enumerate(row)}

prod = list(product(*(list1, list2, list3)))

# use the indexes to change the last occurrences;
# tuples are immutable, so each affected row is rebuilt
for r, c in last_inds(prod).values():
    lst = list(prod[r])
    lst[c] += "E"
    prod[r] = tuple(lst)

Which gives you the expected output:

[('PC0', 'PB0', 'PA0'),
 ('PC0', 'PB0', 'PA1'),
 ('PC0', 'PB1', 'PA0'),
 ('PC0E', 'PB1', 'PA1'),
 ('PC1', 'PB0', 'PA0'),
 ('PC1', 'PB0', 'PA1'),
 ('PC1', 'PB1', 'PA0'),
 ('PC1E', 'PB1', 'PA1'),
 ('PC2', 'PB0', 'PA0'),
 ('PC2', 'PB0E', 'PA1'),
 ('PC2', 'PB1', 'PA0E'),
 ('PC2E', 'PB1E', 'PA1E')]

In my timings, it is the fastest approach on your data:

In [37]: %%timeit
prod = list(product(*(list1, list2, list3)))
m(prod)
   ....: 
10000 loops, best of 3: 20.7 µs per loop

In [38]: %%timeit
prod = list(product(*(list1, list2, list3)))
for r, c in last_inds(prod).values():
    lst = list(prod[r])
    lst[c] += "E"
    prod[r] = tuple(lst)
   ....: 

100000 loops, best of 3: 12.2 µs per loop

Where m is the accepted answer's code wrapped in a function:

def m(inputlist):
    seensets = [set() for _ in inputlist[0]]
    outputlist = []
    for entry in reversed(inputlist):
        newentry = []
        for value, seen in zip(entry, seensets):
            newentry.append(value + 'E' if value not in seen else value)
            seen.add(value)
        outputlist.append(tuple(newentry))
    outputlist.reverse()
    return outputlist
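The IPython %%timeit cells above can be reproduced in a plain script with the standard timeit module. This is a sketch: `padraic` is a hypothetical name wrapping this answer's loop, `m` is the accepted answer's code with a `return` added, and the numbers you get depend on your machine and Python version, so none are quoted here. It also checks that both approaches produce the same list before timing them.

```python
import timeit
from itertools import product

list3 = ['PA0', 'PA1']
list2 = ['PB0', 'PB1']
list1 = ['PC0', 'PC1', 'PC2']

def m(inputlist):
    # accepted answer: walk backwards with one seen-set per column
    seensets = [set() for _ in inputlist[0]]
    outputlist = []
    for entry in reversed(inputlist):
        newentry = []
        for value, seen in zip(entry, seensets):
            newentry.append(value + 'E' if value not in seen else value)
            seen.add(value)
        outputlist.append(tuple(newentry))
    outputlist.reverse()
    return outputlist

def last_inds(prod):
    # last (row, column) of each value; later rows overwrite earlier ones
    return {ele: (r, c) for r, row in enumerate(prod) for c, ele in enumerate(row)}

def padraic(prod):
    # this answer: patch the last occurrences in place, rebuilding each tuple
    for r, c in last_inds(prod).values():
        lst = list(prod[r])
        lst[c] += "E"
        prod[r] = tuple(lst)
    return prod

# both approaches should agree before their speed is compared
assert m(list(product(list1, list2, list3))) == padraic(list(product(list1, list2, list3)))

N = 10000
for func in (m, padraic):
    # build a fresh list each run, as in the IPython cells, since padraic mutates it
    t = timeit.timeit(lambda: func(list(product(list1, list2, list3))), number=N)
    print(func.__name__, round(t / N * 1e6, 2), "µs per loop")
```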
