
In Python, I have three lists and build a list of tuples from every combination of their elements:

list3 = ['PA0', 'PA1']
list2 = ['PB0', 'PB1']
list1 = ['PC0', 'PC1', 'PC2']

[(c, b, a) for c in list1 for b in list2 for a in list3]

#Result
[('PC0', 'PB0', 'PA0'), 
('PC0', 'PB0', 'PA1'), 
('PC0', 'PB1', 'PA0'), 
('PC0', 'PB1', 'PA1'), 
('PC1', 'PB0', 'PA0'), 
('PC1', 'PB0', 'PA1'), 
('PC1', 'PB1', 'PA0'), 
('PC1', 'PB1', 'PA1'), 
('PC2', 'PB0', 'PA0'), 
('PC2', 'PB0', 'PA1'), 
('PC2', 'PB1', 'PA0'), 
('PC2', 'PB1', 'PA1')]

How can I find the last occurrence of each value and append E to it as a suffix, like this?

[('PC0', 'PB0', 'PA0'), 
 ('PC0', 'PB0', 'PA1'), 
 ('PC0', 'PB1', 'PA0'), 
 ('PC0E', 'PB1', 'PA1'), 
 ('PC1', 'PB0', 'PA0'), 
 ('PC1', 'PB0', 'PA1'), 
 ('PC1', 'PB1', 'PA0'), 
 ('PC1E', 'PB1', 'PA1'), 
 ('PC2', 'PB0', 'PA0'), 
 ('PC2', 'PB0E', 'PA1'), 
 ('PC2', 'PB1', 'PA0E'), 
 ('PC2E', 'PB1E', 'PA1E')]
Mad Physicist
Quan Nguyen
  • Is the input list always sorted? – Martijn Pieters Sep 07 '16 at 15:50
  • Yes, it is already sorted, as in the example. – Quan Nguyen Sep 07 '16 at 15:51
  • Also, could you put square brackets where they belong? I just want to make sure I am reading this correctly as a list of tuples. – Mad Physicist Sep 07 '16 at 15:51
  • Tuples are immutable, so you'll probably want to start by converting it to a list. – Chris Mueller Sep 07 '16 at 15:52
  • I updated the question. Please help me to find out the solution – Quan Nguyen Sep 07 '16 at 15:55
  • If it's sorted can't you just go from the back of the list and keep track on what is already marked and just mark the first time appearing? So like first iteration will mark PA1 with E, and add it to some list, next iteration it will see PB1 and mark that with E since it's not on that some list. – MooingRawr Sep 07 '16 at 15:55
  • Loop over your master list, starting with the _second_ item. Compare the first string in the item to the first string in the previous item. If they differ, add an 'E' to the first string. Repeat for the second and third strings. – John Gordon Sep 07 '16 at 16:01
  • @JohnGordon: that won't tell you when something is the *last* occurrence, only that the value changed. Since the 3rd item is alternating each row, your method can't find the last occurrence just by comparing to the preceding row. – Martijn Pieters Sep 07 '16 at 16:03
  • Are the columns of the list distinct, or can they be mixed? If they can be mixed, do you want the absolute last occurrence of each element, or the last occurrence in each column? See the discussion under @MartijnPieters' question for more info. We made different assumptions on the subject. – Mad Physicist Sep 07 '16 at 16:23

3 Answers


Process your input list in reverse, then mark the first occurrence of any value. You can use a list of sets to track what values you've already seen. Reverse the output list you build when you are done:

seensets = [set() for _ in inputlist[0]]
outputlist = []
for entry in reversed(inputlist):
    newentry = []
    for value, seen in zip(entry, seensets):
        newentry.append(value + 'E' if value not in seen else value)
        seen.add(value)
    outputlist.append(tuple(newentry))
outputlist.reverse()

Demo:

>>> seensets = [set() for _ in inputlist[0]]
>>> outputlist = []
>>> for entry in reversed(inputlist):
...     newentry = []
...     for value, seen in zip(entry, seensets):
...         newentry.append(value + 'E' if value not in seen else value)
...         seen.add(value)
...     outputlist.append(tuple(newentry))
...
>>> outputlist.reverse()
>>> pprint(outputlist)
[('PC0', 'PB0', 'PA0'),
 ('PC0', 'PB0', 'PA1'),
 ('PC0', 'PB1', 'PA0'),
 ('PC0E', 'PB1', 'PA1'),
 ('PC1', 'PB0', 'PA0'),
 ('PC1', 'PB0', 'PA1'),
 ('PC1', 'PB1', 'PA0'),
 ('PC1E', 'PB1', 'PA1'),
 ('PC2', 'PB0', 'PA0'),
 ('PC2', 'PB0E', 'PA1'),
 ('PC2', 'PB1', 'PA0E'),
 ('PC2E', 'PB1E', 'PA1E')]
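
The demo above assumes `inputlist` and `pprint` already exist in the session. A self-contained sketch of the same reverse scan might look like the following; the function name `mark_last_per_column` and its `suffix` parameter are made up for illustration, and `inputlist` is built with `itertools.product` on the assumption that it matches the list in the question.

```python
from itertools import product
from pprint import pprint


def mark_last_per_column(rows, suffix='E'):
    """Append `suffix` to the last occurrence of each value in each column."""
    if not rows:
        return []
    # One "seen" set per column, so each column is tracked independently.
    seensets = [set() for _ in rows[0]]
    marked = []
    # Walking backwards, the first time a value shows up is its last occurrence.
    for entry in reversed(rows):
        newentry = []
        for value, seen in zip(entry, seensets):
            newentry.append(value if value in seen else value + suffix)
            seen.add(value)
        marked.append(tuple(newentry))
    # Restore the original row order.
    marked.reverse()
    return marked


list3 = ['PA0', 'PA1']
list2 = ['PB0', 'PB1']
list1 = ['PC0', 'PC1', 'PC2']
# Assumed to be equivalent to the list comprehension in the question.
inputlist = list(product(list1, list2, list3))

outputlist = mark_last_per_column(inputlist)
pprint(outputlist)
```

Because the function builds a new list of tuples, the immutability of the input tuples is not a problem and `inputlist` itself is left unchanged.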
Martijn Pieters

If you are not looking for lightning speed here, you could do the following:

  1. Flatten the list using https://stackoverflow.com/a/952952/2988730
  2. Find the unique elements
  3. Find the index of the last occurrence of each unique element (by reversing the list)
  4. Update the element
  5. Reshape the flattened list back using https://stackoverflow.com/a/10124783/2988730

Here is a sample implementation:

# 1
flat = list(reversed([x for group in mylist for x in group]))
# 2
uniq = set(flat)
# 3, 4
for x in uniq:
    flat[flat.index(x)] += 'E'
# 5
mylist = list(zip(*[reversed(flat)]*3))

Result:

[('PC0', 'PB0', 'PA0'),
 ('PC0', 'PB0', 'PA1'),
 ('PC0', 'PB1', 'PA0'),
 ('PC0E', 'PB1', 'PA1'),
 ('PC1', 'PB0', 'PA0'),
 ('PC1', 'PB0', 'PA1'),
 ('PC1', 'PB1', 'PA0'),
 ('PC1E', 'PB1', 'PA1'),
 ('PC2', 'PB0', 'PA0'),
 ('PC2', 'PB0E', 'PA1'),
 ('PC2', 'PB1', 'PA0E'),
 ('PC2E', 'PB1E', 'PA1E')]
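
Note that flattening treats all values as one pool, so this marks the absolute last occurrence of each value regardless of which column it sits in. For the question's data that makes no difference, because each column has its own distinct values. The sketch below is a variant of the steps above, not the exact code: it records each position before modifying anything and takes the row width from the data instead of hardcoding 3. The name `mark_last_global` and the sample `rows` are invented for illustration.

```python
def mark_last_global(rows, suffix='E'):
    """Mark the absolute last occurrence of each value, ignoring columns."""
    if not rows:
        return []
    width = len(rows[0])  # assumes every row has the same length
    # Flatten in reverse, so the first hit in `flat` is the last occurrence.
    flat = [x for row in reversed(rows) for x in reversed(row)]
    # Record the first position of each value before changing any of them.
    first_pos = {}
    for i, value in enumerate(flat):
        first_pos.setdefault(value, i)
    for i in first_pos.values():
        flat[i] += suffix
    flat.reverse()
    # Cut the flat list back into rows of the original width.
    return [tuple(flat[i:i + width]) for i in range(0, len(flat), width)]


# Hypothetical input where 'X' and 'B' each appear in both columns.
rows = [('X', 'A'), ('B', 'X'), ('X', 'B')]
print(mark_last_global(rows))
# [('X', 'AE'), ('B', 'X'), ('XE', 'BE')]
```

Here only one `'X'` and one `'B'` are marked, even though each also has a separate last occurrence within the other column. If the per-column reading is the one you want, the values need to be tracked column by column instead.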
Mad Physicist

Another approach keeps overwriting the stored index for each value, so you end up with the index of its last occurrence. itertools.product will also create the initial list for you:

from itertools import product

def last_inds(prod):
    # the key/value will be overwritten so we always keep the last seen
    return {ele: (i1, i2) for i1, row in enumerate(prod) for i2, ele in enumerate(row)}

prod = list(product(*(list1, list2, list3)))

# use the indexes to change the last occurrences.
for r, c in last_inds(prod).values():
    lst = list(prod[r])
    lst[c] += "E"
    prod[r] = tuple(lst)

Which gives you the expected output:

[('PC0', 'PB0', 'PA0'),
 ('PC0', 'PB0', 'PA1'),
 ('PC0', 'PB1', 'PA0'),
 ('PC0E', 'PB1', 'PA1'),
 ('PC1', 'PB0', 'PA0'),
 ('PC1', 'PB0', 'PA1'),
 ('PC1', 'PB1', 'PA0'),
 ('PC1E', 'PB1', 'PA1'),
 ('PC2', 'PB0', 'PA0'),
 ('PC2', 'PB0E', 'PA1'),
 ('PC2', 'PB1', 'PA0E'),
 ('PC2E', 'PB1E', 'PA1E')]

In my timings it is the fastest approach on your data:

In [37]: %%timeit
prod = list(product(*(list1, list2, list3)))
m(prod)
   ....: 
10000 loops, best of 3: 20.7 µs per loop

In [38]: %%timeit
prod = list(product(*(list1, list2, list3)))
for r, c in last_inds(prod).values():
    lst = list(prod[r])
    lst[c] += "E"
    prod[r] = tuple(lst)
   ....: 

100000 loops, best of 3: 12.2 µs per loop

Where m is:

def m(inputlist):
    seensets = [set() for _ in inputlist[0]]
    outputlist = []
    for entry in reversed(inputlist):
        newentry = []
        for value, seen in zip(entry, seensets):
            newentry.append(value + 'E' if value not in seen else value)
            seen.add(value)
        outputlist.append(tuple(newentry))
    outputlist.reverse()
    return outputlist
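
If you want to repeat the comparison outside IPython, a rough sketch with the standard `timeit` module could look like this. The names `via_seen_sets`, `via_last_inds` and `bench` are invented for illustration; `via_seen_sets` mirrors `m` above, and `via_last_inds` wraps the index-overwriting loop in a function. The numbers will differ from machine to machine.

```python
import timeit
from itertools import product

list3 = ['PA0', 'PA1']
list2 = ['PB0', 'PB1']
list1 = ['PC0', 'PC1', 'PC2']


def via_seen_sets(rows):
    # Reverse scan with one "seen" set per column (same logic as m).
    seensets = [set() for _ in rows[0]]
    out = []
    for entry in reversed(rows):
        newentry = []
        for value, seen in zip(entry, seensets):
            newentry.append(value + 'E' if value not in seen else value)
            seen.add(value)
        out.append(tuple(newentry))
    out.reverse()
    return out


def via_last_inds(rows):
    # Later (row, col) pairs overwrite earlier ones, leaving the last occurrence.
    last = {ele: (r, c) for r, row in enumerate(rows) for c, ele in enumerate(row)}
    rows = list(rows)  # work on a copy of the outer list
    for r, c in last.values():
        lst = list(rows[r])
        lst[c] += 'E'
        rows[r] = tuple(lst)
    return rows


def bench(func, number=10000):
    # Rebuild the product inside the timed call, as the %%timeit cells do.
    return timeit.timeit(
        lambda: func(list(product(list1, list2, list3))), number=number)


for func in (via_seen_sets, via_last_inds):
    print(func.__name__, bench(func))
```

Both functions give the same result for this data, so the timing compares like with like.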
Padraic Cunningham