-1

I have a 2D list of [index, value] pairs with repeated indexes. For each unique index I need to select the value that occurs most often or, if several values occur the same number of times, the one seen last.

[
  [0,-1],
  [1, 0],
  [1, 1],
  [2, 1],
  [2,-1],
  [2, 1],
]
 =>
[
  [0,-1],
  [1, 1], # last seen
  [2, 1], # most often seen
]

I can use numpy or any other popular library if that makes it easier.

Boppity Bop

3 Answers

1

You can do it like this:

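from itertools import groupby
from collections import Counter

# Input from the question. groupby only groups consecutive items,
# so the rows must already be ordered by index.
l = [[0, -1], [1, 0], [1, 1], [2, 1], [2, -1], [2, 1]]

result = []
for index, lst in groupby(l, key=lambda x: x[0]):
    lst = [i[1] for i in lst]  # values for this index, in original order
    if len(lst) == len(set(lst)) or len(set(Counter(lst).values())) == 1:
        item = lst[-1]  # no duplicates, or all counts equal: take the last value
    else:
        item = max(set(lst), key=lst.count)  # otherwise take a most frequent value
    result.append([index, item])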

In [160]: result
Out[160]: [[0, -1], [1, 1], [2, 1]]

len(lst) == len(set(lst)) -> Checks whether the list has no duplicates.
len(set(Counter(lst).values())) == 1 -> Handles the special case mentioned by @Sajad, where every value occurs the same number of times.
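
The partial-tie case raised in the comments (`[1, 1, 2, 2, 3]`, where only some values share the top count) is not covered by the two conditions above. As a minimal sketch, not part of the original answer, one way to cover it is to count once and then scan the values from the end. The helper names `most_common_last` and `select` are hypothetical, and the rows are assumed to be ordered by index already, since `groupby` only groups consecutive items.

```python
from collections import Counter
from itertools import groupby


def most_common_last(values):
    """Return the most frequent value; among tied values, the one seen last."""
    counts = Counter(values)
    top = max(counts.values())
    # Scan from the end so that, among values with the top count,
    # the one whose last occurrence comes latest is returned.
    return next(v for v in reversed(values) if counts[v] == top)


def select(pairs):
    """Pick one value per index from [index, value] rows ordered by index."""
    return [
        [index, most_common_last([value for _, value in group])]
        for index, group in groupby(pairs, key=lambda pair: pair[0])
    ]


print(most_common_last([1, 1, 2, 2, 3]))  # 2
print(select([[0, -1], [1, 0], [1, 1], [2, 1], [2, -1], [2, 1]]))
# [[0, -1], [1, 1], [2, 1]]
```

The original list is never reordered here; the `Counter` is used only for lookups, so the tie-break depends solely on the order of `values`.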

Rahul K P
  • Your conversion to `set` is wrong and doesn't make sense here; imagine `[[2, 1], [2, 1], [2, 2], [2, 2]]`: Your output is supposed to pick the most frequent, and the last one in case of a tie. – Sajad Sep 03 '22 at 18:18
  • @RahulKP it would be [2, 1] if I'm correct. – Sajad Sep 03 '22 at 18:26
  • @RahulKP all I can say is you're losing the initial order by converting to a set – Sajad Sep 03 '22 at 18:28
  • @Sajad `lst` will give you `[1, 1, 2, 2]` and `set(lst)` will be `{1, 2}`, so the comparison (`if len(lst) == len(set(lst))`) will fail. Then it goes to the else branch. Ideally, I wanted to get the last element if there is no duplication. – Rahul K P Sep 03 '22 at 18:29
  • Sajad is right - if you run your code with `l = [[2, 1], [2, 1], [2, 2], [2, 2]]` you get `[2,1]` but correct result is `[2,2]` – Boppity Bop Sep 03 '22 at 18:30
  • @RahulKP you lose information when deduplicating. read the question again – Sajad Sep 03 '22 at 18:31
  • @Sajad You are right, I was considering another line, Thanks for the mention. Let me correct it. – Rahul K P Sep 03 '22 at 18:33
  • @RahulKP I modified your answer to make it work; you can check it out and edit yours accordingly; as I picked up on your answer, please edit your answer so I can remove my downvote – Sajad Sep 03 '22 at 18:47
  • @Sajad Updated my answer, Once again, Thanks for the mention. – Rahul K P Sep 03 '22 at 18:54
  • @RahulKP I can't follow your logic again, but I suppose it won't work again as you're still using a set? The last item is meaningless, so [-1] won't work. What you want is the last item in a series of equally frequent values. For this, your ordering must be preserved. I hope I make sense to you. – Sajad Sep 03 '22 at 18:59
  • @Sajad Please check for yourself what you are saying. In the code, I am getting the last element from `lst`, which is not altered, because the set operation happens only inside the condition. It won't change anything in `lst`. So I don't understand where you found the order change in `lst`. – Rahul K P Sep 03 '22 at 20:49
  • @RahulKP https://stackoverflow.com/questions/9792664/converting-a-list-to-a-set-changes-element-order – Sajad Sep 03 '22 at 21:53
  • `[1, 1, 2, 2, 3]`: most frequent elements -> [1, 2] -> the last, most common value seen -> 2 – Sajad Sep 03 '22 at 21:55
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247765/discussion-between-sajad-and-rahul-k-p). – Sajad Sep 03 '22 at 22:04
1

I would keep it simple instead. Note that the order in which values appear matters for breaking ties, so it has to be preserved alongside the counting structures.

from collections import defaultdict

l = [
  [2, 1],
  [2, 1],
  [2, 2],
  [2, 2],
  [2,-1],
  [2,-1]
]

result = []

groups = defaultdict(list)
for index, value in l:
    groups[index].append(value) # keep the list (ordered series) of values for each index
    
for index, group in groups.items():
    best_count = 0
    best_value = None
    counts = defaultdict(int)
    for value in group:
        counts[value] += 1 # count each value for the index
        # we look for the most frequent value, and in case of ties,
        # we prefer the one which has the last occurrence in the
        # series (list) of values
        if counts[value] >= best_count: 
            best_value = value
            best_count = counts[value]
    result.append([index, best_value])

print(result)
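
For the sample list above this prints `[[2, -1]]`. As a hedged sketch of the same running-count idea, the two loops can be folded into a single pass wrapped in a function. The name `select_values` is hypothetical, and the output order relies on Python dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict


def select_values(pairs):
    """Return one [index, value] per index: most frequent value, last seen on ties."""
    counts = defaultdict(lambda: defaultdict(int))  # index -> value -> count so far
    best = {}  # index -> (best_value, best_count), in first-seen index order
    for index, value in pairs:
        counts[index][value] += 1
        current = counts[index][value]
        # ">=" lets a later value that matches the best count take over,
        # which implements the "last one wins" tie-break.
        if index not in best or current >= best[index][1]:
            best[index] = (value, current)
    return [[index, value] for index, (value, _) in best.items()]


question_input = [[0, -1], [1, 0], [1, 1], [2, 1], [2, -1], [2, 1]]
print(select_values(question_input))  # [[0, -1], [1, 1], [2, 1]]
```

Unlike a `groupby`-based approach, this version does not require the rows to be ordered by index.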
Sajad
-1
long_list = [[0, -1], [0, -1], [1, 0], [1, 1], [2, 1], [2, -1], [2, 1]]
short_list = []
for element in long_list:
    # keep only the first occurrence of each [index, value] pair
    if element not in short_list:
        short_list.append(element)
print(short_list)

Output: [[0, -1], [1, 0], [1, 1], [2, 1], [2, -1]]