1

I have a nested containing millions of other lists (using tuples atm). For each list, an element may only be included once. I thought that each list was unique, so I needed them all, but I recently realized my nested list contained pairs like this:

listA = ('77', '131', '212', '69')
listB = ('69', '212', '131', '77')

While listA and listB are unique, one is just the reversed duplicate of the other. I need to retain every unique combination because order is important.

listC = ('131', '69', '77', '212')

So listC, while using the same elements, is considered unique because of the order and would need to be retained.

I can cut my nested list down by a huge amount (about half) if I remove all the duplicates, but I can't find a way to do this in a time efficient way.

Because it may be best to eliminate these reversed duplicates before they are even added to my nested list, below I've included the class I use to make the list.

class Graph(object):

    def __init__(self, graph_dict=None):
        """ Initializes a graph object.
            If no dictionary or None is given,
            an empty dictionary will be used. """
        if graph_dict == None:
            graph_dict = {}
        self.__graph_dict = graph_dict

    def find_all_paths(self, start_vertex, end_vertex, path=[]):
        """ Find all paths from start_vertex to end_vertex in graph """
        graph = self.__graph_dict
        path = path + [start_vertex]        
        if start_vertex == end_vertex:
            return [path]
        if start_vertex not in graph:
            return []
        paths = []
        for vertex in graph[start_vertex]:
            if vertex not in path:
                extended_paths = self.find_all_paths(vertex, end_vertex, path)
                for p in extended_paths:
                    if len(p) >= 2:
                        p = tuple(p)
                        paths.append(p)
        return paths

graph = Graph(vertexGraphDict)
nestedList= graph.find_all_paths(begin, end)

vertexGraphDict is just a dictionary of vertices as keys with the values being a list of the other vertices to which it is connected.

I have tried to eliminate the reversed duplicates using the following method:

reversedLists = []
for item in nestedLists:
    if item in reversedLists:
        nestedLists.remove(item)
    else:
        revItem = item[::-1] 
        reversedLists.append(revItem)

This method is very slow. I have also tried revItem = list(reversed(item)) after removing the line p = tuple(p) in my class; very slow as well. Trying these methods during the list production saves time overall but does not speed up the elimination process, which is key.

sloganq
  • 23
  • 4

2 Answers2

3

You can build an OrderedDict with the key being the tuple in reversed order only if the last item is lower than the first item, and the value being the tuple itself, and then get the list of values of the OrderedDict:

from collections import OrderedDict
l = [
    ('77', '131', '212', '69'),
    ('69', '212', '131', '77'),
    ('1', '2', '3', '4'),
    ('4', '1', '2', '3'),
    ('4', '3', '2', '1')
]
list(OrderedDict((t[::-1] if t[-1] < t[0] else t, t) for t in l).values())

Or if you're using Python 3.7 or later versions, where dict keys are ordered, you can use a dict in place of OrderedDict:

list({t[::-1] if t[-1] < t[0] else t: t for t in l}.values())

This returns:

[('69', '212', '131', '77'), ('4', '3', '2', '1'), ('4', '1', '2', '3')]
blhsing
  • 91,368
  • 6
  • 71
  • 106
  • That's some good python magic there. Why did it not return (4, 1, 2, 3)? It's a uniquely ordered combination so I would need to save that. – sloganq Mar 26 '19 at 21:42
  • This is because with Python dicts, values of the same key as an earlier value would overwrite the earlier value, retaining only the latter ones. – blhsing Mar 26 '19 at 21:44
  • I can't use a solution that eliminates unique combinations. – sloganq Mar 26 '19 at 21:46
  • I now see what you mean by reversed duplicates. I've updated my answer accordingly then. – blhsing Mar 26 '19 at 21:53
  • 1
    Great magic, thank you! This reduced my search and eliminate routine time by 99.9%. It only takes 10 seconds now! – sloganq Mar 27 '19 at 01:12
0

My approach is to switch each tuple to a list, reversing it, switching it back to a tuple and removing the (reversed) tuple from the list if it's part of it.

l = [
    ('77', '131', '212', '69'),
    ('69', '212', '131', '77'),
    ('1', '2', '3', '4'),
    ('4', '1', '2', '3'),
    ('4', '3', '2', '1')
]

for t in l:
    lr = list(t)
    lr.reverse()
    tr = tuple(lr)
    if tr in l:
        l.remove(tr)

print(l)

I don't know how fast this will compute but the output is right here.

[('77', '131', '212', '69'), ('1', '2', '3', '4'), ('4', '1', '2', '3')]
g4g4
  • 49
  • 4
  • Having my nested list filled with lists instead of tuples, this method cuts off about 33% of the processing time, without the conversion bit. – sloganq Mar 26 '19 at 23:53
  • I'm glad to hear that. So my answer was useful. I wondered why you chose to use tuples because there are more (useful) methods for lists than for tuples. – g4g4 Mar 27 '19 at 01:07
  • I used tuples to save on memory and improve handling time. I'm pickling the nestedList file into and out of some of my program functions. I wish I knew of a better way. I'll be looking into those improvements later. – sloganq Mar 27 '19 at 01:54
  • Thanks for the information! For the last weeks I am educating myself to become more familiar with data science aspects. This looks like an issue of that field. Do you work as a data scientist or data analyst? I'm just curious, looking for like-minded. – g4g4 Mar 27 '19 at 08:50