I have a nested containing millions of other lists (using tuples atm). For each list, an element may only be included once. I thought that each list was unique, so I needed them all, but I recently realized my nested list contained pairs like this:
listA = ('77', '131', '212', '69')
listB = ('69', '212', '131', '77')
While listA and listB are unique, one is just the reversed duplicate of the other. I need to retain every unique combination because order is important.
listC = ('131', '69', '77', '212')
So listC, while using the same elements, is considered unique because of the order and would need to be retained.
I can cut my nested list down by a huge amount (about half) if I remove all the duplicates, but I can't find a way to do this in a time efficient way.
Because it may be best to eliminate these reversed duplicates before they are even added to my nested list, below I've included the class I use to make the list.
class Graph(object):
def __init__(self, graph_dict=None):
""" Initializes a graph object.
If no dictionary or None is given,
an empty dictionary will be used. """
if graph_dict == None:
graph_dict = {}
self.__graph_dict = graph_dict
def find_all_paths(self, start_vertex, end_vertex, path=[]):
""" Find all paths from start_vertex to end_vertex in graph """
graph = self.__graph_dict
path = path + [start_vertex]
if start_vertex == end_vertex:
return [path]
if start_vertex not in graph:
return []
paths = []
for vertex in graph[start_vertex]:
if vertex not in path:
extended_paths = self.find_all_paths(vertex, end_vertex, path)
for p in extended_paths:
if len(p) >= 2:
p = tuple(p)
paths.append(p)
return paths
graph = Graph(vertexGraphDict)
nestedList= graph.find_all_paths(begin, end)
vertexGraphDict is just a dictionary of vertices as keys with the values being a list of the other vertices to which it is connected.
I have tried to eliminate the reversed duplicates using the following method:
reversedLists = []
for item in nestedLists:
if item in reversedLists:
nestedLists.remove(item)
else:
revItem = item[::-1]
reversedLists.append(revItem)
This method is very slow. I have also tried revItem = list(reversed(item)) after removing the line p = tuple(p) in my class; very slow as well. Trying these methods during the list production saves time overall but does not speed up the elimination process, which is key.