0

Suppose I have a list of lists

 record = [['g1','g2','g3'],['g2','g4'],['g1','g3','g5'],['g2','g3','g5'],['g1','g4']]

and I have list of tuples

list1 = [('g1','g2'),('g1','g3'),('g1','g4'),('g1','g5'),('g2','g3'),('g2','g4'),('g2','g5'),('g3','g4'),('g3','g5'),('g4','g5')]

now how many times ('g1','g2') occurs in record ? solution should be 1 because ('g1','g2') is present only in ['g1','g2','g3']

I can change list of tuples to list of lists. Is there any easy approach rather than brute force ? because my list of lists may contains 1000k items

Julien
  • 13,986
  • 5
  • 29
  • 53

2 Answers2

1

It's not pretty, but it works:

record = [['g1','g2','g3'],['g2','g4'],['g1','g3','g5'],['g2','g3','g5'],['g1','g4']]
pattern = [('g1','g2'),('g1','g3'),('g1','g4'),('g1','g5'),('g2','g3'),('g2','g4'),('g2','g5'),('g3','g4'),('g3','g5'),('g4','g5')]

res = {}
for p in pattern:
    res[str(p)] = 0
    for r in record:
        if set(p).issubset(set(r)):
            res[str(p)] += 1

print(res)

Edit:
10^6 items? (well this is not going to work then...)

Olian04
  • 6,480
  • 2
  • 27
  • 54
0

Consider the items in your list of lists, g1, g2, ... as vertices of an undirected graph. Go through your list of lists and build the graph. Everytime g1 and g2 occur in the same sub-list, increase the weight of the g1 <-> g2 by 1. Then, the number you are looking for is the weight of the edge incident on the elements of the tuple.

This assumes that the tuples will always have two elements. If the tuples are of arbitrary size, in addition to the sub-lists being arbitrary, then this problem reduces to finding multiple subgraph isomorphisms, each of which is NP-Complete. See this: https://stackoverflow.com/a/5279581/1749870

Community
  • 1
  • 1
  • can you explain the second paragraph. It is not clear to me – Sourasekhar Banerjee Nov 10 '16 at 19:05
  • What I meant is, this approach is practical only if your tuples are length 2. This will allow you to directly check edge weights. For bigger tuples, you will have to check whether the entire subgraph formed from the elements exists in the bigger graph. From your example, if a tuple in `list1' is `(g1, g2, g3, g5)`, then you will have to check that the complete subgraph formed by these 4 nodes exists in the graph created from `record`, and if it does, find the smallest edge weight. But finding whether a graph `G` is a subgraph of another graph `T` is an NP-Complete problem mentioned above. – TheDarkKnight Nov 12 '16 at 01:21