0

My data is like this:

movies = [
    "movie 1",
    "movie 2",
    "movie 3",
    "movie 4",
    "movie 5",
    "movie 6",
    "movie 7",
    "movie 8",
    "movie 9",
    "movie 10",
    "movie 11",
    "movie 12",
    "movie 13",
    "movie 14",
    "movie 15",
]
list_of_tuples = [
    ("movie 1", "movie 3"),
    ("movie 3", "movie 6"),
    ("movie 6", "movie 9"),
    ("movie 9", "movie 12"),
    ("movie 12", "movie 15"),
    ("movie 2", "movie 4"),
    ("movie 4", "movie 7"),
    ("movie 8", "movie 10"),
    ("movie 10", "movie 5"),
    ("movie 14", "movie 13"),
    ("movie 11", "movie 13"),
]

Output should be like this:

result_dict = {'movie 1' : ['movie 1' , 'movie 3', 'movie 6', 'movie 9', 'movie 12', 'movie 15'],
               'movie 2' : ['movie 2', 'movie 4', 'movie 7'],
               'movie 3' : ['movie 1' , 'movie 3', 'movie 6', 'movie 9', 'movie 12', 'movie 15'],
                ....}

Here elements in tuples are same so 'movie 1' is similar to 'movie 3' and 'movie 3' is similar to 'movie 6' and 'movie 6' is to 'movie 9' and 'movie 9' to 'movie 12' and 'movie 12' to ' movie 15'.

I want to get a dictionary which has all the similar items as values.

I have tried like this, but I am not getting result:

result_dict = {movie : list() for movie in movies}

for tup in list_of_tuples:
  mov1, mov2 = tup

  result_dict[mov1].append(mov2)
  result_dict[mov2].append(mov1)

  for x in result_dict[mov2]:
    if x not in result_dict[mov1]:
    result_dict[mov1].append(x)
  
  for x in result_dict[mov1]:
    if x not in result_dict[mov2]:
      result_dict[mov2].append(x )

Please help me transform this with minimum time complexity.

Thanks in advance.

Thanks to @James Lin for helping to get this result, I am posting below how the code looks.


relationships = []
relationship = set()
for tuple_data in list_of_tuples:
    tuple_data = set(tuple_data)
    if tuple_data.intersection(relationship):
       relationship |= tuple_data
    else:
       # broken link
       relationship = set()
       relationship |= tuple_data
       relationships.append(relationship)

for idx in range(len(relationships)):
  relationships[idx] = list(relationships[idx])



result_dict = {movie : list() for movie in movies}

for key in result_dict.keys():
  for item in relationships:
    if key in item:
      result_dict[key] = item

and Output is:

{'movie 1': ['movie 1', 'movie 15', 'movie 12', 'movie 9', 'movie 6', 'movie 3'], 'movie 2': ['movie 7', 'movie 4', 'movie 2'], 'movie 3': ['movie 1', 'movie 15', 'movie 12', 'movie 9', 'movie 6', 'movie 3'], 'movie 4': ['movie 7', 'movie 4', 'movie 2'], 'movie 5': ['movie 10', 'movie 5', 'movie 8'], 'movie 6': ['movie 1', 'movie 15', 'movie 12', 'movie 9', 'movie 6', 'movie 3'], 'movie 7': ['movie 7', 'movie 4', 'movie 2'], 'movie 8': ['movie 10', 'movie 5', 'movie 8'], 'movie 9': ['movie 1', 'movie 15', 'movie 12', 'movie 9', 'movie 6', 'movie 3'], 'movie 10': ['movie 10', 'movie 5', 'movie 8'], 'movie 11': ['movie 14', 'movie 11', 'movie 13'], 'movie 12': ['movie 1', 'movie 15', 'movie 12', 'movie 9', 'movie 6', 'movie 3'], 'movie 13': ['movie 14', 'movie 11', 'movie 13'], 'movie 14': ['movie 14', 'movie 11', 'movie 13'], 'movie 15': ['movie 1', 'movie 15', 'movie 12', 'movie 9', 'movie 6', 'movie 3']}

Please help me in understanding the complexity of this whole process. It would be also great to get it optimized.

Thanks

neel
  • 572
  • 4
  • 17

3 Answers3

1

Assuming your relationships are ordered top down, your description is not exactly clear, I am going to give a try to give you some hint:

You need to loop through the list_of_tuples to build the relationships between each element

relationships = []
relationship = set()
for tuple_data in list_of_tuples:
    tuple_data = set(tuple_data)
    if tuple_data.intersection(relationship):
       relationship |= tuple_data
    else:
       # broken link
       relationship = tuple_data
       relationships.append(relationship)

print(relationships)

This will print out:

[{'movie 15', 'movie 12', 'movie 6', 'movie 9', 'movie 3', 'movie 1'}, {'movie 2', 'movie 7', 'movie 4'}, {'movie 8', 'movie 5', 'movie 10'}, {'movie 11', 'movie 14', 'movie 13'}]

From this list you will be able generate your desired dictionary.

UPDATE: use set() to solve movie 11 relate to movie 13

UPDATE: you can first try to profile your code, eg. _ldap.get_option(_ldap.OPT_API_INFO) is slow after upgrading to MacOS Mojave

James Lin
  • 25,028
  • 36
  • 133
  • 233
  • This is a great help for me. Thanks @James Lin – neel Aug 12 '20 at 06:25
  • Movie 11 and 14 are related since 13 is common between them – neel Aug 12 '20 at 07:18
  • 1
    I have updated my answer to solve movie 11 and 13 problem – James Lin Aug 12 '20 at 09:29
  • I worked for me but and I got my result_dict but I think I am messing with time complexity here I am updating my question with final result. It would be great if you could direct me on time complexity. – neel Aug 12 '20 at 10:56
  • updated my answer, see if profiling your code helps to understand which part is time consuming, also noticed I updated my code in the else block to remove unnecessary line. – James Lin Aug 12 '20 at 20:39
  • how to handle if the list_of_tuples is not sorted eg: ```[("movie 1", "movie 2"), ("movie 3", "movie 4"), ("movie 4", "movie 1") ]``` – neel Aug 13 '20 at 09:56
  • You will have to add in extra steps to reconnect the broken relationships. Not that hard. – James Lin Aug 13 '20 at 10:41
0

You can use defaultdict to get this done.

from collections import defaultdict

list_of_tuples = [
    ("movie 1", "movie 3"),
    ("movie 3", "movie 6"),
    ("movie 6", "movie 9"),
    ("movie 9", "movie 12"),
    ("movie 12", "movie 15"),
    ("movie 2", "movie 4"),
    ("movie 4", "movie 7"),
    ("movie 8", "movie 10"),
    ("movie 10", "movie 5"),
    ("movie 14", "movie 13"),
    ("movie 11", "movie 13"),
]

result_dict = defaultdict(list)

for k ,v in list_of_tuples:

    #for value in the tuple, find out if this is already part of
    #the existing dictionary. If yes, get the key so you can
    #append to the key else start a new key item

    a = ''.join([x for x, y in result_dict.items() for z in y if z == k])

    #if found, above list comprehension will result in 1 element

    if a = '' : #if not found, then create a new list for key
        result_dict[k].append(v)

    else: # value is part of a key list, so append value to key list 
        result_dict[a].append(v)

result_dict = dict(result_dict)
print (result_dict)

The output of the above code is:

{'movie 1': ['movie 3', 'movie 6', 'movie 9', 'movie 12', 'movie 15'], 'movie 2': ['movie 4', 'movie 7'], 'movie 8': ['movie 10', 'movie 5'], 'movie 14': ['movie 13'], 'movie 11': ['movie 13']}

Is this what you are looking for

Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33
0

You can always call dict(list_of_tuples) to get a corresponding dictionary for these tuples.

I don't know if this is THE most efficient timewise, but I get what you're trying to get in ~ O(n) with the following code:

from collections import defaultdict

movie_dict = dict(list_of_tuples)

index = defaultdict(list)
for key, value in movie_dict.items():
    index[key] += [value]
    index[value] += [key]

The output is:

defaultdict(list,
            {'movie 1': ['movie 3'],
             'movie 3': ['movie 1', 'movie 6'],
             'movie 6': ['movie 3', 'movie 9'],
             'movie 9': ['movie 6', 'movie 12'],
             'movie 12': ['movie 9', 'movie 15'],
             'movie 15': ['movie 12'],
             'movie 2': ['movie 4'],
             'movie 4': ['movie 2', 'movie 7'],
             'movie 7': ['movie 4'],
             'movie 8': ['movie 10'],
             'movie 10': ['movie 8', 'movie 5'],
             'movie 5': ['movie 10'],
             'movie 14': ['movie 13'],
             'movie 13': ['movie 14', 'movie 11'],
             'movie 11': ['movie 13']})

ETA: This gives you an index by movie of what it's similar to. If you want the movie equivalence classes, so to speak, you'll need to do some set operations. I'll add more info shortly.

Ray Johns
  • 768
  • 6
  • 14