1

I have a large list that contain (userid , start value , end value) and now I have to make pair of users with the condition that the start value +1 or - 1 of 1st user in pair = end value of 2nd user in pair and the opposite should also be true. i.e. start value + or - 1 of 2nd user = end value of 1st user also the pair can't contain same user more then once I can make a simple problem but it's not efficient and will take forever if the number of users are in thousands and I want it to do everything in as little time as possible

I have a basic code that iterates over the list again and again for every item and removes the once whose pair has been formed but I know it's not efficient at all.

I don't know if it will help or not but all that data will come from firebase. I am really new to firebase also so if there's some features in firebase that can solve this type of problem then please suggest that also.

edit: yes actually the possible pair will be +-1 but if there are no pairs left with this criteria then if possible i would really like to make pairs with higher numbers instead of 1 but i don't know how to do it and this is what i tried

def find_matching_pair(user):
    global user_list
    
    for potential_match in user_list:
        if (potential_match['exit'] == user['boarding'] + 1 or
            potential_match['exit'] == user['boarding'] -1 ) and ( user['exit']== potential_match['boarding'] +1 or user['exit']== potential_match['boarding'] -1):
            if user in user_list:
                user_list.remove(user)
                if potential_match in user_list:
                    user_list.remove(potential_match)
                    return ([user['name'],user['exit']],[potential_match['name'],potential_match['exit']])
                else:
                    continue
    return None

import random

# Generate random data
user_list = []
for i in range(100):
    name = f"Person {i+1}"
    boarding = random.choice([1,2,3,4,5,6])
    exit = random.choice([1,2,3,4,5,6])
    while exit == boarding:
        exit = random.choice([1,2,3,4,5,6])
    
    user_list.append({'name': name, 'boarding': boarding, 'exit': exit})
for x in user_list:
    print(x)

paired_users = [] # [([name, exit][name , exit]),  .....]
paired = []
suggested = []
# Matching algorithm
for user in user_list:
    if user['name'] not in paired:
        matching_user = find_matching_pair(user)
        if (matching_user is not None ):

            if ((matching_user[0], matching_user[1]) or (matching_user[1], matching_user[0]) not in paired_users):
                paired_users.append(matching_user)
                paired.append(matching_user[0][0])
                paired.append(matching_user[1][0])
                print(f"Pair: {matching_user}" , len(paired))
Barmar
  • 741,623
  • 53
  • 500
  • 612
Elite
  • 29
  • 4
  • 1
    Turn the list into a dictionary so you can look up the start values directly instead of searching. – Barmar Aug 29 '23 at 19:08
  • 1
    Please show what you tried. We'll help you fix it, we won't write it for you. – Barmar Aug 29 '23 at 19:09
  • If you find a matching pair, would you want to keep searching to see if some other pair makes a _better_ match? i.e. if you have 100 users, pairing them up in one arrangement might make 90 paired users and 10 left over, but pairing them up in a different way might make 98 paired users and only 2 left over. – John Gordon Aug 29 '23 at 19:12
  • As @Barmar said, I'd 1st turn the list into a dictionary `{(start value, end value): [list of userids]}` ; it would then be easy to find all the userids that "match" a given userid. – Swifty Aug 29 '23 at 19:33
  • @Swifty i don't understand how can i turn it into a dictionary, like i am already storing the data as a dictionary {'name': name, 'boarding': boarding, 'exit': exit} and every user will have a different boarding and exit point so how can i use a [list of userids] – Elite Aug 29 '23 at 19:37
  • You mentioned a LIST of (userid, start, end) in your question... But anyway, this should do it: `rev_dic = {} ; for user in user_list: rev_dic.setdefault((user['boarding'], user['exit']), []).append(user['name'])` – Swifty Aug 29 '23 at 19:38
  • @Swifty yes i know like i just tried to do what i could and i tried using a dictionary but it didn't help. i might be using a wrong approach with dictionary i am really new to this so i don't know what could be the best possible solution for this – Elite Aug 29 '23 at 19:43
  • The trick is, when you want to cross 2 huge lists and want to avoid a cartesian product (hint: you ALWAYS want to avoid that), you turn 1 of the 2 lists into a dict (aka: hashing the list) and code the logic from there. This usually reduces the complexity from O(np) to O(n+p); anyway, once you've created your "reversed dict", find all matching candidates to a user with (start=x, end=y) is as simple as joining all the values for (y-1, x-1), (y-1, x+1), (y+1, x-1) and (y+1, x+1). – Swifty Aug 29 '23 at 19:44
  • @Swifty yes thankyou for the suggestion i will look into it. – Elite Aug 29 '23 at 19:48
  • @Swifty yes actually this is not the final code i am working on a personal project so i was just trying what i knew but the names will be unique or ill just use some userid – Elite Aug 29 '23 at 19:53
  • In addition, you'd better turn your list of dicts into a single dict {'name': (x, y)} in addition to the reversed dict, or, if it's you creating the data, create a single dict directly and not a list of dicts; it will make further treatment easier (provided the 'names' are unique, of course). Note that what @Barmar and I said will make the `find_matching_pair` function more efficient; what @JohnGordon asked remains valid. – Swifty Aug 29 '23 at 19:58
  • @Swifty user_list = {1:(2,3),2:(2,3), 3:(4,3)} # Matching algorithm rev_dic = {} for user in user_list: rev_dic.setdefault((user_list[user][0], user_list[user][1]), []).append(user) for x,y in rev_dic.items(): print(x , y) .........is this right – Elite Aug 29 '23 at 20:07
  • If user_list is now a dict instead of a list, the way you create rev_dic seems right. Hmm that said, `rev_dic.setdefault(user_list[user], []).append(user)` would be shorter ;) – Swifty Aug 29 '23 at 20:19

2 Answers2

0

If you're given a user_list = [{'name': name, 'boarding': boarding, 'exit': exit}, ...]:

# Reorganizing the data into 2 dictionaries (time complexity should be O(n))
user_dict, time_dict = {}, {}
for user in user_list:
    user_dict[user['name']] = (user['boarding'], user['exit'])
    time_dict.setdefault((user['boarding'], user['exit']), []).append(user['name'])

If your data is already in the user_dict form:

# Creating a reversed dictionary (the complexity is still O(n)):
time_dict = {}
for user, time in user_dict.items():
    time_dict.setdefault(time, []).append(user)

Then we revisit get_matching_users function, which should be in about constant time complexity:

def find_matching_users(user):
    (x, y) = user_dict[user]
    return [username    for time in ((y-1, x-1), (y-1, x+1), (y+1, x-1), (y+1, x+1))
                        for username in time_dict.get(time, [])]

After that you have to code the logic around it: I'd suggest going through all users in user_dict, choosing each time (how is up to you) a candidate from find_matching_users(user), logging the matching pair (and adding the 2 users to some already_parsed set), and go on (use set difference to filter out already parsed users).

Edit: thinking a bit more about it, your 'matches' relation is symmetric, and your problem can be rewritten into a graph one, perhaps even a classic one (though I'm not versed enough in graphs to know for sure); in conclusion, you might want to look up some graphs algorithms.

Swifty
  • 2,630
  • 2
  • 3
  • 21
  • This post (or similar ones) might help: https://stackoverflow.com/questions/33434718/how-to-group-a-list-of-connected-pairs – Swifty Aug 29 '23 at 20:44
0

@Swifty i don't understand how can i turn it into a dictionary, like i am already storing the data as a dictionary {'name': name, 'boarding': boarding, 'exit': exit} and every user will have a different boarding and exit point so how can i use a [list of userids]

Objects in Python are not stored in dictionaries or in lists or in tuples or in any other container object. Python is not like C or C++. The only thing that container objects hold are references to other objects. The actual storage of the objects is elsewhere, and it's fully automatic. You don't have to worry about it.* So, what that means is, you can put references to the same object into as many different dictionaries or lists or whatever as you like, and it doesn't cause any problem.

What Swifty is suggesting (I think) is that you create a second dictionary, in addition to the dictionary that you already have, to be used as an inverted index. In a nutshell, if you have objects with attributes, then an inverted index is a dictionary (or something like) in which the possible values of some attribute are the keys, and the values in the dictionary are lists of objects for which the value of that attribute are equal to the given key.


* Not 100% true. You'll have to know a little bit about Python's object storage management if you create cyclic chains of object references, but that's probably something you don't have to worry about just yet.

Solomon Slow
  • 25,130
  • 5
  • 37
  • 57
  • That's indeed what I meant; I'll note the "official" `inverted index` name for future reference. – Swifty Aug 30 '23 at 05:43