Algorithmic approach for matching list of labels with many other lists of labels

Question

I'm developing a matching app in android, in which each user enters free text about himself and labels are being generated according to the words he wrote.

Then, a user can see a list of users with the best matching precentage to his own labels.

Right now I'm using the following naive algorithm:

1. For each label L of my labels 

   1.1 For each other user U in the app

       1.1.1 For each UL of that user's labels

          1.1.1.1 Check if L = UL (check if my current label equals that user's current label)

This naive algorithm runs at O(n^3) and is obviously extremely slow.

(I'm wondering how dating applications generate these matchings so quickly, they must have some really good algorithm).

Any better approaches?

Off topic: I wonder how you generate labels from the free text... :o — shole, Jun 30 '16 at 03:05
Hmm, I'm no programming guru, I do it in a very naive way: I have a method which splits the user's enetered text by " " to convert it into words, then I have a a big 'if' and 'switch case' block which checks if any pre-defined words appear there and generates labels accordingly. f.e: * if (words.contains("wise") || words.contains("inteligent") || words.contains("smart")) * labels+="smart"; etc. — E.Mich, Jun 30 '16 at 08:36
Not answering your OP, but you may interested in fuzzy logic / fuzzy search in the label generating function — shole, Jun 30 '16 at 08:58

Lior Kogan · Answer 1 · 2016-06-30T10:01:17.297

First, your algorithm in not O(n^3), as you are counting different things: You should label the number of users as u and the maximal number on labels per users as l. The complexity of your algorithm is O(u*l^2) - for each user, or O(u^2 * l^2) for all users.

If you'll pre-sort the labels for each user (O(u * l * log(l))), you can find the number of identical labels in two sorted lists in O(l):

For each user u1
- For each user u2
  - Calc number of identical labels

Would now take O(u^2*l) - for all users.

The total complexity would be O(max(u^2 * l , u * l * log(l))) for all users.

Note also that the algorithm is not complete, as you should take into account the complexity of reporting the results as well.

You are right. right now I have as many labels as users so I related to it as n^3 but it will be changing as the app will grow. I will see if i can implement your suggestion, thanks — E.Mich, Jun 30 '16 at 18:47

score 0 · Answer 2 · edited May 23 '17 at 10:28

0

Any better approaches?

Database tables with indices.

If you insist to do it by yourself, a first big improvement is a hash table.

edited May 23 '17 at 10:28

Community

1
1

answered Jun 30 '16 at 00:01

deviantfan

11,268
3
32
49

astraujums · Answer 3 · 2016-06-30T10:43:27.563

0

Have a map of labels in your system to lists of users that have the label.
Generate the list of labels for your new user.
Find the lists of users matching those labels.
Generate a list of users that appear most often in the found lists.

The complexity should be LU * log(L) + SU * LU * log(SU * LU), where LU is the number of labels of the new user, L is the number of different labels in the system, SU is the number of users sharing at least one label with the new user.

edited Jun 30 '16 at 10:43

answered Jun 30 '16 at 10:09

astraujums

709
8
20

You should avoid using variables such as SU when calculating complexity, as SU = f(input) – Lior Kogan Jun 30 '16 at 10:29
1

How would you implement step 4? would you build a map user->count? It would take O(LU*SU) to walk over the lists. – Lior Kogan Jun 30 '16 at 10:33
Concatenate lists, sort the result by user, count ocurrances of users, build list of users with number of their appearances. Then sort that list by number of appearances. You are right about the complexity. Edited the answer to reflect that. – astraujums Jun 30 '16 at 10:42

Algorithmic approach for matching list of labels with many other lists of labels

3 Answers3