I am currently trying to design an algorithm that will calculate the relevance of a user to another user based on certain bits of data.
Unfortunately, my maths skills have deteriorated since leaving school almost a decade ago, so I am struggling with this. I found an algorithm online that pushes 'hot' posts to the top of a newsfeed and figure it is a good place to start. This is the calculation I found (in MySQL):
LOG10(ABS(activity) + 1) * SIGN(activity) + (UNIX_TIMESTAMP(created_at) / 300000)
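For reference, this is how I read that expression when translated to PHP. It is only a minimal sketch: the function name `hotScore` and the sample values are my own and are not part of the original query. As far as I can tell, the log term grows by 1 for every tenfold increase in activity, and the time term grows by 1 for every 300000 seconds (roughly 3.5 days) of recency.

```php
<?php
// Hypothetical helper mirroring the MySQL expression above.
function hotScore(int $activity, int $createdAt): float
{
    // log10(|activity| + 1), signed so that negative activity lowers the score
    $magnitude = log10(abs($activity) + 1) * ($activity <=> 0);

    // every 300000 seconds of recency adds one point
    return $magnitude + $createdAt / 300000;
}

$now = strtotime('2017-03-26 14:25:32 UTC');

$older = hotScore(99, $now - 300000); // more activity, but 300000 s older
$newer = hotScore(9, $now);           // less activity, posted now

printf("%.4f\n%.4f\n", $older, $newer);
```

Both lines print the same score: the older post needs about ten times the activity to keep level with a post that is 300000 seconds newer.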
What I am hoping to do is adapt the above concept to work with the data and models I have in my own application. Consider this user object (trimmed down):
{
    "id": 1,
    "first_name": "Joe",
    "last_name": "Bloggs",
    "counts": {
        "connections": 21,
        "mutual_connections": 16
    },
    "mutual_objects": [
        {
            "created_at": "2017-03-26 13:30:47"
        },
        {
            "created_at": "2017-03-26 14:25:32"
        }
    ],
    "last_seen": "2017-03-26 14:25:32"
}
There are three bits of relevant information above that need to be considered in the algorithm:
mutual_connections
mutual_objects, taking into account that older objects should not drive up the relevance as much as newer objects, hence the created_at field
last_seen
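To make those three inputs concrete, this is roughly the shape I am imagining, as a rough sketch rather than a worked-out solution. Each mutual object contributes a weight that halves as it ages, and last_seen gets a similar decay. The function names, the 30-day half-lives, the flat array shape and the plain addition of the three terms are all placeholder assumptions of mine.

```php
<?php
// Hypothetical sketch: names, half-lives and weighting are assumptions.
const SECONDS_PER_DAY = 86400;

// Weight in (0, 1] that halves every $halfLifeDays days of age.
function decay(int $timestamp, int $now, float $halfLifeDays): float
{
    $ageDays = max(0, $now - $timestamp) / SECONDS_PER_DAY;
    return pow(0.5, $ageDays / $halfLifeDays);
}

function relevance(array $user, int $now): float
{
    // Damp large counts so 500 connections is not 100x better than 5.
    $connections = log10($user['mutual_connections'] + 1);

    // Newer mutual objects count for more than older ones.
    $objects = 0.0;
    foreach ($user['mutual_objects'] as $object) {
        $objects += decay(strtotime($object['created_at'] . ' UTC'), $now, 30.0);
    }

    // Recently seen users get a small boost (at most 1).
    $seen = decay(strtotime($user['last_seen'] . ' UTC'), $now, 30.0);

    return $connections + log10($objects + 1) + $seen;
}

$now = strtotime('2017-03-27 00:00:00 UTC');
$user = [
    'mutual_connections' => 16,
    'mutual_objects' => [
        ['created_at' => '2017-03-26 13:30:47'],
        ['created_at' => '2017-03-26 14:25:32'],
    ],
    'last_seen' => '2017-03-26 14:25:32',
];

printf("%.3f\n", relevance($user, $now));
```

Because every term is either a log or a value between 0 and 1, none of the three inputs can swamp the others in the way a raw timestamp would.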
Can anyone suggest a fairly simple (if that's possible) way of doing this?
This was my idea, but in all honesty I have no idea what it is doing, so I cannot be sure whether it is a good solution. I have also left out last_seen, as I could not find a way to add it:
$mutual_date_sum = 0;
foreach ($user->mutual_objects as $mutual_object) {
    // add each object's timestamp to the running total
    $mutual_date_sum += strtotime($mutual_object->created_at);
}
// average timestamp of the mutual objects, scaled down by 300000
$mutual_date_thing = $mutual_date_sum / (300000 * count($user->mutual_objects));
$relevance = log10($user->counts->mutual_connections + 1) + $mutual_date_thing;
Just to be clear, I am not looking to implement some sort of government-level AI or a 50,000-line algorithm from a mathematical genius. I am merely looking for a relatively simple solution that will do the trick for the moment.
UPDATE
I have had a little play and have managed to build the following test. It seems that mutual_objects carries most of the weight in this particular algorithm, as I would expect to see users 4 and 5 higher up the results list given their large number of mutual_connections.
I don't know if this makes it easier to amend/play with, but this is probably the best I can do. Please help if you have any suggestions :-)
$users = [
    [
        'id' => 1,
        'mutual_connections' => 15,
        'mutual_objects' => [
            ['created_at' => '2017-03-26 14:25:32'],
            ['created_at' => '2017-03-26 14:25:32'],
            ['created_at' => '2017-02-26 14:25:32'],
            ['created_at' => '2017-03-15 14:25:32'],
            ['created_at' => '2017-01-26 14:25:32'],
            ['created_at' => '2017-03-26 14:25:32'],
            ['created_at' => '2016-03-26 14:25:32'],
            ['created_at' => '2017-03-26 14:25:32']
        ],
        'last_seen' => '2017-03-01 14:25:32'
    ],
    [
        'id' => 2,
        'mutual_connections' => 2,
        'mutual_objects' => [
            ['created_at' => '2016-03-26 14:25:32'],
            ['created_at' => '2015-03-26 14:25:32'],
            ['created_at' => '2017-02-26 14:25:32'],
            ['created_at' => '2017-03-15 14:25:32'],
            ['created_at' => '2017-01-26 14:25:32'],
            ['created_at' => '2017-03-26 14:25:32'],
            ['created_at' => '2016-03-26 14:25:32'],
            ['created_at' => '2016-03-26 14:25:32'],
            ['created_at' => '2016-03-26 14:25:32'],
            ['created_at' => '2017-03-15 14:25:32'],
            ['created_at' => '2017-02-26 14:25:32'],
            ['created_at' => '2017-03-15 14:25:32'],
            ['created_at' => '2017-01-26 14:25:32'],
            ['created_at' => '2017-03-12 14:25:32'],
            ['created_at' => '2016-03-13 14:25:32'],
            ['created_at' => '2017-03-17 14:25:32']
        ],
        'last_seen' => '2015-03-25 14:25:32'
    ],
    [
        'id' => 3,
        'mutual_connections' => 30,
        'mutual_objects' => [
            ['created_at' => '2017-02-26 14:25:32'],
            ['created_at' => '2017-03-26 14:25:32']
        ],
        'last_seen' => '2017-03-25 14:25:32'
    ],
    [
        'id' => 4,
        'mutual_connections' => 107,
        'mutual_objects' => [],
        'last_seen' => '2017-03-26 14:25:32'
    ],
    [
        'id' => 5,
        'mutual_connections' => 500,
        'mutual_objects' => [],
        'last_seen' => '2017-03-26 20:25:32'
    ],
    [
        'id' => 6,
        'mutual_connections' => 5,
        'mutual_objects' => [
            ['created_at' => '2017-03-26 20:55:32'],
            ['created_at' => '2017-03-25 14:25:32']
        ],
        'last_seen' => '2017-03-25 14:25:32'
    ]
];
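$relevance = [];
foreach ($users as $user) {
    $mutual_date_sum = 0;
    foreach ($user['mutual_objects'] as $bubble) {
        // Note: `=+` assigns rather than accumulates (`+=` was probably intended),
        // so only the last timestamp is kept. The output below was produced with this line as written.
        $mutual_date_sum =+ strtotime($bubble['created_at']);
    }
    // users without mutual objects fall back to 1 to avoid dividing by zero
    $mutual_date_thing = empty($mutual_date_sum) ? 1 : $mutual_date_sum / (300000 * count($user['mutual_objects']));
    $relevance[] = [
        'id' => $user['id'],
        'relevance' => log10($user['mutual_connections'] + 1) + $mutual_date_thing
    ];
}
// collect() is the Laravel collection helper
$relevance = collect($relevance)->sortByDesc('relevance');
print_r($relevance->values()->all());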
This prints out:
Array
(
    [0] => Array
        (
            [id] => 3
            [relevance] => 2485.7219150272
        )

    [1] => Array
        (
            [id] => 6
            [relevance] => 2484.8647045837
        )

    [2] => Array
        (
            [id] => 1
            [relevance] => 622.26175831599
        )

    [3] => Array
        (
            [id] => 2
            [relevance] => 310.84394042139
        )

    [4] => Array
        (
            [id] => 5
            [relevance] => 3.6998377258672
        )

    [5] => Array
        (
            [id] => 4
            [relevance] => 3.0334237554869
        )

)
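Splitting the score into its two terms seems to show why users 4 and 5 sink to the bottom. This is a minimal sketch that assumes strtotime() runs in UTC (which reproduces the figures above). The helper name `scoreTerms` is my own, and it deliberately keeps the `=+` from the test so the numbers match.

```php
<?php
date_default_timezone_set('UTC'); // assumption: matches the printed scores

// Hypothetical helper: returns the log term and the time term separately.
function scoreTerms(int $mutualConnections, array $createdAts): array
{
    $sum = 0;
    foreach ($createdAts as $createdAt) {
        // `=+` as in the test: overwrites, so only the last timestamp survives
        $sum =+ strtotime($createdAt);
    }
    $timeTerm = empty($sum) ? 1 : $sum / (300000 * count($createdAts));

    return [log10($mutualConnections + 1), $timeTerm];
}

// user 3: 30 mutual connections, 2 mutual objects
[$log3, $time3] = scoreTerms(30, ['2017-02-26 14:25:32', '2017-03-26 14:25:32']);
// user 5: 500 mutual connections, no mutual objects
[$log5, $time5] = scoreTerms(500, []);

printf("user 3: %.4f + %.4f\n", $log3, $time3);
printf("user 5: %.4f + %.4f\n", $log5, $time5);
```

For user 3 the time term is about 2484.23 against a log term of about 1.49, whereas user 5 gets about 2.70 from 500 connections plus the fallback of 1. A raw Unix timestamp divided by 300000 is in the thousands, so any user with at least one mutual object outranks every user with none, whatever their connection count. Since only the last timestamp is kept but the divisor still uses the full count, users with more mutual objects (1 and 2) also score lower than users with fewer (3 and 6).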