
This question is similar to a question asked about Java, but I'm doing this in PHP, so I don't think it qualifies as a duplicate.

I would like a way to generate a deterministic key when this function is called. The function should operate like a read-through cache: if the key exists, retrieve the data; if not, call the function, store the data, then return it.

Here's what I have. It works, but I'm not sure whether it's safe, deterministic enough, or even unique enough, since I have absolutely zero understanding of these topics.

// $call  = name of the method being called; $args = arguments to pass to that method
// $force = force the cache to be bypassed, then updated with the fresh result
public function cachedCall($call,$args = [],$force = false)
{
    $cache = \App\App::getInstance()->cache;
    $key = md5($call) . md5(serialize($args));
    $res = $cache->get($key);
    if($res === -1 || $force){
        $res = call_user_func_array([$this,$call],$args);
        if(!empty($res)){ // empty() already covers false, 0 and null
            $cache->set($key,$res,0); //never set empty data in the cache.
        }
    }
    return $res;
}

My question pertains only to the line where $key is calculated. As you can see, the key is derived from the name of the called function and the arguments supplied to it. I have had collisions in some instances. I'm looking for ways to improve this so that the hashes stay consistent but are unlikely to collide. The third argument can be ignored, as it's simply a way to force the cache to be bypassed.

Examples of how this function is called:

$data = $db->cachedCall('getUserByEmail',[$this->email],true);

$data = $db->cachedCall('getCell',['SELECT id FROM foobar WHERE foo=:bar',[':bar'=>55]]);

If possible, I would also like to guarantee that the keys have a consistent length.

r3wt
  • What's wrong with the key scheme I gave... `$key = md5($caller . md5($call) . md5(serialize($args)));`. It's using the args and the point of origin of the call. Look at how "$caller" is generated, print it out and decide if this is what you want. – Harry Apr 06 '16 at 16:27
  • The caller is irrelevant. Bottom line: if the same args are passed to the function, then the same database call is being made and should return the same record. Your answer adds nothing useful and has the side effect of duplicate records. What I'm trying to do is reduce the probability of collisions while maintaining deterministic keys. – r3wt Apr 06 '16 at 17:00
  • I updated my post. I'm just trying to help. – Harry Apr 06 '16 at 18:48
  • What is `$this`? Is it a singleton, or are there multiple instances of the object in your application? –  Apr 06 '16 at 20:28
  • It is a singleton, but even if there were multiple instances of the object, it wouldn't matter. – r3wt Apr 06 '16 at 20:55

2 Answers


This happens because the key can be the same in different instances, for example when cachedCall is called with the same arguments. I imagine you share the same memcached server between instances, and that is why you have cache collisions.

Demonstration

As I read it, the variable $call takes only a limited set of values, because it holds the name of a method of the class that contains cachedCall. That makes it very easy for two different calls to share this value.

Furthermore, you can call this method with an empty array of arguments.

So it is very easy to end up with the same method call in two different instances:

cachedCall('methodX', array()); // from instance A
cachedCall('methodX', array()); // from instance B

Both calls store their content under the same memcached key.

Solution

Inside the method, take the instance into account in some way. For example, you could use the current URL as part of the key, or the domain name (depending on your case):

$key = md5($call) . md5(serialize($args)) . md5($_SERVER['HTTP_HOST']);
$key = md5($call) . md5(serialize($args)) . md5($_SERVER['REQUEST_URI']);

The two examples above show how you can vary the memcached key depending on your instance.
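As a rough sketch of the same idea, the extra key component could identify the data source instead of the request, so identical calls against the same database still share one cache entry. The helper name `namespacedKey` and the `$namespace` values below are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: $namespace identifies the data source (for example a
// database name). This parameter is an assumption, not in the original code.
function namespacedKey(string $namespace, string $call, array $args): string
{
    // Three MD5 hex digests of 32 characters each, so the length is fixed.
    return md5($namespace) . md5($call) . md5(serialize($args));
}

$a = namespacedKey('db_site_a', 'methodX', []);
$b = namespacedKey('db_site_b', 'methodX', []);

var_dump($a === $b);   // bool(false): the two instances no longer share a key
var_dump(strlen($a));  // int(96)
```

Whether a namespace is needed at all depends on whether the instances really read from different data sources; if they share one database, the original two-part key is already sufficient.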

Miguel
  • I would include the class name as well as the method in this solution as two classes could have the same method. `$key = md5(__CLASS__ . $call) . md5(serialize($args)) . md5($_SERVER['REQUEST_URI']);` – Steve E. Mar 30 '16 at 15:51
  • If the function is called with the same arguments, it should generate the same key. `$call` is the name of the method being called on the database class, and `$args` are the arguments to be supplied when calling that function. The goal is to have a consistent key generated for the combination of `$call` + `$args`, with less chance of collision than with the md5/serialize combo. It appears there is some chance that md5s could collide. Your idea is no solution, as it results in needless extra copies of the dataset. – r3wt Mar 30 '16 at 18:11
  • Your problem is not an MD5 collision. I suggest you try another hash algorithm, or no hashing at all, to check whether the collision still happens. Your problem is that you are calling cachedCall with the same arguments in different instances, and then the collision happens. Could you try it? Then we can settle the question :) – Miguel Mar 31 '16 at 12:06
  • If a call to cachedCall happens with the same arguments, then it's the same damn call, genius, and it should return the same damn result... sheesh – r3wt Apr 06 '16 at 15:25
  • It depends. If the instances use the same database, you are right, but that is a point you don't specify. If each instance uses a different database, it will be the same method call, but that does not mean the result should be the same. For example, if I have two WordPress instances that use different databases and share the memcached server, calling that method with the same arguments should return different values. And please be more polite, because I am trying to help you; your technical problem is not mine. Thanks – Miguel Apr 07 '16 at 09:45

If your arguments are guaranteed to be unique per query and you're getting collisions then I think there may be a bug in your code.

The likelihood of collisions using MD5 is remote...

How many random elements before MD5 produces collisions?
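To put a rough number on "remote", here is a minimal sketch using the standard birthday-bound approximation, p ≈ n² / 2^(b+1), for n random inputs and a b-bit hash. The function name `collisionProbability` and the figure of one billion keys are illustrative assumptions, and the estimate only applies to ordinary, non-adversarial input.

```php
<?php
// Birthday-bound approximation: p ≈ n^2 / 2^(bits + 1).
// Only meaningful while the result is much smaller than 1.
function collisionProbability(float $n, int $bits): float
{
    return ($n * $n) / (2.0 ** ($bits + 1));
}

// Illustrative assumption: one billion distinct keys hashed with 128-bit MD5.
$p = collisionProbability(1e9, 128);
printf("%.2e\n", $p);
```

For these inputs the estimate is on the order of 1e-21, which is why accidental MD5 collisions are an unlikely explanation for cache collisions seen in practice.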

If you're seeing collisions, something's wrong. When PHP serializes an array it keeps the elements in order, so md5(serialize($array_here)) should be safe. I had a problem where I didn't box the argument (wrap it in an outer array) in the calling function when trying to pass a single array. If your args are already in an array before calling, then you have no issue.
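The behaviour of serialize() described above can be checked with a small sketch. The helper `cacheKey` is hypothetical, and it swaps the two concatenated MD5 digests from the question for a single SHA-256 digest, which also gives a fixed 64-character key; that substitution is an assumption for illustration, not something the question or this answer prescribes.

```php
<?php
// Hypothetical key builder: one SHA-256 digest over the method name and the
// serialized arguments. The result is always 64 hex characters long.
function cacheKey(string $call, array $args): string
{
    return hash('sha256', serialize([$call, $args]));
}

$sql = 'SELECT id FROM foobar WHERE foo=:bar';

// Same values in the same order: identical keys.
$k1 = cacheKey('getCell', [$sql, [':bar' => 55]]);
$k2 = cacheKey('getCell', [$sql, [':bar' => 55]]);

// serialize() is type-sensitive: int 55 and string '55' give different keys.
$k3 = cacheKey('getCell', [$sql, [':bar' => '55']]);

// The same pairs in a different order also serialize differently.
$k4 = cacheKey('find', [['a' => 1, 'b' => 2]]);
$k5 = cacheKey('find', [['b' => 2, 'a' => 1]]);

var_dump($k1 === $k2);  // bool(true)
var_dump($k1 === $k3);  // bool(false)
var_dump($k4 === $k5);  // bool(false)
var_dump(strlen($k1));  // int(64)
```

Type and order sensitivity cause extra cache misses rather than collisions, so if identical keys are returning different data, the way the arguments are boxed at the call site is the more likely place to look.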

Harry
  • How would it break if you passed your argument(s) to `cachedCall` correctly, i.e. in boxed format: `$db->cachedCall('someMethod',[$args])`? The calling scope is responsible for calling the method correctly; the cachedCall method has no responsibility to safeguard the code. I don't understand at all what point you're trying to make here. – r3wt Apr 06 '16 at 18:54