9

So for the past few days I have been tearing my hair out trying to get a class to clone properly. The problem is that cloning doesn't remove/redo any of the pass-by-reference. The result is, that the main data object is still passed as a reference, thus completely negating the effect of the clone.

Here's a simplified version of the problem:

class my_class {

    private $data;

    public $var1;
    public $var2;
    public $var3;


    public function __construct() {
        $this->data = new stdClass;

        $this->data->var1 = 'a';
        $this->data->var2 = 'b';
        $this->data->var3 = 'c';

        $this->var1 = &$this->data->var1;
        $this->var2 = &$this->data->var2;
        $this->var3 = &$this->data->var3;
    }
}


$original  = new my_class;
$new       = clone $original;
$new->var3 = 'd';

// Output Should Be "c", outputs "d"
echo $original->var3;

See it in action here: http://3v4l.org/nm6NW

My Question: How can I use __clone() to correct the output from "d" to "c"?

Please help in any way you can!?


Update

I ended up making __clone() trigger an error and creating a function called make_clone() to standardize cloning.

__clone() now looks like:

public function __clone() {
    $trace = debug_backtrace();
    $fmt = 'Invalid clone in <b>%4$s</b> on line <b>%5$s</b>. Because of how cloning works, and how references are configured within the class, extensions of %1$s cannot be cloned. Please use <code>make_clone()</code> instead. Error triggered';
    trigger_error(sprintf($fmt, $trace[0]['class'], $trace[0]['function'], 'clone', $trace[0]['file'], $trace[0]['line']), E_USER_NOTICE);
}

and make_clone() looks like:

public function make_clone() {
    // This line recreates the current instance at its' current state.
    $clone = new $this(generate::string($this->object));
    // In my class $this->input is a type of history state.
    // The history of both the original and the clone should match
    $clone->input = $this->input;
    return $clone;
}
Nicholas Summers
  • 4,444
  • 4
  • 19
  • 35
  • 1
    The standard `clone` method does a shallow copy. It sounds like you need a deep copy, so you need your clone method to clone `$this->object`. – Barmar Aug 21 '14 at 07:52
  • Tried it, still no good – Nicholas Summers Aug 21 '14 at 07:53
  • 1
    I'm not sure what this has to do with pass-by-reference. That only applies when you're calling a function. – Barmar Aug 21 '14 at 08:18
  • Can you make a simplified test case that you can post here. Trying to figure out what you're doing from hundreds of lines on github is difficult. Also, the links won't be valid once the problem is resolved and you updated your github. – Barmar Aug 21 '14 at 08:19
  • @barmar As for as pass-by-reference, `$a = &$b` is not the same as `$a = $b'. Passing by reference is not limited to functions/methods. – Nicholas Summers Aug 21 '14 at 08:23
  • @Barmar As far as providing a test case.... You should at least read the whole posting before asking for something, the person may have already provided it. – Nicholas Summers Aug 21 '14 at 08:25
  • Maybe you'd find a solution, but you're solving this problem in wrong way. First of all, class properties should never be `public` (until its a tree), because that breaks `encapsulation`. Second right here - you should be using Strategy Pattern to solve that. – Yang Aug 21 '14 at 08:30
  • I read the whole post. There's no relevant code in it, just a bunch of github links. Which I followed, and couldn't make sense of because there's too much code. Do you need all 1350 lines to reproduce this problem? – Barmar Aug 21 '14 at 08:31
  • @Barmar So you read the phrase "Here is my test script:..." and thought, "this is not a way to test his code" ??? I am confused by your logic – Nicholas Summers Aug 21 '14 at 08:40
  • Your problem has to do with the way your class is defined. THAT is the code we need to see and fix. – Barmar Aug 21 '14 at 08:41
  • @Barmar, this is the most simple version of the problem I can come up with. http://3v4l.org/nm6NW - now all you have to do is use `__clone()` to change the output from "d" to "c" – Nicholas Summers Aug 21 '14 at 09:12
  • I tried writing `__clone` that clones `$this->data` and also reassigns the `$this->varN` references, but it didn't fix it. Very strange. – Barmar Aug 21 '14 at 09:34
  • @Barmar Welcome to the source of my headaches for the past few days. – Nicholas Summers Aug 21 '14 at 09:36
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/59712/discussion-between-nick-j-and-barmar). – Nicholas Summers Aug 21 '14 at 09:37
  • Instead of using references for `$varN`, could you use magic `__get` and `__set` methods that access `$this->data->$var`? Then just cloning `$this->object` in the `__clone` method works. – Barmar Aug 21 '14 at 09:43
  • Is that a theory, or something you have tried? I only ask because `__get()` and `__set()` in the original script already work like that. – Nicholas Summers Aug 21 '14 at 09:47

1 Answers1

13

TL;DR

This is a classic case of PHP SNAFU. I will explain how and why it happens, but unfortunately as far as I can tell there is no possible solution that is satisfactory.

A brittle solution exists if you can run code before PHP shallow clones the object (e.g. by writing your own cloning method), but not if the code runs afterwards, which is how __clone works. However, this solution can still fail for other reasons outside your control.

There is also another option that is safe which involves a well-known "cloning" trick, but it also has drawbacks: it only works on data that is serializable, and it doesn't allow you to keep any references inside that data around even if you want to.

At the end of the day if you want to keep your sanity you will have to move away from implementing the properties $this->varN as references.

The plight of the poor PHP developer

Normally you would have to deep clone everything that needs to be cloned inside __clone. Then you would also have to reassign any references that still point to the instances that were just deep cloned.

You would think these two steps should be enough, for example:

public function __construct()
{
    $this->data = new stdClass;
    $this->data->var1 = 'a';
    $this->data->var2 = 'b';
    $this->data->var3 = 'c';
    $this->assignReferences();
}

public function __clone()
{
    $this->data = clone $this->data;
    $this->assignReferences();
}

private function assignReferences()
{
    $this->var1 = &$this->data->var1;
    $this->var2 = &$this->data->var2;
    $this->var3 = &$this->data->var3;        
}

However, this does not work. How can that be?

Zend engine references

If you var_dump($this->data) before and after assignReferences() in the constructor you will see that assigning those references causes the contents of $this->data to become references themselves.

This is an artifact of how references are internally implemented in PHP and there is nothing you can do about it directly. What you can do is convert them back to normal values first by losing all other references to them, after which cloning as above would work.

In code:

public function __construct()
{
    $this->data = new stdClass;
    $this->data->var1 = 'a';
    $this->data->var2 = 'b';
    $this->data->var3 = 'c';
    $this->assignReferences();
}

public function makeClone()
{
    unset($this->var1);  // turns $this->data->var1 into non-reference
    unset($this->var2);  // turns $this->data->var2 into non-reference
    unset($this->var3);  // turns $this->data->var3 into non-reference

    $clone = clone $this;               // this code is the same
    $clone->data = clone $clone->data;  // as what would go into
    $clone->assignReferences();         // __clone() normally

    $this->assignReferences(); // undo the unset()s
    return $clone;
}

private function assignReferences()
{
    $this->var1 = &$this->data->var1;
    $this->var2 = &$this->data->var2;
    $this->var3 = &$this->data->var3;        
}

This appears to work, but it's immediately not very satisfactory because you have to know that the way to clone this object is $obj->makeClone() instead of clone $obj -- the natural approach will fail.

However, there is also a more insidious bug here waiting to bite you: to un-reference the values inside $this->data you have to lose all references to them in the program. The code above does so for the references in $this->varN, but what about references other code might have created?

Compare this:

$original = new my_class;
$new = $original->makeClone();
$new->var3 = 'd'; 

echo $original->var3; // works, "c"

To this:

$original = new my_class;
$oops = &$original->var3; // did you think this might be a problem?
$new = $original->makeClone();
$new->var3 = 'd'; 

echo $original->var3; // doesn't work!

We are now back to square one. And worse, there is no way to prevent someone from doing this and messing up your program.

Kill the references with fire

There is a guaranteed way to make the references inside $this->data go away no matter what: serialization.

public function __construct()
{
    $this->data = new stdClass;
    $this->data->var1 = 'a';
    $this->data->var2 = 'b';
    $this->data->var3 = 'c';
    $this->assignReferences();
}

public function __clone()
{
    $this->data = unserialize(serialize($this->data)); // instead of clone
    $this->assignReferences();
}

This works with the values in question, but it also has drawbacks:

  1. You cannot have any values (recursively) inside $this->data that are not serializable.
  2. It will indiscriminately kill all references inside $this->data -- even those you might want to preserve on purpose.
  3. It's less performant (a theoretical point, to be fair).

So what to do?

After the obligatory bashing of PHP, follow the classic doctor's advice: if it hurts when you do something, then don't do it.

In this case this means that you just can't expose the contents of $this->data through public properties (references) on the object. Instead of this use getter functions or possibly implement the magic __get.

Jon
  • 428,835
  • 81
  • 738
  • 806
  • Just a slight recommendation: In this situation, it would be better to use json_encode/json_decode because it is [about twice as fast](http://stackoverflow.com/questions/804045/preferred-method-to-store-php-arrays-json-encode-vs-serialize) and would achieve the same goal. [Like this](http://ideone.com/HI66hY) – Nicholas Summers Aug 22 '14 at 14:23
  • @NickJ: IMO the speed matters a lot less than other problems that might arise, e.g. `json_encode` will barf on strings that are not valid UTF-8, which includes pretty much all binary data. For example if `$this->var1 = "\xff"` then the cloning will fail. Why take the chance? – Jon Aug 22 '14 at 14:28
  • Considering the real class's purpose and the range of its' possible inputs, that could be an important factor. However, in the near future the class will support standardizing the input/output strings to a specific format (probably UTF-8). I would assume that PHP's translation of strings (like you mentioned above) would overcome this issue??? – Nicholas Summers Aug 25 '14 at 00:27
  • @NickJ: I'm not sure I understand correctly, are you asking if standardizing string content to UTF-8 would fix the problem? It would fix this one, but other possibilities would remain (e.g. round-trip an array and it can easily happen that it comes back as an `stdClass` object, which would be a problem). For example the array `[1 => 'foo']` by default comes back as an object. – Jon Aug 25 '14 at 09:43
  • `unserialize(serialize(...))` work for me. Thank you! – vee May 20 '19 at 22:13