What defines a valid object state?

Question

I am reading an article about constructors doing too much work. One paragraph reads

In the object oriented style, where dependencies tend to be inverted, the constructor has a different and more Spartan role. Its only job is to make sure that object initializes into a state where it satisfies its basic invariants (in other words, it makes sure that the object instance starts off in a valid state and nothing more).

Here is a basic example of a class. On creation of the class I pass in the HTML which needs parsed to then set the class properties.

OrderHtmlParser
{
    protected $html;

    protected $orderNumber;

    public function __construct($html)
    {
        $this->html = $html;
    }

    public function parse()
    {
        $complexLogicResult = $this->doComplexLogic($this->html);

        $this->orderNumber = $complexLogicResult;
    }

    public function getOrderNumber()
    {
        return $this->orderNumber;
    }

    protected function doComplexLogic($html)
    {
        // ...

        return $complexLogicResult;
    }
}

I'm calling it using

$orderparser = new OrderHtmlParser($html);
$orderparser->parse()
$orderparser->getOrderNumber();

I use a parse function because I dont want the constructor to be doing any logic as both the article above and this article state this is terrible practice.

public function __construct($html)
{
    $this->html = $html;
    $this->parse(); // bad
}

However if I don't use the parse method, then all my properties (in this example just the one) would return null.

Is this known as an object in an 'invalid state'?

Also, it somewhat feels like my parse method is an initialise function in disguise, which is also deemed bad by the other article (although im not sure if that is only when a constructor calls that method, when it is manually called or both). Regardless, the initialise method is still doing some complex logic prior to setting a property - which needs to happen before the getters can be reliably called.

So either I am misunderstanding these articles or these articles are pushing me to think that maybe my overall implementation of this simple class is incorrect.

Consider not passing `$html` to the constructor, but to the `parse` method. Then the behaviour of the other methods (returning null) makes more sense. You *could* still allow the argument to be passed to the constructor also (optionally), but then the constructor should call the object's *parse* method. Apparently that optional behaviour does not fit the vision expressed in the article, but it looks acceptable to me. — trincot, Apr 05 '17 at 16:36
@myol - I have updated my reply, added considerations on constructors in general and on exceptions in the constructor. — Maxim Masiutin, May 12 '17 at 09:24

score 8 · Accepted Answer · answered Apr 05 '17 at 16:57

Generally, it's a code smell to perform work in a constructor, but the reason behind the practice has more to do with the programming language than an opinion about best practices. There are real edge cases that will introduce bugs.

In some languages derived classes have their constructors executed from the bottom up, and in other languages from the top down. In PHP they are called from top to bottom and you can even stop the chain by not calling parent::__construct().

This creates unknown state expectations in base classes, and to make matters worse PHP allows you to either call parent first or last in a constructor.

For Example;

class A extends B {
     public __construct() {
           $this->foo = "I am changing the state here";
           parent::__construct(); // call parent last
     }
}

class A extends B {
     public __construct() {
           parent::__construct(); // call parent first
           $this->foo = "I am changing the state here";
     }
}

In the above example class B has it's constructor called in different orders and if B was doing a lot of work in the constructor, then it might not be in the state the programmer was expecting.

So how do you solve your problem?

You need two classes here. One will contain the parser logic and the other the parser results.

class OrderHtmlResult {
      private $number;
      public __construct($number) {
            $this->number = $number;
      }
      public getOrderNumber() {
            return $this->number;
      }
}

class OrderHtmlParser {
      public parse($html) {
          $complexLogicResult = $this->doComplexLogic($this->html);
          return new OrderHtmlResult($complexLogicResult);
      }
}

$orderparser = new OrderHtmlParser($html);
$order = $orderparser->parse($html)
echo $order->getOrderNumber();

In the above example you could have the parse() method return null if it fails to extract the order number, or throw an example. But neither class ever enters into an invalid state.

There is a name for this pattern, where a method yields another object as the result in order to encapsulate state information, but I remember what it's called.

I think this pattern is called _procedural programming_. Separating logic from data is the antithesis of OOP, where the main objective is to _combine_ logic with data. — jaco0646, Apr 05 '17 at 19:31
Thank you, I ended up taking this approach so that my formatters and validators could expect parsed objects using interfaces — myol, May 15 '17 at 09:09

score 5 · Answer 2 · answered Apr 05 '17 at 20:01

Is this known as an object in an 'invalid state'?

Yes. You're exactly correct that the parse method is an initialise function in disguise.

To avoid the initialization parsing, be lazy. The laziest approach is to eliminate the $orderNumber field and parse it from the $html inside of the getOrderNumber() function. If you expect that function to be called repeatedly and/or you expect the parsing to be expensive, then keep the $orderNumber field but treat is as a cache. Check it for null inside of getOrderNumber() and parse it out only on the first invocation.

Regarding the linked articles, I agree in principle that constructors should be limited to field initialization; however, if those fields are parsed from a block of text and the expectation is that clients will utilize most or all of the parsed values, then lazy initialization has little value. Furthermore, when the text parsing does not involve IO or newing up domain objects, it should not impede blackbox testing, for which eager vs. lazy initialization is invisible.

Bill K · Answer 3 · 2017-04-05T20:45:54.493

One of the common problems that occurs when the constructor does "too much" is that two objects that are somewhat closely linked need to reference each other (Yes, the close linking is a bad smell, but it happens).

If Object A and Object B must reference each other in order to be "Valid", then how do you create either?

The answer is usually that your constructor makes objects that are not fully "Valid", you add the reference to the other invalid object and then you call some kind of finalize/initialize/start method to finish and make your object valid.

If you still want to be "Safe", you can protect your business methods by throwing a not initialized exception if it's called before the object is "Valid".

Dependency Injection has a generalized version of this problem, what if you had a circular loop of injected classes? Following the construct/initialize pattern solves the general case too, so DI just always uses that pattern.

Steve Ellinger · Answer 4 · 2017-05-10T11:01:35.977

Addressing the question in the title I have always viewed an object as being in a valid state when it can perform its work without any issues; that is to say, it works as expected.

In looking at the linked article what jumped out at me was that the constructor logic was creating a lot of objects: I counted 7. All of these objects were tightly coupled with the class in question (ActiveProduct) as they were referenced directly and the constructor was passing the this pointer to the other objects constructors:

    VirtualCalculator = new ProgramCalculator(this, true);
    DFS = new DFSCalibration(this);

In this case ActiveProduct has not yet completed its initialization yet ProgramCalculator and DFSCalibration can call back into ActiveProduct via methods and properties and cause all sorts of mischief so for this reason the code is highly suspect. In general in OOP you want to be passing objects to the constructor not instantiating them in the constructor. Also you want to employ the Dependency Inversion Principle and use interfaces or abstract / pure virtual classes when passing objects to constructors which would allow for dependency injection.

In the case of your class OrderHtmlParser this doesn't seem to be an issue as the complex logic at issue doesn't appear to go outside of the OrderHtmlParser class. I was curious as to why the doComplexLogic function was defined as protected, implying that inheriting classes can call it.

That said how to deal with initialization may be as simple as making the Parse method static and using it to construct the instance of the OrderHtmlParser class and make the constructor private so that the caller has to call the Parse method to get an instance:

OrderHtmlParser
{
    protected $html;

    protected $orderNumber;

    private function __construct()
    {

    }

    public static function parse($html)
    {
        $instance = new OrderHtmlParser();

        $instance->html = $html;

        $complexLogicResult = $instance->doComplexLogic($this->html);

        $instance->orderNumber = $complexLogicResult;

        return $instance;
    }

    public function getOrderNumber()
    {
        return $this->orderNumber;
    }

    protected function doComplexLogic($html)
    {
        // ...

        return $complexLogicResult;
    }
}

score 2 · Answer 5 · answered May 08 '17 at 14:24

I fully agree with the comment from @trincot:

When you create the Parser with the Constructor, there is no need to pass the html.

Maybe you want to use the Parser Object a second Time with another Input.

So to have a clean constructor, I use a reset() Function, which is also called in the Beginning and which resets the initial State of the Object.

Example:

class OrderHtmlParser
{
  protected $html;
  protected $orderNumber;

  public function __construct()
  {
    $this->reset();
  }

  public function reset()
  {
    $this->html = null;
    $this->orderNumber = null;
  }
  /**
   * Parse the given Context and return the result
   */ 
  public function parse($html)
  {
    // Store the Input for whatever
    $this->html = $html;
    // Parse
    $complexLogicResult = $this->doComplexLogic($this->html);
    // Store the Result for whatever
    $this->orderNumber = $complexLogicResult;
    // return the Result
    return $this->orderNumber;
  }

  public function getOrderNumber(){}
  protected function doComplexLogic($html){}
}

Like this, the Parsing Object can do, what it is supposed to do:

Parse as often as you want:

$parser = new OrderHtmlParser();
$result1 = $parser->parse($html1);
$parser->reset();
$result2 = $parser->parse($html2);

Thank you, the idea of reset functionality seems so obvious now but is something that I completely overlooked. I will definitely consider this going forward — myol, May 15 '17 at 09:16

Maxim Masiutin · Answer 6 · 2021-08-27T16:38:09.687

Thank you for the excellent question!

This is an error-prone design to pass extensive data to the constructor that will merely store it inside the object to process this large data later.

Let me quote your beautiful quote again (the bold mark is mine):

In the object oriented style, where dependencies tend to be inverted, the constructor has a different and more Spartan role. Its only job is to make sure that object initializes into a state where it satisfies its basic invariants (in other words, it makes sure that the object instance starts off in a valid state and nothing more).

The design of the parser class in your example is troublesome because the constructor takes the input data, which is real data to process, not just “initialization data” as mentioned in the quote below, but does not actually process the data.

According to professor Ira Pohl from the University of California, Santa Cruz, the main roles of constructors are: (1) initialize the object, (2) convert values when a class has different constructor each having a different list of arguments - this concept is known as constructor overloading, (3) check for correctness - when the constructor parameters are checked to belong to a legal range.

Anyway, even if constructors are meant to initialize, convert and check, it happens very quickly, without significant delays.

In the very old programming courses, in 1980-s, we were told that a program has input and output.

Think of the $html as of the program input.

The constructors are not supposed to accept the program input. They are only supposed to accept the configuration, initialization data like character set name or other configuration parameters that may not be provided later. If they accept large data, they will likely be needed to sometimes throw exceptions, and exceptions in a constructor is a very bad style. Exceptions in constructors should better be avoided to make the code easier to fathom. For example, you may pass a file name to the constructor, but you should not open files in the constructor, and so on.

Let me modify your class a little bit.

enum ParserState (undefined, ready, result_available, error);


OrderHtmlParser
{

    protected $orderNumber;
    protected $defaultEncoding;
    protected ParserState $state;

    public function __construct($orderNumber, $defaultEncoding default "utf-8")
    {
        $this->orderNumber = $orderNumber;
        $this->defaultEncoding = $defaultEncoding;
        $this->state = ParserState::ready;
    }

    public function feed_data($data)
    {
        if ($this->state != ParserState::ready) raise Exception("You can only feed the data to the parser when it is ready");

        // accumulate the data and parse it until we get enough information to make the result available

        if we have enough result, let $state = ParserState::resultAvailable;
    }

    public function ParserState getState()
    {
        return $this->state
    }

    public function getOrderNumber()
    {
        return $this->orderNumber;
    }

    protected function getResult($html)
    {
        if ($this->state != ParserState::resultAvailable) raise Exception("You should wait until the result is available");

        // accumulate the data and parse it until we get enough information to make the result available
    }
}

If you conceive the class to have an evident design, programmers who use your class will not forget to call any method. The design in your original question was flawed because, contrary to the logic, the constructor did take the data but didn’t do anything with it, and a particular function was needed that was not obvious. If you make the design simple and obvious, you will not need even states. The states are only needed for classes that accumulate data for a long time until the result is ready, like state machines, for example, asynchronous reading of the HTML from the TCP/IP socket to feed this data to a parser.

$orderparser = new OrderHtmlParser($orderNumber, "Windows-1251");
repeat
  $data = getMoreDataFromSocket();
  $orderparser->feed_data($data);
until $orderparser->getState()==ParserState::resultAvailable;
$orderparser->getResult();

As for your initial questions about object states: if you design a class in such a way that the constructor only gets initialization data while there are methods that receive and process the data, so there are no separate functions to store the data and parse the data that may be forgotten to call – no states are needed. If you still need states for long-living objects that collect or supply the data sequentially, you may use the enumeration type like in the example above. My example is in abstract language, not a particular programming language.

Thank you, your answer really explained the -why- of your approach by explaining input and output. You have given me a completely (albeit subtly) different way of thinking about object interactions going forward. I was busy over the weekend and SO auto assigned the bounty, so I have added another to give to you due to this answer. — myol, May 15 '17 at 09:14
"_Exceptions in constructors should be avoided at all cost._" Why? — jaco0646, Feb 09 '20 at 23:24
@jaco0646 I’m an advocate of simple constructors. I prefer all complex tasks are moved to out of constructors. This makes program design leaner and the program becomes easier to comprehend and to manage for long-term products that last and are being developed for decades. — Maxim Masiutin, Feb 10 '20 at 14:51
@jaco0646 As a consequence of complex constructors (that routinely throw exceptions), you have to prepare for the cases where an exception thrown in a constructor prevents the destructor from calling, and thus you should write twice the code that frees stuff allocated from a constructor: first in a “try/catch” block of the constructor (that will run in case of an exception) and second in a destructor. Of course, you can make a function and call it from both “catch” of the constructor and from the destructor, but this leads to less leaner design. — Maxim Masiutin, Feb 10 '20 at 14:51
Memory management is a fair point, though this answer may be limited to PHP. For example, it is a minor issue in [Java](https://stackoverflow.com/questions/6086334/is-it-good-practice-to-make-the-constructor-throw-an-exception) and though it is discussed in [C++](https://stackoverflow.com/questions/810839/throwing-exceptions-from-constructors), the consensus seems to be there is nothing wrong with it. — jaco0646, Feb 10 '20 at 15:06
@jaco0646 by freeing the stuff I meant mostly resources like files, threads, semaphores, and other things that cannot be routinely freed like memory. — Maxim Masiutin, Feb 10 '20 at 17:05

What defines a valid object state?

6 Answers6