One useful thing to realize is that your model code tends to be self-contained: it knows about the data elements in the model (i.e., the data graph) and the data consistency rules, but nothing else.
So your model for a page would probably look like this:

    class Page {
        URL uri;
        ImageCollection images;
    }

In other words, the model knows about the relationship between pages and images, but it does not necessarily know what those things mean in practice.
To actually compare your domain model with the real world, you pass the model some service that knows how to do the work but does not know the state:

    class Crawler {
        void verify(URL page, ImageCollection images) {
            // ....
        }
    }

Now you match them together: you construct the Crawler and pass it to the Page. The page finds its state and passes that state to the crawler:

    class Page {
        void verifyWith(Crawler crawler) {
            crawler.verify(this.uri, this.images);
        }
    }

Of course, you probably don't want to couple the page too closely to the Crawler; after all, you might want to swap out the crawler libraries, or you might want to do something else with the page state.
So you make the signature of this method more general: it accepts an interface, rather than an object with a specific meaning. In the terms of the classic book Design Patterns, this is an example of the Visitor pattern:

    class Page {
        interface Visitor {
            void visitPage(URL uri, ImageCollection images);
        }

        void verifyWith(Visitor visitor) {
            visitor.visitPage(this.uri, this.images);
        }
    }

    class Crawler implements Page.Visitor {
        // interface methods are public, so the implementation must be too
        public void visitPage(URL page, ImageCollection images) {
            // ....
        }
    }

Note: the model (page) is responsible for maintaining the integrity of its data. That means that any data it passes to a visitor should be immutable or, failing that, a mutable copy of the model's state.
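A minimal sketch of that rule, under some assumptions: `java.net.URI` and a `List<String>` of image sources stand in for the `URL` and `ImageCollection` types above, the visitor is declared at top level for brevity, and `imageCount` is a hypothetical helper added only so the effect can be observed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the visitor interface; List<String> replaces ImageCollection.
interface PageVisitor {
    void visitPage(URI uri, List<String> images);
}

class Page {
    private final URI uri;                 // URI instances are immutable
    private final List<String> images;     // the model's own, mutable list

    Page(URI uri, List<String> images) {
        this.uri = uri;
        this.images = new ArrayList<>(images);   // copy on the way in
    }

    void verifyWith(PageVisitor visitor) {
        // List.copyOf returns an unmodifiable snapshot, so the visitor
        // cannot reach back into the model's internal list.
        visitor.visitPage(uri, List.copyOf(images));
    }

    // Hypothetical helper, here only to show the model is unchanged.
    int imageCount() {
        return images.size();
    }
}
```

A visitor that calls `images.add(...)` on what it receives gets an `UnsupportedOperationException`, and the page's own list is left as it was.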
In the long term, you probably wouldn't want the definition of the Visitor embedded in the Page like this. Page is part of the model's API (what callers use), but the Visitor is part of the model's SPI (service provider interface: what collaborators implement).

    interface PageVisitor {
        void visitPage(URL uri, ImageCollection images);
    }

    class Page {
        void verifyWith(PageVisitor visitor) {
            visitor.visitPage(this.uri, this.images);
        }
    }

    class Crawler implements PageVisitor {
        // interface methods are public, so the implementation must be too
        public void visitPage(URL page, ImageCollection images) {
            // ....
        }
    }

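To illustrate the "do something else with the page state" case, here is a rough sketch of a second visitor that never crawls anything. `SummaryVisitor` is a hypothetical name, and `java.net.URI` plus `List<String>` are again stand-ins for the `URL` and `ImageCollection` types used above.

```java
import java.net.URI;
import java.util.List;

interface PageVisitor {
    void visitPage(URI uri, List<String> images);
}

class Page {
    private final URI uri;
    private final List<String> images;

    Page(URI uri, List<String> images) {
        this.uri = uri;
        this.images = List.copyOf(images);   // unmodifiable snapshot
    }

    void verifyWith(PageVisitor visitor) {
        visitor.visitPage(uri, images);
    }
}

// Hypothetical second visitor: builds a one-line summary instead of crawling.
class SummaryVisitor implements PageVisitor {
    private String summary = "";

    @Override
    public void visitPage(URI uri, List<String> images) {
        summary = uri + " references " + images.size() + " image(s)";
    }

    String summary() {
        return summary;
    }
}
```

The Page does not change at all: the same `verifyWith` call accepts a crawler, a summary builder, or a test double.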
One thing that did get glossed over here is that you seem to have two different implementations of "page":

    // Here's one?
    $page = new Page($url);
    // And here is something else?
    $pageReturned = $crawler->get($page);

One of the lessons of DDD is the importance of naming things; in particular, making sure that you don't combine two ideas that really have separate meanings. In this case, you should be clear about what type the crawler returns.
For example, if you were in a domain where the ubiquitous language borrowed from REST, then you might have statements that look like

    $representation = $crawler->get($resource);

In your example, the language looks more HTML specific, so this might be reasonable

    $htmlDocument = $crawler->get($page);

The reason for making this explicit: the document/representation fits well with the notion of a value object. It's an immutable bag of immutable stuff; you can't change the "page" by manipulating the HTML document in any way.
Value objects are purely query surfaces -- any method on them that looks like a mutation is really a query that returns a new instance of the type.
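As a rough illustration of "a mutation that is really a query", here is one possible shape for such a value object. The fields (`title`, `imageSources`) and the `withImage` method are assumptions made for the example, not something taken from your code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value object; the fields are illustrative only.
final class HtmlDocument {
    private final String title;
    private final List<String> imageSources;

    HtmlDocument(String title, List<String> imageSources) {
        this.title = title;
        this.imageSources = List.copyOf(imageSources);   // unmodifiable
    }

    String title() {
        return title;
    }

    List<String> imageSources() {
        return imageSources;
    }

    // Reads like a mutation, but leaves this instance untouched and
    // returns a new document with the extra image.
    HtmlDocument withImage(String source) {
        List<String> extended = new ArrayList<>(imageSources);
        extended.add(source);
        return new HtmlDocument(title, extended);
    }
}
```

Calling `withImage` on a document gives you a second document; the first one still reports its original image list. A fuller version would also define `equals` and `hashCode` in terms of the fields.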
Value objects are a great fit for the specification pattern described by plalx in his answer:

    interface HtmlSpecification {
        boolean isSatisfiedBy(HtmlDocument document);
    }
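
A sketch of what one such specification could look like, assuming the document exposes its image sources as a list. `ContainsAllImages` and the `imageSources` accessor are hypothetical names, and the rule itself is only a guess at the kind of check you might need.

```java
import java.util.List;

// Hypothetical, pared-down value object for the example.
final class HtmlDocument {
    private final List<String> imageSources;

    HtmlDocument(List<String> imageSources) {
        this.imageSources = List.copyOf(imageSources);
    }

    List<String> imageSources() {
        return imageSources;
    }
}

interface HtmlSpecification {
    boolean isSatisfiedBy(HtmlDocument document);
}

// Hypothetical rule: every expected image appears in the document.
class ContainsAllImages implements HtmlSpecification {
    private final List<String> expected;

    ContainsAllImages(List<String> expected) {
        this.expected = List.copyOf(expected);
    }

    @Override
    public boolean isSatisfiedBy(HtmlDocument document) {
        return document.imageSources().containsAll(expected);
    }
}
```

Because the document is immutable, the specification can be evaluated any number of times and will always give the same answer for the same document.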