
I am currently developing a news feed Android app, and I am trying to design it according to the principles of clean architecture.

In the data layer I am using the repository pattern as a facade for the different data sources: remote data from an API (https://newsapi.org/), local data from a DB (Realm or SQLite), as well as an in-memory cache.
In my domain layer I have defined some immutable model classes (Article, NewsSource, etc.) which are used by both the domain layer and the presentation layer (no need for extra model classes in the presentation layer, in my opinion).
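To give an idea, such a domain model is just a plain immutable Kotlin class with no framework dependencies, roughly like this (the exact fields are simplified here):

// Plain domain model: no annotations, no framework base classes.
data class Article(
        val title: String,
        val imageUrl: String,
        val url: String)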

Does it make sense to use different model classes for the remote data source as well as for the local data source?

E.g. the remote data source uses Retrofit to make the API calls, so the models need to be annotated in order to be parsed by Gson.

import com.google.gson.annotations.SerializedName

// Remote model: annotated only so Gson can parse the API response.
data class RemoteArticleModel(
        @SerializedName("title") val title: String,
        @SerializedName("urlToImage") val urlToImage: String,
        @SerializedName("url") val url: String)

The models for the local data source may also have to fulfill a certain contract; for example, models in a Realm DB need to extend RealmObject.

import io.realm.RealmObject
import io.realm.RealmResults
import io.realm.annotations.LinkingObjects

// Realm models must be open classes extending RealmObject; Person is another RealmObject with a 'dog' field (omitted here).
open class Dog : RealmObject() {
    var name: String? = null
    @LinkingObjects("dog")
    val owners: RealmResults<Person>? = null
}
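Transferred to my articles, a local model would presumably end up looking something like this (RealmArticleModel is just a name I made up and use further down; the fields mirror the domain model):

import io.realm.RealmObject

// Hypothetical local model; Realm requires an open class with default property values.
open class RealmArticleModel : RealmObject() {
    var title: String = ""
    var imageUrl: String = ""
    var url: String = ""
}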

Obviously, I don't want my domain models to be 'polluted' by any data source specific contract (annotations, RealmObject inheritance, etc.). So I thought it would make sense to use different models for the different data sources and let the repository handle the mapping between them.

E.g. We want to fetch all articles from the remote API, store them in the local DB and return them to the domain layer.

The flow would be: the remote data source makes an HTTP request to the news API and retrieves a list of RemoteArticleModels. The repository maps these models to the domain-specific article model (Article). These are then mapped to DB models (e.g. RealmArticleModel) and inserted into the DB. Finally, the list of Articles is returned to the caller.
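A rough sketch of how I picture that flow; the data source interfaces and the toDomain()/toRealm() mapping functions are placeholders I made up, not an existing API:

// Hypothetical data source contracts, just to make the sketch compile.
interface RemoteArticleDataSource { fun getArticles(): List<RemoteArticleModel> }
interface LocalArticleDataSource { fun insert(articles: List<RealmArticleModel>) }

// Placeholder mapping functions between the three models shown above.
fun RemoteArticleModel.toDomain() =
        Article(title = title, imageUrl = urlToImage, url = url)

fun Article.toRealm() = RealmArticleModel().apply {
    title = this@toRealm.title
    imageUrl = this@toRealm.imageUrl
    url = this@toRealm.url
}

class ArticleRepository(
        private val remote: RemoteArticleDataSource,
        private val local: LocalArticleDataSource) {

    fun fetchAllArticles(): List<Article> {
        val remoteModels = remote.getArticles()           // 1. RemoteArticleModels from the API
        val articles = remoteModels.map { it.toDomain() } // 2. map to domain models
        local.insert(articles.map { it.toRealm() })       // 3. map to DB models and persist
        return articles                                   // 4. the caller only ever sees Article
    }
}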

Two questions arise. The above example shows how many allocations this approach requires: for every article that is downloaded and inserted into the DB, three model objects are created along the way. Would that be overkill?

Also, I know that the data layer should use different model classes than the domain layer (an inner layer should know nothing about an outer layer). But how would that make sense in the above example? I would already have two different model classes for the two different data sources. Adding a third one that is used as a 'mediator' model by the data layer/repository to handle the mapping to the other models (remote, local, domain) would add even more allocations.

So should the data layer know nothing about domain models and let the domain do the mapping from a data layer model to a domain layer model?

Should there be a generic model used only by the repository/data-layer?

Thanks, I really appreciate any help from more experienced developers :)

Elias
  • is this what you are talking about? https://github.com/sahilNaran/layeredMvp? – archLucifer Aug 24 '17 at 13:16
  • Yeah something similar to that, thanks a lot. This example exposes the domain model to the data layer. So is that a valid thing to do? And also why does this project use different modules for data, domain, etc? Is this to even further decouple layers? – Elias Aug 24 '17 at 13:22
  • It is valid because the data layer is not leaking out. Yes, the separate modules are there to further decouple the layers (have a look at the Gradle files), although it is not necessary to go to that extreme. It also helps with scoping, so that I don't accidentally use the wrong class (I know that can be done with namespaces, but this way access is blocked outright). – archLucifer Aug 24 '17 at 13:24
  • What do you mean by leaking out? Also, can these allocations be neglected with regard to performance and garbage collection? – Elias Aug 24 '17 at 13:27
  • By leaking out I mean that the API layer won't have access to any objects in the data layer. The API layer will ask the data layer for information and it will expect a domain object, not a data layer object (https://drive.google.com/file/d/0B6BpQwSa9zXwdlU5dlFEajhoZ2c/view?usp=sharing). As for the performance question, I am not sure about that. – archLucifer Aug 24 '17 at 13:32
  • Well thanks, that makes things clear now. I guess the flexibility that using different models gives me makes up for the allocations. After all, if there were a bigger performance impact I could rethink my strategy, but for now I'll go with the approach you've suggested and that I partly already use. So thanks a lot :) – Elias Aug 24 '17 at 13:38

2 Answers


The overriding principle you should follow is separation of concerns.

The persistence layer should have classes that only deal with the storing and retrieval of data, in this case the Realm classes.

The network layer should have classes that deal with the data from the server, in this case the Retrofit classes.

Moving data from any of those layers to your business layers requires you to map the persistence and network objects to your domain.

To answer your first question: the mapping insulates the different concerns from one another, separating the domain from the data layers. The data layer should not know about the domain models. The domain requests data from the data layer; the data layer fetches the data, passes it through a mapper, and thus returns a domain model.

To answer your second question: it would be a violation of the separation of concerns to have one generic model for your data layer if you get the data from different sources. The persistence models and the network models represent different parts of the system, and should therefore be represented by different models. The domain does not need to know any of this, so any data requested should be mapped to domain objects before crossing the boundary back into the domain.
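As a rough illustration of such a mapper (these names are made up for this sketch, not taken from a particular library; the model classes are the ones from the question):

// One mapper per data source model, each with a single responsibility.
interface Mapper<in F, out T> {
    fun map(from: F): T
}

class RemoteArticleMapper : Mapper<RemoteArticleModel, Article> {
    override fun map(from: RemoteArticleModel) =
            Article(title = from.title, imageUrl = from.urlToImage, url = from.url)
}

class RealmArticleMapper : Mapper<RealmArticleModel, Article> {
    override fun map(from: RealmArticleModel) =
            Article(title = from.title, imageUrl = from.imageUrl, url = from.url)
}

The repository (or the individual data source) applies the appropriate mapper before handing data back, so the domain only ever sees Article.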

Brian
  • Your explanation makes total sense, thanks for that. I have one question though: – Elias Aug 24 '17 at 14:07
  • _The data layer should not know about the domain models. The domain requests data from the data layer; the data layer fetches the data, passes it through a mapper, and thus returns a domain model._ So I would have a mapper for each data source specific model (DB, remote, etc.) which converts to a domain model, am I right? And should the repository facade be responsible for mapping these models? Or should the individual data sources map the models directly, so the repository only ever works with domain models? – Elias Aug 24 '17 at 14:08
  • The cleanest way to do it would be to have a separate mapper for each object you want to transport across the border. This way you achieve single responsibility, i.e. the mapper is responsible for converting the data object to a domain object, and the data class only worries about the data it represents. – Brian Aug 24 '17 at 18:00
  • @Brian do you have an example of this mapping before crossing a layer boundary? – Etienne Lawlor Dec 30 '17 at 08:01

Adding to @Brian's answer, you can probably also encapsulate the data layer as the Clean Boilerplate suggests:

[Diagram: data layer with remote and local data sources]

This way you have a common data model which is mapped to the domain model. I'm not really sure whether this just adds unnecessary code and layers, because the data and domain models will then probably look pretty much the same.
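A minimal sketch of what that could look like; ArticleEntity is a name I made up for the common data-layer model, and its fields mirror the Article model from the question:

// Hypothetical common data-layer model shared by the remote and cache implementations.
data class ArticleEntity(
        val title: String,
        val imageUrl: String,
        val url: String)

// The repository maps the entity to the domain model before returning it;
// as noted above, it ends up looking almost identical to Article itself.
fun ArticleEntity.toDomain() = Article(title, imageUrl, url)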

htafoya