16

When you are developing, when can you determine if you have a lot of unnecessary classes in your application? Is there a certain limit on how many classes you should have?

kingsfoil
user962206
  • 7
    DEFINITELY YES - that is probably the single, most difficult part of any OOP project : defining the minimal set of classes to get your job done. It is very easy to over-design/complicate a project with unnecessary projected future-degrees-of-freedom (aka unneeded/unused flexibility). – kfmfe04 Jan 22 '14 at 15:39

9 Answers

28

There really isn't such a thing as "too many classes." What can be a problem is "too many classes doing the same thing."

If you feel that you have too many classes in your codebase, a good way to audit that would be to add some new requirements. Anything that forces you to make some changes to the code. (In a separate branch of the source control, of course.) How difficult is it to make those changes? Does a relatively simple change require that you modify tons and tons of classes? If that's the case then there's a very good chance that you do have too many, but the problem isn't the number itself.

It's primarily a matter of personal preference in many cases. There's often a trade-off between code re-use and code de-coupling. By separating out every concern possible and having lots of small classes, you de-couple everything from everything else. However, you often find that you have to repeat code in such cases because a lot of code might be doing "the same thing" but for a slightly different reason.

On the other hand, if you insist on never repeating anything in the code, then while you get fewer classes you also often end up with more coupling because a single class will have multiple responsibilities to any code which requires similar functionality.
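The trade-off described above can be made concrete with a small sketch. All class and function names here are hypothetical, invented only to illustrate the two extremes; neither option is presented as the right one.

```python
# Option A: maximum re-use. Both classes share one helper, so they are coupled
# through it: changing the number format for reports also changes invoices.
def format_amount(value):
    return f"{value:.2f}"


class InvoiceA:
    def __init__(self, total):
        self.total = total

    def render(self):
        return "Total: " + format_amount(self.total)


class ReportA:
    def __init__(self, revenue):
        self.revenue = revenue

    def render(self):
        return "Revenue: " + format_amount(self.revenue)


# Option B: maximum de-coupling. Each class formats its own numbers, so the
# report can switch to thousands separators without touching invoices,
# at the price of near-duplicate formatting code.
class InvoiceB:
    def __init__(self, total):
        self.total = total

    def render(self):
        return "Total: " + f"{self.total:.2f}"


class ReportB:
    def __init__(self, revenue):
        self.revenue = revenue

    def render(self):
        return "Revenue: " + f"{self.revenue:,.2f}"
```

Which option resists change better depends on whether the two formats are expected to drift apart, which is exactly what adding a trial requirement in a branch would reveal.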

The bottom line in most cases is resistance to change. Coupling vs. re-use is something people can argue about at length, but software rigidity is where the argument turns into actual effort (money). Test how difficult it is to make changes to the code. Then try re-arranging your classes/logic in a manner that you think would be more accepting of change and test it again. Was there a significant improvement?

David
  • I believe a major argument you're making is for unrolling. This is where you unroll classes, expose a lot of specifics rather than doing things in bulk. The idea being that if one in a thousand needs something different it seems easy to change its boilerplate method without affecting anything else. The problem is you can often achieve this much more easily with a simple if statement. There are also ways in OOP to define a leaf and the branch to it back to your trunk. You rarely need to unfurl the entire tree from the trunk outward to achieve that. – jgmjgm Jan 24 '18 at 19:23
  • I have another problem: a) for every table we have ~6 classes in Java (DTOs, Repository, Builder... and so on) b) and we do it with an own code generator c) it is too easy to add new generators for every table d) we have already 600 tables ~ 3600 generated classes (+ non-generated) the question is: when will we have problems with the JVM (permsize, heap...)? if any Note: clean code, dry, and engineering things are not in the question. Thanks you. – César Mora Oct 27 '21 at 06:10
7

Generally, a lot of classes means you have likely solved your problems in a very general way. This is usually good, since it means you will hopefully have an easier time changing behaviours when you eventually need to.

When developing smaller projects, it can sometimes be better to be more specific (i.e. less general) to get things done faster. This can lead to fewer classes, but the code might be harder to change when the need arises.

As long as the classes are well ordered and have a well defined purpose it should not be a problem to have many classes.

What can, however, be a problem is if classes are tightly coupled or if the responsibilities of some classes are not well defined. More info about coupling can be found here.

Another problem that can occur is mentioned in the comment below. If a lot of your classes have similar code, you have a duplication problem. This usually leads to decreased maintainability, because if a change is needed in the duplicated code you have to make it in multiple places. This is usually solved by inheritance.
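As a rough illustration of solving duplication with inheritance, here is a minimal sketch. The exporter classes are hypothetical; the point is only that the shared steps live once in the base class and the subclasses supply what differs.

```python
class Exporter:
    """Base class holding the logic the subclasses would otherwise duplicate."""

    def export(self, rows):
        # Shared steps: one header line, then one formatted line per row.
        lines = [self.header()]
        lines.extend(self.format_row(row) for row in rows)
        return "\n".join(lines)

    def header(self):
        raise NotImplementedError

    def format_row(self, row):
        raise NotImplementedError


class CsvExporter(Exporter):
    def header(self):
        return "name,age"

    def format_row(self, row):
        return f"{row['name']},{row['age']}"


class TsvExporter(Exporter):
    def header(self):
        return "name\tage"

    def format_row(self, row):
        return f"{row['name']}\t{row['age']}"


people = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 41}]
csv_text = CsvExporter().export(people)
tsv_text = TsvExporter().export(people)
```

A change to the shared export steps now happens in one place instead of once per exporter.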

Pablo Jomer
  • 2
    I agree there. However you could have a situation where you have a lot of classes that just repeat themselves, because inheritance isn't used properly. There are tools out there to measure code duplication. If you have a lot of classes with low code duplication I agree that would be a good sign. – Dan Oct 14 '12 at 13:28
  • 1
    That is true. What Dan is talking about is called code duplication. This is usually a symptom of code that is not generalized enough, and is one of the reasons why copy-paste coding is not a good idea. Surely structural elements of code can be copied and changed. However, if you find yourself copying code as it is, perhaps you should consider why you are copying it. Duplicate code takes about double the time to maintain, for quite obvious reasons. – Pablo Jomer Mar 13 '13 at 17:35
6

Kent Beck answers your question. Jeff Langr in the book 'Clean Code A Handbook of Agile Software Craftsmanship' discusses the four design rules as specified by Kent Beck.

(in order of importance)

  1. Runs all the tests
  2. Contains no duplication
  3. Expresses the intent of the programmer
  4. Minimizes the number of classes and methods

Kent suggests that a pragmatic approach be adopted to keep the class and method count low. He gives the example of adhering to dogmatic rules, such as that all classes should have interfaces. Usually yes, but there are situations where this might not be necessary. However, this rule is the lowest priority of the four rules of simple design.

(note, this is Kent Beck's opinion, not so much mine!)

atreeon
  • 1
    Runs all of the tests should mean it works (passes all tests comes in second). He implies one of many methods of determining something works (automated testing) when simply what you wish to determine should be stated, not specifically how. Number 3 is extremely subjective although in principle it's correct. There's a potential conflict with number 3 as it can lead to bloat. However #1 and #4 should guard against that. – jgmjgm Jan 24 '18 at 18:28
  • OP's question pertains to *css classes*, Kent Beck's design rule pertains to *coding construct classes*. – matt lohkamp Jan 31 '20 at 18:17
  • Hi @atreeon, even after 4 years, I'm curious: why did you write "(note, this is Kent Becks opinion, not so much mine!)"? Is that about the last sentence "However, this rule is the lowest priority of the four rules of simple design."? Thanks – Sixro Aug 03 '20 at 12:27
3

In a project I'm currently working on I definitely think we're using too many classes - or at least, too many objects/instances.

We built a CMS based on PHP/MySQL where every record and field in the database is represented as an object in PHP. This can result in tens of thousands of instances at the same time and we're continually running into performance issues/running out of memory, etc.

This may of course not be an issue in other programming languages or with other requirements, but performance is, in my opinion, something to consider as well.

Florian Kernler
  • This is unrelated to the ops question though. As you acknowledged its about too many (and therefore probably rather small) classes not instances. – RecursiveExceptionException Jun 27 '16 at 13:11
  • It is very common to see people fetch an assoc from the database, convert it to an object representing that table, setting each property one by one then doing the reverse when sending the item to Javascript. This is a common anti-pattern. You should avoid tampering or changing data as much as possible. If there's no reason don't do it. I have often converted thousands of objects and tens of thousands of lines of code to simply json_encode. The problem is the majority of devs are young, fresh out of academia and drowned in pure OOP so it can be difficult to raise this case. – jgmjgm Jan 24 '18 at 18:41
2

As many others have suggested, "it depends..."

Usually it depends on a combination of your methods, your goals and the preferences and abilities of your team members. If you are very stringent about unit tests, you probably end up with a lot of small, general classes and dependency injection. This also means that it's very hard for the individual team member to see the concrete whole you are building from all the parts that are so very, very generic.

Personally I prefer to think in terms of an API built on two levels: A lower level made of generic, independent parts and a high level where I use a couple of facades, directors etc. to present something concrete and useful to the other coders. This is much like the design of the iOS libraries IMHO.
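A minimal sketch of that two-level idea, with hypothetical names: `Tokenizer` and `FrequencyCounter` are the generic, independent low-level parts, and `TextReport` is a small facade that presents one concrete, useful operation.

```python
# Low level: generic parts that know nothing about each other.
class Tokenizer:
    def split(self, text):
        return text.lower().split()


class FrequencyCounter:
    def count(self, items):
        counts = {}
        for item in items:
            counts[item] = counts.get(item, 0) + 1
        return counts


# High level: a facade that wires the parts together for other coders.
class TextReport:
    def __init__(self, tokenizer=None, counter=None):
        # The parts can still be injected, e.g. replaced by fakes in unit tests.
        self._tokenizer = tokenizer or Tokenizer()
        self._counter = counter or FrequencyCounter()

    def most_common_word(self, text):
        counts = self._counter.count(self._tokenizer.split(text))
        return max(counts, key=counts.get)


report = TextReport()
top_word = report.most_common_word("the cat and the dog")
```

Team members who only need the concrete behaviour use `TextReport`; the generic pieces stay testable on their own.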

Anders Johansen
2

Philosophical:

There is such a thing as too much of anything. You can have too many and too few classes. Some like to pretend that more is free because they can use tools like search as an excuse for creating an excessively messy, difficult to navigate and large search space. In the long run you'll always find a measurable deficit.

I'm not really sure there's an upper limit on how many classes you can have. You can find ways to telescope things adding classes indefinitely, technically speaking, infinitely. If you can't have too many classes but you can add them infinitely then you would never finish your program on account of the number of classes you seek to have, therefore you can have too many classes.

A number of IDEs make it very easy to create lots of classes and string them together with things like boilerplate generation, auto-completion and there's always copypasta. Many tools decrease the cost of creation of what is essentially often useless code but don't decrease the cost of bloat as much. Even with helpers, unbloated code will always work out cheaper than bloated (you can only reduce the tax, not eliminate it). If you don't care about it, it will eventually become a problem. Even if you have things like code scanning, find and replace in files, etc., then ten times more is still ten times more. It means ten times more to change, ten times more to go wrong and a tenth of the amount of effort expendable per line.

Many people fall into the trap of thinking they are reducing complexity by adding more classes. In fact they tend to simply be breaking complexity up, moving things away from the things they are related to and adding layers of complexity in the form of indirection. Linear code becomes,

non-linear unnecessarily (that's an example of too many paragraphs, although to be fair a better example might be one paragraph per sentence or word as being too many, when your paragraphs become sentences then you don't really have two separate things anymore, that's probably proof of too many paragraphs, when sentences stop being a different thing to paragraphs).

Detection:

The simple way to look at this is if you have the path A (single node/one function/class/etc) but break it up into A->B you didn't actually gain anything. You just took one sheet of paper, tore it in two, put it in two envelopes and then posted it to its destination. If it turns out that you actually truly need a node with more than one edge then you gain something. That would be A->B, A->C for example. You can use graph analysis to sniff out too many objects. If you have big long linked lists or many small linked lists (or perhaps even a few at all) then you can probably say you've got too many classes. Not all forms of object excesses are so easily detected. With too many classes maintenance becomes excessively complex as you end up supporting a level of flexibility and a model of which you only use a fraction. It means that a lot of your code isn't actually corresponding to what needs to be done. This alone makes it hard to maintain as the objective of that code is subjective rather than objective, it might as well be arbitrary.
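The A->B check can be done mechanically. This is only a sketch, assuming you already have a mapping from each class to the classes it uses; the `uses` dictionary and the `linear_links` function are hypothetical, and building the graph from a real codebase is left out.

```python
def linear_links(uses):
    """Return (a, b) pairs where a uses only b and b is used only by a.

    Such pairs are the 'torn sheet of paper' case: candidates for merging.
    """
    in_degree = {}
    for targets in uses.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1

    links = []
    for source, targets in uses.items():
        if len(targets) == 1:
            (target,) = targets
            if in_degree[target] == 1:
                links.append((source, target))
    return sorted(links)


# A -> B -> C is a chain; C fans out to D and E, so C earns its keep.
uses = {
    "A": {"B"},
    "B": {"C"},
    "C": {"D", "E"},
    "D": set(),
    "E": set(),
}
suspects = linear_links(uses)  # [("A", "B"), ("B", "C")]
```

A hit is a hint, not a verdict: a one-to-one link can still be justified by separation of concerns such as persistence versus presentation.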

You can take a codebase and reduce the number of classes until you only have those actually needed. This means only those needed for portability (passing data around), variances in how things should behave (passing methods around) and as is needed for reasonable separation (handling major concepts independently like persistence, presentation, etc) and as is needed for deduplication. When not able to work to a good design many programmers will do it backwards, writing the code and only splitting it where needed to serve a specific purpose on demand.

Although measurable, there isn't an exact measure of too many classes nor a perfect sign. There are only hints. A large ratio between the minimum number of classes needed and the maximum for example is a hint. What is large? I would say a 100 times is definitely suspect, 10 times fairly suspect, 5 times slightly suspect. This can change though based on a number of parameters.

A strange measure is to gzip your code. The better the compression ratio the greater a chance of bloat. This is not a perfect measure though as it needs points of reference. Certain ways to reduce compression ratio can also be unproductive, coding to a specific number naively is never going to work.
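A crude version of the gzip measure, using Python's standard `gzip` module. The two sample strings are made up for illustration, and as noted above the number only means something relative to a reference point.

```python
import gzip


def compression_ratio(source):
    """Original size divided by gzip-compressed size (higher = more repetitive)."""
    raw = source.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


# Hypothetical 'unrolled' code: the same shape repeated over and over.
repetitive = "class CatLeg: pass\nclass DogLeg: pass\nclass CowLeg: pass\n" * 20

# Hypothetical code where each line does something different.
varied = (
    "def area(w, h):\n"
    "    return w * h\n"
    "\n"
    "def greet(name):\n"
    "    print('Hello, ' + name)\n"
    "\n"
    "def clamp(x, lo, hi):\n"
    "    return max(lo, min(x, hi))\n"
)

more_bloated = compression_ratio(repetitive) > compression_ratio(varied)
```

Comparing the ratio of the same project over time, or of two similar projects, is likely more informative than any absolute threshold.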

You can know if you have too many classes (or interfaces) if they're making you do work that doesn't genuinely help you get to your ultimate goal or if they are slowing you down more than they are speeding things up. This can be subjective though. If someone has made too many classes, that means they're going to have to change their habits which means there's usually an entry fee for better approaches to coding. At the beginning of a project this is hard to detect as adding code is usually very cheap. Nothing yet depends on it so much, layers are shallow, etc. It's usually not until many months or even a year into a project that the cost of bloat, poor organisation, etc. becomes apparent. It's not until a project becomes practically deadlocked that people take notice. Many people won't know either if something that takes a year should have really taken a year or six months. There's rarely much point of comparison.

If you look at your classes you can pick some things up. How much of the code is OO and how much is FO (Object Oriented versus Functionality Oriented)?

Functionality Oriented means code that actually does something and directly contributes to the end result. This comprises your necessary code. It will likely consist of operators beyond assignment and casting. Usually conditional statements, branches, reading data, choosing a course of action and taking appropriate courses of action such as data generation, mutation or fetching/storing against IO.

Object Oriented means simply representing concepts using classes. This code almost turns your procedural code into a declarative language. If most of your classes are simply in aid of heavy type checking, representation, etc then you might have too much. Signs of that are classes and methods with parts of their names that could be variables allowing you to shrink those classes. A very strong sign of this is if most of what these classes are doing is boxing. Simply assigning variables but not doing much of anything else. This is really a case of having excessive structs and usually a lack of deduplication, dynamic coding or abstraction.

In the obvious case, if a class is never used at all then it's a class too many (if it's dead code). Things like classes that are identical but with different names are also a good sign.
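For the never-used case, a very rough single-file check can be sketched with Python's standard `ast` module. The function name is hypothetical, and the check is deliberately naive: it only looks at names within one source string and will miss dynamic use (reflection, imports from other files), so treat hits as candidates to inspect, not as proof.

```python
import ast


def unreferenced_classes(source):
    """List classes defined in `source` whose names never appear elsewhere in it."""
    tree = ast.parse(source)
    defined = set()
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)       # e.g. Used() or class B(Used)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)     # e.g. module.Used
    return sorted(defined - referenced)


sample = (
    "class Used:\n"
    "    pass\n"
    "\n"
    "class Unused:\n"
    "    pass\n"
    "\n"
    "instance = Used()\n"
)
dead = unreferenced_classes(sample)  # ["Unused"]
```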

Causes:

This can be driven by a number of things aside from mechanisms that make it very easy to create and connect things together (that tend to break when you abstract and do things dynamically encouraging avoidance of good habits that supersede but break the IDE). I have often gotten trapped by this by trying to represent everything absolutely perfectly, however OO is actually not flexible enough to do this and it often turns out to be YAGNI. A common problem is simply lack of abstraction where you usually have variable unrolling into top level language constructs as mentioned before (which also links into a design to expose everything to the IDE directly). This might not only betray the lack of dynamic coding but alternatively the use of a preprocessor or similar. What this will look like is basically a tree with all the leaves defined. As much as possible with classes you want to avoid having to define a class for every single possible leaf. A sign of tree expansion taken to an extreme might be where you have a class for every possible usage of a primitive. This can also be a sign of something more extreme. The unrolled Cartesian Product of all classes. In this case you don't simply get a class for a Leg but a class for a CatLeg, DogLeg, etc when there's usually no actual variance between the two. Some people can do this out of extremism of type checking to stop someone putting a DogLeg on a CatLeg. It's a nasty and common anti-pattern.
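The CatLeg/DogLeg case might look like the sketch below. The field names are invented for illustration; the idea is simply that the part of the class name that varied becomes a variable.

```python
from dataclasses import dataclass


# Unrolled: one class per leaf, with no behavioural difference between them.
class CatLeg:
    def __init__(self, length_cm):
        self.length_cm = length_cm


class DogLeg:
    def __init__(self, length_cm):
        self.length_cm = length_cm


# Collapsed: the varying part of the name is now a field.
@dataclass
class Leg:
    species: str
    length_cm: float

    def fits(self, animal_species):
        # If mixing legs really must be prevented, a runtime check on the
        # field can do it without needing a class per species.
        return self.species == animal_species


legs = [Leg("cat", 12.0), Leg("dog", 30.0)]
cat_leg_fits_dog = legs[0].fits("dog")  # False
```

Separate classes start to pay for themselves only once a cat leg genuinely behaves differently from a dog leg.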

One of the biggest drivers of too many classes is an attempt to adhere to standards from out there in the cloud that don't really apply to your situation. In this case you're not programming in response to your problem. You're programming in response to other people's problems.

This is very common with things such as SOLID. It's very important to know and understand principles such as SOLID, being able to apply them but it's also important to know when not to apply them.

This principle is heavily used when OOP languages with large libraries are taught. If you are making an OOP library that you wish to distribute to the world, potentially millions of people with every conceivable use case then you want to adhere to OOP principles that lead to a lot of breaking things up into many interfaces and classes so that they can be used in different ways and so that one piece of functionality doesn't have a high chance to pull in another that might not be needed. You have to consider that in this world, you don't want to create libraries people might have to fork. People don't want to do that as they then become maintainers of code they're reusing that would otherwise have very low total cost of ownership.

Those principles also add a hell of a lot of overhead though and if your codebase has a limited user scope, is fully under your control, etc then you probably have too much code if you're doing it the "way you should" for distributed code. Even if you do have code that is distributed, sometimes it can be too extreme to cater for all use cases beforehand, you sometimes have to work out what's most likely to be needed and then everything else would be changed on demand. For a small library you can afford to put in a hell of a lot of extra work. For a large code base you have to work out where overhead will most likely pay for itself.

Counter example:

In an ideal world, you code minimalistically and only according to your immediate needs. There is a bias where this method is superior on account of excesses are not self revealing. Deficiencies are. If you have too little it will directly present itself. This is very common to see in DRY. After adding one function you add another. You copy and paste the first then change the bottom half. The common top halves of the two functions are duplicate code, it is immediately self revealing that they need to be deduplicated. You do so by creating a third function. You know that's not one function too many as you have an objective provable reason to create it. This approach becomes more difficult when writing code for use by others. By others I don't necessarily mean anyone. I mean those without direct access to the codebase, usually strangers. Essentially people who can't easily/quickly/cheaply break up your code when needed. If you don't cater to such an audience, then you don't need to worry about breaking up your code prematurely.
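The copy, paste and deduplicate sequence described above ends up looking something like this sketch. The functions are hypothetical; `load_rows` is the "third function" that only came into existence once the duplicated top half revealed itself.

```python
def load_rows(text):
    """The shared 'top half' that both functions originally duplicated."""
    return [line.split(",") for line in text.strip().splitlines()]


def total_of_column(text, index):
    rows = load_rows(text)
    # Bottom half specific to this function.
    return sum(float(row[index]) for row in rows)


def longest_name(text):
    rows = load_rows(text)
    # Bottom half specific to this function.
    return max((row[0] for row in rows), key=len)


data = "ann,3\nbartholomew,4.5\n"
total = total_of_column(data, 1)  # 7.5
name = longest_name(data)         # "bartholomew"
```

The extra function has an objective justification, two callers, which is what distinguishes it from a split made on speculation.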

I recently used a library online with too few classes. It contained a single class with multiple responsibilities. It would take a file handle (as a primitive type) to write to then also automatically output HTTP headers appropriate to the stream it was generating based on the methods called (such as addImage, addText, etc).

In an ideal world this class should not have made assumptions about output. Users might wish to output to the file system, memory, a TCP stream, etc. It need only offer an interface with a simple write method or use one from a standard library. In my case I only needed to have it output via string concatenation but to achieve this I had to open a pseudo file mapping into memory (which wouldn't normally be possible but the language allows it as a hack).
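A sketch of what that separation could look like. `DocumentBuilder` and its output format are hypothetical stand-ins, not the library in question; the only assumption it makes about output is that the target has a `write` method, so an in-memory `io.StringIO`, an open file or a socket wrapper would all work.

```python
import io


class DocumentBuilder:
    """Hypothetical builder that depends only on an object with write()."""

    def __init__(self, out):
        self._out = out

    def add_text(self, text):
        self._out.write(f"<p>{text}</p>\n")

    def add_image(self, url):
        self._out.write(f'<img src="{url}">\n')


# Writing to memory needs no hack: StringIO already offers write().
buffer = io.StringIO()
doc = DocumentBuilder(buffer)
doc.add_text("hello")
doc.add_image("cat.png")
result = buffer.getvalue()
```

Emitting HTTP headers would then be the job of whatever code owns the HTTP response, not of the builder.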

I've had this problem a number of times using random libraries from all sources. In some cases it's going to be obvious where separation should be applied and sometimes it's not. If in doubt, too little still beats too much since you're guaranteed to find out about it eventually. I tend to observe that if you add anything you're not really sure about you'll end up in significant bloat territory. If you do it once you'll probably do it twice and then that becomes a habit.

jgmjgm
1

It all depends on your project and its requirements.
The number of classes should be minimal in the sense that there are no unwanted classes.
The number of classes should be maximal in the sense that each class holds its own attributes separately.

Ravindra Bagale
1

An application can all be in one code file or each atomized function can be in its own file; the only thing affected is maintainability. Maintainability can mean your own ability to navigate the code, or it can mean how others can understand the code, or if it is possible to build new releases.

I don't think there are any generic guidelines on this that always apply, it depends on so many things. E.g., when coding in JavaScript you typically use fewer (and bigger) files which contain more unrelated functionality than if you are coding in C# or C++.

If you are using Visual Studio 2012 then http://msdn.microsoft.com/en-us/library/bb385910.aspx and http://msdn.microsoft.com/en-us/library/dd264939.aspx have information about how Code Metrics and Code Analysis work.

This is an example of a report from Code Metrics in Visual Studio 2012 based on my own app, the values are explained at http://msdn.microsoft.com/en-us/library/bb385914.aspx.

Project: <<Projectname>>
Configuration: Debug
Scope: Project
Assembly: <<Path>>
Maintainability Index: 84
Cyclomatic Complexity: 479
Depth of Inheritance: 8
Class Coupling: 189
Lines of Code: 903
Mattias Lindberg
1

I think it depends on which area has the large number of classes. If there are many static classes containing common business logic, that is considered bad, as static classes should be used only for common helper methods. Static classes should never contain common business logic.

If there are different classes in different layers holding essentially the same data, that is also considered bad, as DTO classes should not be duplicated across layers.

However, if the classes have been created after proper decomposition of requirements, then I think it is actually good to have a large number of classes.

jags
  • There's no difference between business logic or helpers. The decider is if you need the things instances give you that procedural doesn't. Business logic wants to be isolated but whether or not it needs to be instantiated is purely a matter of programmatic concerns, not what the code relates to. If all things are done appropriately, it's no big deal to convert procedural business logic into polymorphic if the actual need arises. Exceptions to this arise when your language or methodology would make it hard to change all of the code that depends on that logic. – jgmjgm Jan 24 '18 at 18:36