Say I have code such as this:
class BookAnalysis {
final List<ChapterAnalysis> chapterAnalysisList;
}
class ChapterAnalysis {
final double averageLettersPerWord;
final int stylisticMark;
final int wordCount;
// ... 20 more fields
}
interface BookAnalysisMaker {
BookAnalysis make(String text);
}
class BookAnalysisMakerImpl implements BookAnalysisMaker {
public BookAnalysis make(String text) {
String[] chaptersArr = splitIntoChapters(text);
List<ChapterAnalysis> chapterAnalysisList = new ArrayList<>();
for(String chapterStr: chaptersArr) {
ChapterAnalysis chapter = processChapter(chapterStr);
chapterAnalysisList.add(chapter);
}
BookAnalysis book = new BookAnalysis(chapters);
}
private ChapterAnalysis processChapter(String chapterStr) {
// Prepare
int letterCount = countLetters(chapterStr);
int wordCount = countWords(chapterStr);
// ... and 20 more
// Calculate
double averageLettersPerWord = letterCount / wordCount;
int stylisticMark = complexSytlisticAppraising(letterCount, wordCount);
HumorEvaluation humorEvaluation = evaluateHumor(letterCount, stylisticMark);
// ... and 20 more
// Return
return new ChapterAnalysis(averageLettersPerWord, stylisticMark, wordCount, ...);
}
}
In my particular case, I have one more level of nesting (think BookAnalysis -> ChapterAnalysis -> SectionAnalysis) and several more classes on both ChapterAnalysis (think PageAnalysis that each chapter spans) and SectionAnalysis (think FootnotesAnalysis and such) levels. I have a dilemma about how to structure this. The problem is that in processChapter
method:
- Both preparation and calculation steps take non-negligible amount of time / resources
- Calculation steps depend on multiple preparation steps
Some concerns:
- The above class, taking into account there are say 20 fields in ChapterAnalysis would be quite long
- Testing the whole one would require a hugely complex preparation method that would test a humongous amount of code. It would be unbearably hard to confirm that e.g.
countLetters
works as expected, I'd have to unnecessarily replicate almost the whole input just to test two different cases wherecountLetters
behaves differently
Solutions to contain complexity and allow for testabilty:
- Split
processChapter
into private methods, but then cannot / should not test them - Splitting into multiple classes, but then I'd need a lot of helper data classes (for each method in calculation phase) or one big kitchen sink (holding all the data from preparation phase)
- Making helper methods package private. While that solves the testing issue in the sense that I can test them, the "I should not" part still applies
Any hints, especially from similar real-world experiences?
Edit: updated naming and added some clarifications based on the current answers.
My main concern with splitting into classes is that it is not linear / one-level. For example, above countLetters
produces results that are needed by complexSytlisticAppraising
. Let's say it makes sense to make separate classes for both of these (LetterCounter
and ComplexSytlisticAppraiser
). Now I have to make separate beans for the input of ComplexSytlisticAppraiser.appraise
, i.e. something like:
class ComplexSytlisticAppraiserInput {
final int letterCount;
final int wordCount;
// ... 10 more things it might need
}
Which is fine, except that now I have HumorEvaluator
for which I need this:
class HumorEvaluatorInput {
final int letterCount;
final int stylisticMark;
// ... 5 more things it might need
}
While this might be done just by listing parameters in many cases, a big issue is return parameters. Even when I have to return two ints, I have to make a separate bean that has those two ints, constructor, equals / hashCode, getters.
class HumorEvaluatorOutput {
final int letterCount;
final int stylisticMark;
public HumorEvaluatorOutput(int letterCount, int stylisticMark) {
this.letterCount = letterCount;
this.stylisticMark = stylisticMark;
}
public int getLetterCount() {
return this.letterCount;
}
public int getStylisticMark() {
return this.stylisticMark;
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append("HumorEvaluatorOutput [letterCount=");
sb.append(letterCount);
sb.append(", stylisticMark=");
sb.append(stylisticMark);
sb.append("]");
return sb.toString();
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + letterCount;
result = prime * result + stylisticMark;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
HumorEvaluatorOutput other = (HumorEvaluatorOutput) obj;
if (letterCount != other.letterCount)
return false;
if (stylisticMark != other.stylisticMark)
return false;
return true;
}
}
That's 2 vs 53 lines of code - yikes!
So all this is fine, but it:
- Is not reusable. Vast majority of these will be used only to make the code testable. Think about analyzers such as:
BookAnalyzer
,CarAnalyzer
,GrainAnalyzer
,ToothAnalyzer
. They share absolutely nothing in common - Making 20 classes out of 1 would not yield much except allow for testing
- You can argue that whether it's split into classes or methods, the differene from the point of making the parts small enough to be understood and manipulated is not that big
- On the other hand, there will be a significant amount of noise and indirection added if I were to go proper OOP with testability in mind. Compare:
- Manage 10 files = 10 analyzers * 1 files with 20 private methods
- 800 files = 10 analyzers * (20 interfaces, 20 implementions, 20 input and 20 output beans)
- 400 files if we remove input / output beans and go some other route (such as one big I/O bean per analyzer hack) Note that hundreds of files are going to be very short, mostly boilerplate - probably majority of logic will be under 10 lines each (== private method in the first case)
- There's significant overhead in this. If I were to call a private method 1m times, creating additional input and output beans is adding up...
Probably doing the right thing is, well, doing the right thing. Just wanted to see if there are some other options I can pursue that I'm missing. Or is my logic just purely bad?
Edit: Additional update based on the comments. We can make HumorEvaluatorOutput
shorter, not a huge issue:
class HumorEvaluatorOutput {
final HumorCategoryEnum humorCategory;
final int humorousWordsCount;
public HumorEvaluatorOutput(HumorCategoryEnum humorCategory, int humorousWordsCount) {
this.humorCategory = humorCategory;
this.humorousWordsCount = humorousWordsCount;
}
public HumorCategoryEnum getHumorCategory() {
return this.humorCategory;
}
public int getHumorousWordsCount() {
return this.humorousWordsCount;
}
}
That's 2 vs 17 lines of code - still yikes! It's not much when you consider a single example. When you have 20 different analyzers (BookAnalyzer
, CarAnalyzer
, ...) with 20 different sub-analyzers (for Book as above: ComplexSytlisticAppraiser
and HumorEvaluator
and similar for all other analyzers, obviously much different categories), the 8 fold increase in code adds up.
As for BookAnalyzer
vs CarAnalyzer
and Book
vs Chapter
sub-analyzers - actually, I need to compare BookAnalyzer
vs CarAnalyzer
, as that's what I will have. I will definitely reuse Chapter sub-analyzer for all chapters. I will however not re-use it for any other analyzers. I.e. I'll have this:
BookAnalyzer
ChapterSubAnalyzer
HumorSubAnalyzer
... // 25 more
CarAnalyzer
EngineSubAnalyzer
DrivertrainSubAnalyzer
... // 15 more
GrainAnalyzer
LiquidContentSubAnalyzer
FiberContentSubAnalyzer
... // 20 more
Going by the above, instead of 1 class per analyzer, I now have to create 20 interfaces, 20 extremely short sub-classes with 20 input / output beans and none will ever be re-used. Analyzing Books and Cars is rarely using the same approach and same steps anywhere in the process.
Again - I'm fine with doing the above, but I just don't see any benefit except to allow for testing. It's like driving Toyota Thundra to your next-door neighbor's party. Can you do it just to be the same as all other people coming to the party? Certainly. Should you do it? Ehhh...
So:
- Is it really better to make 500 lines in 10 files into 5000 lines in 800 files (probably not the fully correct numbers, but you get the point) just to follow OOP and enable for testing?
- If not, how are other people doing it and still keeping on the side of not breaking OOP / testing "rules" (e.g. by using reflection to test private methods which should not be tested in the first place)?
- If yes and everyone else is doing it like that, fine. Actually, a sub-question then - how do you manage to find what you need there and follow the flow of the app in all that noise?