2

I have run into this phenomenon many times before, but none were as clear for demonstrating my point as the following:

The source code for pwhois has two functions:

w_lookup_all_pwhois(whois_session_params * wsess, char *addr)
w_lookup_as_pwhois(whois_session_params * wsess, char *addr)

These are 40-to-60-line functions, and are identical except the ...as_pwhois function only saves the "origin-as" field, whereas ...all_pwhois saves all fields.

If I were writing these, I would instead write a single function, with another variable that says whether to fetch all fields or just a single one. Depending on the application, this variable could even flag, which of the fields to fetch. One advantage would be that, when reading code from scratch, I wouldn't have to read two identical functions (which are not adjacent in the code), to figure out that they do the exact same thing. Also, when changing the functionality of one, I wouldn't have to visit every one of the relevant functions, to modify the code there as well. The disadvantage: a more complicated function-- a seemingly minor disadvantage.

However, most developers in my company seem to prefer the multi-function approach, as seems to be the case everywhere else as well, judging by the available open-source code on the internet. As a result, the pwhois has on the order of 50 functions, and I have to remember which one does what-- when 10 multi-purpose functions could easily do the job. What am I missing, that makes the 50-function approach more preferable? Is there a way to read source code from scratch, in a way which avoids reading these very similar functions more than once? (Since the functions are not adjacent in the code, I would guess maybe there's some "standard" comment file that I've not run into yet.)

Alex
  • 947
  • 1
  • 8
  • 25
  • 'when changing the functionality of one, I wouldn't have to visit every one of the relevant functions, to modify the code there as well' - shoulda been 51 functions. – Martin James Aug 22 '17 at 15:39
  • 4
    Just splitting into more functions isn't the question. It's about breaking a problem down into smaller, useful, and encapsulated pieces. The resulting code is more modular, making it easier to maintain, more flexible, and more re-usable for similar tasks. If the code you're talking about has redundant code in each of the smaller functions that must always be synchronized, that's just poorly written code (poor encapsulation) having nothing to do with it being more functions. Well-written, modular code would limit the amount of code that needs to be considered for a given change. – lurker Aug 22 '17 at 15:39
  • 5
    Readability, maintainability, reusability. – Some programmer dude Aug 22 '17 at 15:40
  • 1
    Look here: https://stackoverflow.com/questions/250284/best-practices-many-small-functions-methods-or-bigger-functions-with-logical-p – Fausto Carvalho Marques Silva Aug 22 '17 at 15:40
  • If you really have identical functions, their size isn't the problem. – Scott Hunter Aug 22 '17 at 15:40
  • 1
    "Hey, don't touch what's already there - make a new function. If you break something that worked OK before, it's your job". – Martin James Aug 22 '17 at 15:41
  • 3
    If the functions are split the way you describe it, this isn't a good practice. *Many small functions* are good, but they should follow the *single responsibility* (SRP) principle as well as the *don't repeat yourself* (DRY) principle. That would mean, take these two functions, factor out the common code in **yet another function**. For the reasons, google for these terms :) –  Aug 22 '17 at 15:46
  • C doesn't support multiple exits easily, so the alternatives are a long sequence of gotos and cleanups, or small functions. The latter is generally more maintainable and easier to understand and refactor. – Kerrek SB Aug 22 '17 at 15:52
  • @FelixPalmen So in this case, is it a good idea to code your suggestion for the `pwhois` database? Or, is it a big enough project, that I should not propose a change of this magnitude until I am really familiar with how it works? – Alex Aug 22 '17 at 15:52
  • Anyway, it's usually not code, but data that it the problem. In C, you can't just pop another member var into a class declaration - you have to explicitly arrange for persistent data to.. well.. be persistent. It's all too easy to resort to globals or statics, and then we get crap like strtok() :* – Martin James Aug 22 '17 at 15:53
  • @Alex: if the code inside these functions is duplicated, then it's an awful idea. – vgru Aug 22 '17 at 16:00
  • @Alex can't tell you, I don't know this project. I'm just saying a complicated *generalist* function typically isn't something you want either. –  Aug 22 '17 at 16:22

2 Answers2

3

Having a multitude of functions that all work together does a number of things.

  • Firstly, it creates more reusable components. Functions that may be included in the operation of the program are accessible not only to the program itself, but also any developers who want to take a look and modify it.
  • Secondly, it breaks down tasks to make them more manageable. If you have one big function, it requires a hell of a lot of code to work harmoniously without fault, but when taking a more modular approach, there is more wiggle room.

Think of a program that is required to do some hard maths. Which approach would you take?

  1. A function which calculates an output based on a fixed formula, one which is almost unique and only has one application. All the mathematical operations are defined privately.
  2. A function which calculates an output based on a fixed formula, one which is almost unique and only has one application. But, all the mathematical operations are broken down and made reusable.

The idea is that when you're working later down the line, or attempting to fix a bug, you never know when a function you've written before might come in handy! This is part of the DRY (Don't Repeat Yourself) approach to programming and I'd recommend reading this article to help reduce your workload and keep your code efficient! :D

Will
  • 291
  • 3
  • 15
  • How is the consistent with DRY though? The code in those functions does seem to repeat itself at this time... – Alex Aug 22 '17 at 15:50
  • @Alex - Its best to remember that no programming practice is perfect. But by using DRY, we can cut down on the amount of keys we have to press to get an output. It also cuts down on file size, which can sometimes be an issue. At times, code will repeat itself with slight variation and that's to be expected, but DRY significantly reduces the excess. – Will Aug 22 '17 at 15:55
  • In this case, if these 50 functions work almost the same way, and contain most of the same functionality copy/pasted 50 times, then it's the wrong approach. – vgru Aug 22 '17 at 16:02
  • @Groo - Precisely. From a metaphorical perspective, its similar to having a software office with two people from every department in each office. Yes, they are all good at what they do, but it would be much more efficient to put all the back-end developers into one office, the front-end in another, the DBAs into the other office and so on. As long as communication between the functional units is strong, then things will move smoother and easier! – Will Aug 22 '17 at 16:05
2

It's hard to say if a design is actually good or bad but I'd say that, generally, functions with lot of parameters that change its behavior altogether are hard to use and understand, specially later for code review and debug. For instance, thing like calling w_lookup_all_pwhois(wsess, addr, 1) (Or imagine worst, with function like foo(param1, param2, true, 0, NULL, true) may be weird. For that reason, it may be better breaking down into several functions with one specific goal (See the Single Responsibility Principle SRP).

However, as you pointed out, it's better to avoid code duplicate. One possibility I'm using is to write internal multi-purpose function (Like private function) hidden to end user. Then create a set of one purpose functions that call it with the right parameters. In your case with pwhois, it would be something like w_lookup_all_pwhois and w_lookup_all_pwhois calling the same 'private' function with specific parameters. It's not always the best solution, though works for several situations. (For instance, sometime it's hard to have this general function as efficient as single-purpose one).

To make a long story short, it helps code re-usability (See Open/Close Principle OCP) and comprehension.

geekymoose
  • 364
  • 2
  • 6
  • Even though this is not as well-written as the other answer-- it does a better job addressing what I asked. – Alex Aug 22 '17 at 16:24