Robert C. Martin's "Clean Code" features a section on procedural programming vs. object-oriented programming that makes a few statements I was not able to wrap my head around. Could someone explain in more detail the thought process behind them?

The statements are the following:

Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.

The complement is also true:

Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change.

The code example used to justify this statement is the following.

Example of Procedural Code:

public class Square {
  public Point topLeft;
  public double side;
}

public class Rectangle {
  public Point topLeft;
  public double height;
  public double width;
}

public class Circle {
  public Point center;
  public double radius;
}

public class Geometry {
  public final double PI = 3.14;
  
  public double area(Object shape) throws NoSuchShapeException {
    if (shape instanceof Square) {
      Square s = (Square) shape;
      return s.side * s.side;
    }
    else if (shape instanceof Rectangle) {
      Rectangle r = (Rectangle) shape;
      return r.height * r.width;
    }
    else if (shape instanceof Circle) {
      Circle c = (Circle) shape;
      return PI * c.radius * c.radius;
    }
    throw new NoSuchShapeException();
  }
}

Now, let's pause for a second here. Let's assume that Geometry doesn't calculate just areas, but also perimeters. Taking the above example as is, it is obvious that:

  • Adding a new shape will force all the functions (area, perimeter, etc.) in Geometry to change. No existing data structures need to change, though; they are just containers of data.
  • Adding a new function (such as perimeter) to Geometry, by contrast, is easy.

Thus, procedural code makes it easy to add new functions, but hard to add new data structures.
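
To make the "easy to add new functions" half concrete, here is a minimal sketch of what adding perimeter might look like. It is an illustration under assumptions, not the book's code: ProceduralGeometry is a hypothetical stand-in for Geometry, the Point fields are omitted, Math.PI replaces the PI constant, and IllegalArgumentException stands in for NoSuchShapeException.

```java
// Plain data structures, simplified from the example above (Point fields omitted).
class Square { double side; }
class Rectangle { double height; double width; }
class Circle { double radius; }

// Hypothetical stand-in for the Geometry class.
class ProceduralGeometry {
    // Existing function: one branch per known shape.
    static double area(Object shape) {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return s.side * s.side;
        } else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle) shape;
            return r.height * r.width;
        } else if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return Math.PI * c.radius * c.radius;
        }
        throw new IllegalArgumentException("unknown shape: " + shape);
    }

    // New function: added without touching Square, Rectangle, Circle or area().
    static double perimeter(Object shape) {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return 4 * s.side;
        } else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle) shape;
            return 2 * (r.height + r.width);
        } else if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return 2 * Math.PI * c.radius;
        }
        throw new IllegalArgumentException("unknown shape: " + shape);
    }
}
```

Adding a Triangle to this sketch would instead mean adding a branch to both area and perimeter, which is the "hard" direction.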

But let's consider the above example written in the quintessential procedural language, C. In C we could not write a function such as Geometry's area that takes an abstract data type. We would instead have the following function signatures:

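/* C has no function overloading, so each function needs its own name. */
int geometry_area_square(square s);
int geometry_area_rectangle(rectangle r);
int geometry_area_circle(circle c);

int geometry_perimeter_square(square s);
int geometry_perimeter_rectangle(rectangle r);
int geometry_perimeter_circle(circle c);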

Considering this, adding a new shape does not force any existing function to change, which contradicts what the author states. Instead, we have to implement every behaviour for the new shape, each in its own isolated function.

Adding a new behaviour, on the other hand, will force us to write a function for every existing shape.
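
One way to probe this reading is to write the per-type functions out and then use them on a mixed collection. The following is a minimal sketch in Java, which allows several functions to share a name through overloading. PerTypeGeometry and totalArea are hypothetical names, and only two shapes are included for brevity.

```java
// Plain data structures, reduced to the fields needed here.
class Square { double side; }
class Circle { double radius; }

// Hypothetical per-type functions, mirroring the C signatures above.
class PerTypeGeometry {
    static double area(Square s) { return s.side * s.side; }
    static double area(Circle c) { return Math.PI * c.radius * c.radius; }

    // Overloads are selected at compile time from the static type, so code
    // that only holds an Object still has to branch on the concrete type.
    static double totalArea(java.util.List<Object> shapes) {
        double total = 0.0;
        for (Object shape : shapes) {
            if (shape instanceof Square) {
                total += area((Square) shape);
            } else if (shape instanceof Circle) {
                total += area((Circle) shape);
            } else {
                throw new IllegalArgumentException("unknown shape: " + shape);
            }
        }
        return total;
    }
}
```

In this sketch the branching has not disappeared; it has moved into the caller (totalArea), and that is the code that would have to change when a new shape is added. This may be the sense in which the author's statement still applies.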

So, overall, I'm a bit confused about the logic behind the author's reasoning.

The author concludes with the following:

In any complex system there are going to be times when we want to add new data types rather than new functions. For these cases objects and OO are most appropriate. On the other hand, there will also be times when we'll want to add new functions as opposed to data types. In that case procedural code and data structures will be more appropriate.

Miguel Pais
  • Martin is describing what's known as [The Expression Problem](https://stackoverflow.com/questions/3596366/what-is-the-expression-problem). – jaco0646 Jan 27 '21 at 15:15
  • This link was exactly the kind of reading I was looking for in order to better understand these considerations about easy and hard extensibility of types and operations. If you post this as an answer I'll mark it as correct. – Miguel Pais Jan 28 '21 at 08:42

1 Answer

You are talking about chapter 6 - Objects and Data Structures, subsection Data/Object Anti-Symmetry.

I think the book deliberately uses an OO language to show this 'Anti-Symmetry', as the author calls it. You are trying to follow his argument using the procedural language C.

You can reconstruct the example in the book with C.

#include <stdlib.h>
#include <math.h>
#include <stdio.h>
#include <string.h> //memcpy is declared here; <memory.h> is non-standard

//M_PI is not part of standard C; define it only if <math.h> did not
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

typedef enum {
    RECTANGLE   = 0x0,
    CIRCLE      = 0x1
} geometry_type_t;


typedef struct {
    geometry_type_t type;
} geometry_t;

//radius in millimeters to avoid floats
typedef struct {
    int radius;
} circle_t;

//width and height in millimeters to avoid floats
typedef struct {
    int width;
    int height;
} rectangle_t;

int geometry_area(geometry_t* geometry) {
    if (geometry->type == RECTANGLE) {
        rectangle_t* rect = (rectangle_t*)(geometry + 1);
        return (rect->height * rect->width);
    }
    else if (geometry->type == CIRCLE) {
        circle_t* circle = (circle_t*)(geometry + 1);
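        //the fractional part is truncated; precision does not matter in this example
        return (int)(circle->radius * circle->radius * M_PI);
    }
    //unknown type: return an error value instead of falling off the end of the function
    return -1;
}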

int main(int argc, char* argv[]) {
    circle_t c;
    c.radius = 2000; //millimeters

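    //reserve room for the header plus the largest payload (rectangle_t),
    //because the same buffer is reused for the rectangle below
    geometry_t* geom = malloc(sizeof(geometry_t) + sizeof(rectangle_t));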
    if (geom == NULL) {
        return EXIT_FAILURE;
    }
    geom->type = CIRCLE;

    memcpy((geom + 1), &c, sizeof(circle_t));
    printf("Area of circle with radius %d: %d\n", c.radius, geometry_area(geom));

    rectangle_t r;
    r.height = 1000;
    r.width = 2000;
    geom->type = RECTANGLE;
    memcpy((geom + 1), &r, sizeof(rectangle_t));
    printf("Area of rectangle with width %d and height %d: %d\n", r.width, r.height, geometry_area(geom));

    free(geom);
    return EXIT_SUCCESS;
}

If you follow the approach that you mentioned, with individual functions for each shape, the phrase from the book:

... if I add a new shape, I must change all the functions in Geometry to deal with it

Could be rephrased to something like:

adding a new shape forces you to reimplement any associated function for that specific shape

And the statement:

... if a perimeter() function were added to Geometry. The shape classes would be unaffected

Could be rephrased to something like:

adding a new function for multiple structs needs to be done individually for each struct

In my opinion, the author's explanation is only meaningful when some kind of abstraction over the shapes is used.
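
For comparison, here is a rough sketch of the other side of the anti-symmetry, written in Java like the book's example. It is only an illustration under assumptions: the Shape interface, the Shapes helper and the Triangle class are hypothetical names chosen for this sketch.

```java
// Hypothetical polymorphic counterpart: each shape carries its own behaviour.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

// Adding a new shape is a new class; Square, Circle and Shapes stay untouched.
class Triangle implements Shape {
    private final double base;
    private final double height;
    Triangle(double base, double height) { this.base = base; this.height = height; }
    public double area() { return 0.5 * base * height; }
}

class Shapes {
    // Works for any Shape, including ones written after this method.
    static double totalArea(java.util.List<Shape> shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }
}
```

Here the abstraction is the Shape interface. Adding Triangle changes nothing that already exists, but declaring a perimeter() method on Shape would force every implementing class to change, which is the trade-off the book describes.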

ag00se
  • My question is: is the above snippet how a professional C developer would go about implementing the logic? As you say, it introduces a type abstraction, and to me that immediately means the developer is trying to introduce OOP in C. – Miguel Pais Jan 27 '21 at 13:37
  • As I would not describe myself as a professional C developer, I can only state my humble opinion. I think it depends on the actual logic. If I am implementing something that has a fixed number of possible 'instances' that are highly unlikely to change, for example the HTTP methods, I would implement it the way you described, with individual functions (handle(http_get_t), handle(http_post_t), ...). If I am implementing something that is still in development and 'instances' need to be added frequently, I would most likely take the abstract approach. – ag00se Jan 27 '21 at 14:01