
I have created some data objects in Python to hold my data. Now I need to write some algorithms that process this data, and I am debating between writing the algorithms as:

(1) Methods of the data objects.
(2) Functions in a separate module.
(3) Algorithm objects.

I noticed that in NumPy, some algorithms (e.g. min, max, sum) are implemented as methods of the NumPy array object, while other, more complex algorithms (e.g. svd) are implemented as functions in a separate module, numpy.linalg. I also noticed that some people implement algorithms as objects that act on the data objects (in the previous case, linalg could just as well be a class with a method called svd).
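For reference, the two NumPy styles described above look like this in practice (the matrix values are made up for illustration). Simple reductions such as sum exist both as an array method and as a top-level function, while SVD is only reached through the numpy.linalg module:

```python
import numpy as np

a = np.array([[3.0, 0.0], [0.0, 4.0]])

# Simple reductions are available both as methods and as top-level functions.
total_method = a.sum()
total_function = np.sum(a)

# SVD lives in the numpy.linalg submodule and is called as a function.
# It returns three arrays: U, the singular values s, and Vh.
u, s, vh = np.linalg.svd(a)

print(total_method, total_function)  # 7.0 7.0
print(s)                             # [4. 3.]
```

Note that svd returns several related results at once, which is one practical difference from a single-number reduction like sum.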

I have an example here to make my question clear. Suppose my data object is called sample and my algorithm is called rasam.

I can implement rasam as a method of my sample object and access it as

this_sample = sample()
this_sample.rasam(rasam_args)

or

this_sample = sample()
rasam(this_sample, rasam_args)

or

this_sample = sample()
this_rasam = rasam(rasam_args)
this_rasam.run(this_sample)
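To make the three options concrete, here is a minimal runnable sketch. Since rasam is a placeholder, the "algorithm" here is assumed to be scaling every value by a factor, and the data object is assumed to hold a list called values. The class is named Sample (PEP 8 style) and the option 3 class is named Rasam so it does not clash with the option 2 function; all of these names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical data object; 'values' stands in for whatever a sample holds."""
    values: list = field(default_factory=list)

    # Option 1: the algorithm is a method of the data object.
    def rasam(self, scale):
        return [v * scale for v in self.values]


# Option 2: the algorithm is a plain function (e.g. in a separate module).
def rasam(sample, scale):
    return [v * scale for v in sample.values]


# Option 3: the algorithm is an object configured up front and run later.
class Rasam:
    def __init__(self, scale):
        self.scale = scale  # configuration stored on the algorithm object

    def run(self, sample):
        return [v * self.scale for v in sample.values]


this_sample = Sample([1, 2, 3])
print(this_sample.rasam(2))       # [2, 4, 6]
print(rasam(this_sample, 2))      # [2, 4, 6]
print(Rasam(2).run(this_sample))  # [2, 4, 6]
```

All three produce the same result; option 3 mainly pays off when the same configuration is reused across many samples or the algorithm needs to keep state between runs.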

Which of the above options would be considered good code design? As a follow-up, why is the SVD algorithm implemented as a function, whereas sum is implemented as a method of the NumPy array object?

I found this link, which partly addresses the question from a general "functions vs. methods" point of view, but I am more interested in the specific context of data objects and the algorithms that act on them:

In Python, when should I use a function instead of a method?


1 Answer


There is no single correct answer to these kinds of questions; the answer is always "it depends." Personally, in this case I would go with either option 1 or option 2. Without knowing exactly what your data is, it's hard to say for sure, but the main thing you want to think about is code reuse.

How different are the data objects from each other? Could your algorithm be a one-size-fits-all solution? If so, then you probably want to make your algorithms functions in a separate module.
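As a rough illustration of the one-size-fits-all case: if different data objects expose a common interface, one function can serve all of them. The two classes below are hypothetical, and the sketch assumes both carry a non-empty values list of numbers.

```python
from dataclasses import dataclass


@dataclass
class AudioSample:
    """Hypothetical data object."""
    name: str
    values: list


@dataclass
class ImageSample:
    """Another hypothetical data object with extra fields but the same 'values'."""
    name: str
    values: list
    width: int


def mean_value(sample):
    """Works on any object exposing a non-empty 'values' sequence of numbers."""
    return sum(sample.values) / len(sample.values)


print(mean_value(AudioSample("a", [1, 2, 3])))        # 2.0
print(mean_value(ImageSample("b", [4, 6], width=2)))  # 5.0
```

If the objects differed enough that no shared interface made sense, that would be a sign the algorithm belongs closer to each class instead.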

NumPy actually follows both options 1 and 2 (as far as I can tell). While the NumPy array object does have a sum method, there is also a top-level numpy.sum function that accepts other array-like inputs, such as plain lists or matrices, and can sum along a chosen axis.
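A short example of the difference: a plain Python list has no sum method, but numpy.sum accepts it directly, along with an optional axis argument (the numbers are arbitrary):

```python
import numpy as np

nested = [[1, 2], [3, 4]]  # a plain Python list, which has no .sum() method

# The top-level function accepts any array-like input and an optional axis.
col_sums = np.sum(nested, axis=0)  # sums down each column
row_sums = np.sum(nested, axis=1)  # sums across each row

# The method form needs an ndarray (or another type that defines .sum()).
arr = np.asarray(nested)
print(col_sums, row_sums)  # [4 6] [3 7]
print(arr.sum(axis=0))     # [4 6]
```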

I'm not a NumPy developer, but I would guess that the reason they gave the NumPy array its own sum method is simply to provide a more convenient API for the end user.
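One general way to get that convenience in your own code without duplicating logic is to keep the algorithm in a module-level function and add a thin method that calls it. This is a sketch with hypothetical names (Sample, rasam, and a scaling "algorithm"), not a description of how NumPy implements ndarray.sum:

```python
def rasam(sample, scale):
    """Module-level implementation (hypothetical algorithm: scale every value)."""
    return [v * scale for v in sample.values]


class Sample:
    """Hypothetical data object."""

    def __init__(self, values):
        self.values = list(values)

    def rasam(self, scale):
        # Convenience method: the name 'rasam' here resolves to the module-level
        # function, so the logic lives in one place while callers get method syntax.
        return rasam(self, scale)


s = Sample([1, 2, 3])
print(s.rasam(10))     # [10, 20, 30]
print(rasam(s, 10))    # [10, 20, 30]
```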

That being said, I think you can't go wrong with implementing the algorithms as standalone functions in a separate module. If one data structure really does need its own implementation of an algorithm, you can add that method to that data structure's class.
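If you would rather keep a single call site while still giving one data structure its own implementation, the standard library's functools.singledispatch is one option. This is a swapped-in technique, not something the answer above proposes, and both classes below are hypothetical:

```python
from functools import singledispatch


class DenseSample:
    """Hypothetical data object that stores every value."""

    def __init__(self, values):
        self.values = list(values)


class SparseSample:
    """Hypothetical data object that stores only non-zero entries as {index: value}."""

    def __init__(self, length, nonzero):
        self.length = length
        self.nonzero = dict(nonzero)


@singledispatch
def total(sample):
    """Generic implementation: assumes the object exposes a 'values' sequence."""
    return sum(sample.values)


@total.register
def _(sample: SparseSample):
    # Specialized implementation chosen by argument type: sum only stored entries.
    return sum(sample.nonzero.values())


print(total(DenseSample([0, 5, 0, 7])))      # 12
print(total(SparseSample(4, {1: 5, 3: 7})))  # 12
```

The trade-off is that the specialized code lives next to the generic function rather than inside the class, which suits the "functions in a separate module" layout.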

Kyle Stuart