
I have been programming in Python for about two years, mostly data stuff (pandas, matplotlib, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and deepen my Python knowledge, and one of the things that bothers me is that I have never used a class (outside of copying random Flask code for small web apps). I generally understand what they are, but I can't wrap my head around why I would need them over a simple function.

To make my question more specific: I write tons of automated reports, which always involve pulling data from multiple data sources (Mongo, SQL, Postgres, APIs), doing a lot or a little data munging and formatting, writing the data to CSV/Excel/HTML, and sending it out in an email. The scripts range from ~250 to ~600 lines. Would there be any reason for me to use classes to do this, and why?

valignatev
metersk
  • There is nothing wrong with coding without classes if you can manage your code well. OOP programmers tend to exaggerate the problems, due to constraints from language design or a superficial understanding of different patterns. – Jason Hu Oct 12 '15 at 03:37

6 Answers


Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.

First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or, surely, in most languages) uses OOP. You can do a lot in Java 8 that isn't very object-oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.

However, there are a lot of reasons to use OOP.

Some reasons:

  • Organization: OOP defines well-known, standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists of dicts or lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.

  • State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.

  • Encapsulation: With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.

  • Inheritance: Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the behavior that raises an exception when a key that doesn't exist is requested, so that it returns a default value based on the unknown key instead. This allows you to extend your own code now or later, allows others to extend your code, and allows you to extend other people's code.

  • Reusability: All of these reasons and others allow for greater reusability of code. Object-oriented code allows you to write solid (tested) code once, and then reuse it over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and override the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
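To make the dict-subclassing case under Inheritance concrete, here is a minimal sketch. It relies on `__missing__`, the hook that `dict.__getitem__` calls on a subclass when a key is absent; the class name `DefaultingDict` and the placeholder format are made up for illustration. (`collections.defaultdict` covers the simpler case, but its `default_factory` is called without the missing key, which is why a subclass is used when the default depends on the key.)

```python
class DefaultingDict(dict):
    """dict subclass that returns a key-based default instead of raising KeyError."""

    def __missing__(self, key):
        # dict.__getitem__ calls this on subclasses when `key` is absent
        return f"<no value for {key!r}>"


settings = DefaultingDict(host="localhost")
print(settings["host"])  # localhost
print(settings["port"])  # <no value for 'port'>
print("port" in settings)  # False: the missing key is not inserted
```

Note that `.get()` does not consult `__missing__`, so only square-bracket lookups get the computed default.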

Again, there are several reasons not to use OOP, and you don't need to. But luckily, with a language like Python you can use just a little bit or a lot; it's up to you.

An example of the student use case (no guarantee on code quality, just an example):

Object Oriented

```python
class Student(object):
    def __init__(self, name, age, gender, level, grades=None):
        self.name = name
        self.age = age
        self.gender = gender
        self.level = level
        self.grades = grades or {}

    def setGrade(self, course, grade):
        self.grades[course] = grade

    def getGrade(self, course):
        return self.grades[course]

    def getGPA(self):
        return sum(self.grades.values())/len(self.grades)

# Define some students
john = Student("John", 12, "male", 6, {"math":3.3})
jane = Student("Jane", 12, "female", 6, {"math":3.5})

# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())
```

Standard Dict

```python
def calculateGPA(gradeDict):
    return sum(gradeDict.values())/len(gradeDict)

students = {}
# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"
students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math:3.3}

students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math:3.5}

# At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))
```
Alex
dantiston
  • Because of "yield", Python encapsulation is often cleaner with generators and context managers than with classes. – Dmitry Rubanovich Oct 12 '15 at 04:29
  • I often find myself using very complex, arbitrary data storage mechanisms like the ones mentioned in your "state" section. Do you have examples of how to avoid this with classes? – metersk Oct 12 '15 at 04:29
  • @meter I added an example. I hope it helps. The point here is that instead of having to rely on the keys of your dicts having the correct names, the Python interpreter enforces this constraint for you if you mess up and forces you to use defined methods (though not defined fields; unlike Python, Java and other OOP languages don't let you define fields outside of classes). – dantiston Oct 12 '15 at 05:34
  • @meter also, as an example of encapsulation: let's say today this implementation is fine because I only need to get the GPA for 50,000 students at my university once a term. Now tomorrow we get a grant and need to give the current GPA of every student every second (of course, nobody would ask for this, but just to make it computationally challenging). We could then "memoize" the GPA and only recalculate it when it changes (for instance, by setting a variable in the setGrade method), otherwise returning a cached version. The user still calls getGPA(), but the implementation has changed. – dantiston Oct 12 '15 at 05:37
  • @dantiston, this example needs collections.namedtuple. You can create a new type Student = collections.namedtuple("Student", "name, age, gender, level, grades"). Then you can create instances: john = Student("John", 12, "male", grades = {'math':3.5}, level = 6). Notice that you can use both positional and named arguments, just as you would when instantiating a class. This is a data type that's already implemented for you in Python. You can then refer to john[0] or john.name to get the 1st element of the tuple. You can get John's grades as john.grades.values(). And it's already done for you. – Dmitry Rubanovich Oct 12 '15 at 09:11
  • @DmitryRubanovich that may have been a better first choice than nested dicts, but the OP seemed to be using nested dicts, so I went with that. I don't think they provide a major contrast with nested dicts. The key distinction that classes have is that the procedure is defined alongside the data, which is my point. I wouldn't use a namedtuple here, personally. – dantiston Oct 12 '15 at 22:52
  • @dantiston, marrying functions to the data on which they act is an anti-pattern in python. See https://docs.python.org/2/glossary.html#term-duck-typing. It has a place, but only when it's the natural thing to do. It shouldn't be the 1st thing to reach for (as it is in Java). For example, if you want to implement a new numeric type, it's natural to implement operators on it. – Dmitry Rubanovich Oct 12 '15 at 23:00
  • @dmitryrubanovich whatever you think of OOP in Python, Python supports OOP, and if the OP wants to use it, they should use it correctly. I'm not sure how duck typing has anything to do with it. If anything, correctly encapsulating classes removes the need for type checking. If you're thinking of defining classes only to give names to data structures, then yes, this can be a bad practice in Python (see valentjedi's video). But if you are actually defining procedure and data together, then you should absolutely use a class. – dantiston Oct 12 '15 at 23:44
  • @dantiston, "defining data and procedures together" is the answer to the question "how?". The OP asked the question "why?" -- not "how?". Both of the examples of code above take 20 lines to say what can be said more clearly with 5 lines of actual Python. The 1st example looks like Java. The 2nd looks like Perl. Both examples ignore best Python practices – Dmitry Rubanovich Oct 13 '15 at 00:21
  • For me, encapsulation is a good enough reason to always use OOP. I struggle to see the value in NOT using OOP for any reasonably sized coding project. I guess I need answers to the reverse question :) – San Jay Oct 31 '18 at 20:15
  • why repeat `students[john]` and `students[jane]`, instead of assigning them to variables? – Lei Yang Dec 29 '22 at 05:58
  • The first sentence is an overstatement. You can do [OOP without classes](https://en.wikipedia.org/wiki/Prototype-based_programming) at all. – chepner Jan 04 '23 at 14:13
  • @chepner The first sentence is a statement about classes, not about OOP. But, I appreciate the call out to prototype-based programming :-) – dantiston Jan 05 '23 at 18:19
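dantiston's memoization comment above can be sketched roughly like this. It is a trimmed-down, hypothetical version of the `Student` class (only `name` and `grades` are kept) in which `setGrade` invalidates a cached GPA, so callers keep calling `getGPA()` while the implementation changes underneath.

```python
class Student:
    """Trimmed-down Student whose GPA is cached until a grade changes."""

    def __init__(self, name, grades=None):
        self.name = name
        self.grades = dict(grades or {})
        self._gpa = None  # cached GPA; None means "recompute on next request"

    def setGrade(self, course, grade):
        self.grades[course] = grade
        self._gpa = None  # invalidate the cache whenever a grade changes

    def getGrade(self, course):
        return self.grades[course]

    def getGPA(self):
        if self._gpa is None:
            # Only recompute when grades have changed since the last call
            self._gpa = sum(self.grades.values()) / len(self.grades)
        return self._gpa


john = Student("John", {"math": 3.0})
print(john.getGPA())  # 3.0
john.setGrade("science", 4.0)
print(john.getGPA())  # 3.5
```

The encapsulation here is by convention only: code that mutates `john.grades` directly bypasses `setGrade` and leaves a stale cache, which is one reason to route changes through methods.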

You need a class whenever you need to maintain the state of your functions and it cannot be accomplished with generators (functions which yield rather than return). Generators maintain their own state.
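A small sketch of what "generators maintain their own state" means in practice: the local variables `total` and `count` survive between `yield`s, so no class is needed to carry them. `running_mean` is an illustrative name, not part of the original answer.

```python
def running_mean(values):
    """Yield the mean of all values seen so far."""
    total = 0.0  # state kept in the suspended generator frame
    count = 0
    for value in values:
        total += value
        count += 1
        yield total / count


print(list(running_mean([2, 4, 6])))  # [2.0, 3.0, 4.0]
```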

If you want to override any of the standard operators, you need a class.
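For example, a minimal value type that overloads `+` and `==` through the special methods `__add__` and `__eq__`. `Money` and its fields are hypothetical; returning `NotImplemented` lets Python fall back to the other operand's method or raise `TypeError`.

```python
class Money:
    """Illustrative value type: an amount in integer cents plus a currency code."""

    def __init__(self, cents, currency="USD"):
        self.cents = cents
        self.currency = currency

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # let Python try other.__radd__, else TypeError
        if other.currency != self.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.cents + other.cents, self.currency)

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return (self.cents, self.currency) == (other.cents, other.currency)

    def __repr__(self):
        return f"Money({self.cents}, {self.currency!r})"


print(Money(150) + Money(250))  # Money(400, 'USD')
```

Because the class defines `__eq__` without `__hash__`, Python makes its instances unhashable, which is usually what you want for a mutable object like this one.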

Whenever you have a use for a Visitor pattern, you'll need classes. Every other design pattern can be accomplished more effectively and cleanly with generators, context managers (which are also better implemented as generators than as classes) and POD types (dictionaries, lists and tuples, etc.).

If you want to write "pythonic" code, you should prefer context managers and generators over classes. It will be cleaner.

If you want to extend functionality, you will almost always be able to accomplish it with containment rather than inheritance.
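A rough sketch of containment (composition): instead of subclassing `list` and inheriting every list method, the class below holds a list and exposes only the operations it wants, delegating to the contained object. `GradeBook` and its 0.0 to 4.0 range check are assumptions made for the example.

```python
class GradeBook:
    """Holds a list (containment) instead of inheriting from list."""

    def __init__(self):
        self._grades = []  # the contained object; not part of the public API

    def add(self, grade):
        # Only the operations we choose are exposed, with our own validation
        if not 0.0 <= grade <= 4.0:
            raise ValueError(f"grade out of range: {grade}")
        self._grades.append(grade)

    def mean(self):
        return sum(self._grades) / len(self._grades)

    def __len__(self):
        # Delegate to the contained list
        return len(self._grades)


book = GradeBook()
book.add(3.0)
book.add(4.0)
print(len(book), book.mean())  # 2 3.5
```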

Like every rule, this has an exception. If you want to encapsulate functionality quickly (i.e., write test code rather than library-level reusable code), you can encapsulate the state in a class. It will be simple and won't need to be reusable.

If you need a C++-style destructor (RAII), you definitely do NOT want to use classes. You want context managers.
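A minimal sketch of the generator-based context manager style this answer recommends, using `contextlib.contextmanager`: setup runs before `yield`, and the `finally` block runs on exit even if the body raises. The `name` and `log` arguments are placeholders standing in for a real resource such as a lock or connection. (The same behavior can also be written as a class with `__enter__` and `__exit__`, which is the point debated in the comments below.)

```python
from contextlib import contextmanager


@contextmanager
def managed(name, log):
    """Acquire a (placeholder) resource, hand it to the with-block, always release it."""
    log.append(f"acquire {name}")
    try:
        yield name  # the value bound by `with ... as`
    finally:
        # Runs when the with-block exits, whether normally or via an exception
        log.append(f"release {name}")


log = []
try:
    with managed("db", log) as res:
        log.append(f"use {res}")
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

print(log)  # ['acquire db', 'use db', 'release db']
```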

Dmitry Rubanovich
  • really good answer, I would love to read a blog post or something if you have more to add – madhukar93 Oct 31 '17 at 10:55
  • "Whenever you need to maintain a state of your functions and it cannot be accomplished with generators (functions which yield rather than return). Generators maintain their own state" – this is not a particularly convincing reason, because you've got closures for that (which are cleaner and more concise than classes). – Eli Korvigo Jan 05 '18 at 12:08
  • @Eli Korvigo, and closures are implemented as generators in Python. So I think we actually agree there. – Dmitry Rubanovich Jan 06 '18 at 22:29
    @DmitryRubanovich closures are not implemented via generators in Python. – Eli Korvigo Jan 06 '18 at 22:31
  • @Eli Korvigo, if you narrow the definition of a "closure" to just the functions defined in the context of other functions, then no. But I am using a more general definition of closures as functions which maintain some state implicitly. I do get your point though. You can always edit the answer or give your own if you think it isn't full. – Dmitry Rubanovich Jan 06 '18 at 22:39
    @DmitryRubanovich I was referring to "closures are implemented as generators in Python", which is not true. Closures are far more flexible. Generators are bound to return a `Generator` instance (a special iterator), while closures can have any signature. You can basically avoid classes most of the time by creating closures. And closures are not merely "functions defined in the context of other functions". – Eli Korvigo Jan 06 '18 at 22:45
  • @Eli Korvigo, generator instances are reentrant functions. Because the functions defined in other functions have to track containing context, there is less locality of reference than there is in generators. Personally, I prefer the generators for encapsulating control of data flow over closures. Generators just seem like a better improvement over classes for the situations where generators can be used. Let's put it this way, code which uses closures like JS does instead of classes will be less clean than the "normal" Python code using generators. – Dmitry Rubanovich Jan 06 '18 at 23:01
  • @DmitryRubanovich I highly disagree. Calling a generator function gives you an instance of the `Generator` ABC, which is not callable itself. There are several downsides to this: 1) you can only call `next` to get a value out of this instance, 2) you can't pass arguments to this instance, i.e. it is not a `Callable`, 3) its inner state mutates whether you want it or not. Generators are only cleaner for generating recurrent sequences – they shouldn't (and can't) be used as substitutions for general purpose stateful callables and for encapsulation. – Eli Korvigo Jan 06 '18 at 23:27
  • @Eli Korvigo, in most use cases you get values out of generators with the "in" operator rather than by calling '.next()' on them. While you make a few other statements, I don't want to address them individually. They are all true and they are all irrelevant. Generators are certainly not only useful for generating recurring sequences. For example, (as I mentioned in the answer) most of the time context managers are implemented as generators (see ex 1 in https://www.python.org/dev/peps/pep-0343/#examples). – Dmitry Rubanovich Jan 06 '18 at 23:35
  • @Eli Korvigo, in fact, generators are a significant leap syntactically. They create an abstraction of a queue in the same way that functions are abstractions of a stack. And most data flow can be pieced together from the stack/queue primitives. – Dmitry Rubanovich Jan 06 '18 at 23:36
  • @DmitryRubanovich the `in` operator on generators (and any other iterator) is implemented using `next`. – Eli Korvigo Jan 06 '18 at 23:36
  • @Eli Korvigo, yes, they are. But the discussion is about syntax -- not implementation. – Dmitry Rubanovich Jan 06 '18 at 23:37
    @DmitryRubanovich we are talking apples and oranges here. I'm saying, that generators are useful in a very limited number of cases and can in no way be considered a substitution for general purpose stateful callables. You are telling me, how great they are, without contradicting my points. – Eli Korvigo Jan 06 '18 at 23:38
  • @DmitryRubanovich the `in` syntax is not in any way a more straightforward alternative to function call semantics than the `next` call. And, no, you can't get a value out of a generator using the `in` operator. I'd be pleased if you showed me how one can get a value out of a generator using the `in` operator. – Eli Korvigo Jan 06 '18 at 23:40
  • @DmitryRubanovich Since you've mentioned context managers from `contextlib`, they are actually closures (the `contextmanager` decorator creates a closure), wrapping a generator. – Eli Korvigo Jan 06 '18 at 23:43
    @Eli Korvigo, and I am saying that callables are only generalizations of functions. Which themselves are syntactic sugar over processing of stacks. While generators are syntactic sugar over processing of queues. But it is this improvement in syntax that allows for more complicated constructs to be built up easily and with more clear syntax. '.next()' is almost never used, btw. – Dmitry Rubanovich Jan 06 '18 at 23:44
  • The `next(generator)` pattern is used throughout the built-in library, because it is the only way to get a single value out of a generator. – Eli Korvigo Jan 06 '18 at 23:45
  • @Eli Korvigo, I would say that the pythonic way to write code is to implement most highly-reusable "closures" as decorators. This is becoming an argument and it's getting out of hand. We both know what's going on and we are just arguing about what to call it. You are arguing that what something "really is" is what it is. I am saying that how things are exposed syntactically is what they are. I was ok arguing semantics of Python. But I don't want to have a proverbial semantic argument. – Dmitry Rubanovich Jan 06 '18 at 23:49
  • @DmitryRubanovich You've posted several highly misleading arguments, most importantly: 1) people use `in` to get values out of generators (please, show me an example), 2) context managers are implemented as generators (show me a single context manager that is a generator) – Eli Korvigo Jan 06 '18 at 23:54
  • Regarding other statements from your answer, "If you need a C++ style destructor (RAII), you definitely do NOT want to use classes. You want context managers." – this is downright misinformation. What advantage does a context manager have over a class instance in terms of memory management? – Eli Korvigo Jan 06 '18 at 23:57
  • "If you want to write "pythonic" code, you should prefer context managers and generators over classes. It will be cleaner." – try implementing a data structure using a generator and/or a context manager. Let's see how far that gets you. "Whenever you need to maintain a state of your functions and it cannot be accomplished with generators" – this is downright misleading, too. You can't express an arbitrary stateful function using a generator. Do you remember the `lru_cache` decorator? It is a stateful callable. Can you implement that with a generator? – Eli Korvigo Jan 06 '18 at 23:59
  • @Eli Korvigo, it's not misleading. It's the preferred way to write Python. RAII is used to manage not only memory but any situation in which it is beneficial to release resources (locks, connections, file handles, etc.) when the context ends. This allows resources to be released when exiting the context, even when that happens in an unpredictable manner (usually because of an exception). – Dmitry Rubanovich Jan 07 '18 at 00:02
  • @DmitryRubanovich "It's the preferred way to write Python" preferred by you? Obviously, it is. Is it Pythonic to cram a generator instead of all closures and classes? Certainly not. – Eli Korvigo Jan 07 '18 at 00:04
  • @Eli Korvigo, I did mention that you should use POD types for composite data structure types. I am going to stop replying now. You seem to be in the mood for a heated discussion. I am not. – Dmitry Rubanovich Jan 07 '18 at 00:04
  • What about using `in` to get a value out of a generator :) (be warned, the object must be a generator, not a closure or a class instance). Can you, at least, give me that? No, I'm most certainly not in the mood for a "heated discussion". I only hate it when people write about things they seem to know very little about and are unwilling to revisit their views. – Eli Korvigo Jan 07 '18 at 00:08
  • This thread is 3 years old, but I wanted to add that you *can* use a generator as a general-purpose substitute for a stateful object, because generators can send *and receive* data via the `received_data = yield send_data` syntax. If the received data is a tuple of (method name, args) then you can simulate a method call by sending its "return value" at the next `yield`; this is the "message passing" conceptual interpretation of OOP made literal. I don't think anyone *should* do this (there are practically no upsides compared to writing a class), but it *is* possible in principle. – kaya3 May 23 '21 at 04:00
  • @kaya3 and OP - could you show us some example code, to better understand what you mean and how you use generators? – Gwang-Jin Kim Feb 02 '23 at 12:18
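A rough sketch of the message-passing idea kaya3 describes: the generator receives `(method_name, args)` tuples through `send()` and yields back each "return value". `counter_object` and the message format are invented for illustration, and, as kaya3 says, a plain class is normally the better choice.

```python
def counter_object(start=0):
    """A generator imitating an object with `increment` and `get` methods."""
    count = start
    result = None
    while True:
        # Yield the result of the previous message, then receive the next (method, args)
        method, args = yield result
        if method == "increment":
            count += args[0] if args else 1
            result = count
        elif method == "get":
            result = count
        else:
            result = None  # unknown "method": ignored in this sketch


counter = counter_object(10)
next(counter)  # prime the generator so it pauses at the first yield
print(counter.send(("increment", (5,))))  # 15
print(counter.send(("get", ())))  # 15
```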

I think you're doing it right. Classes are reasonable when you need to model some business logic or complex real-life processes with complex relationships. For example, when you have:

  • Several functions that share state
  • More than one copy of the same state variables
  • A need to extend the behavior of existing functionality
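Applied to the reports described in the question, a minimal sketch might look like the following. Everything here is hypothetical: `Report` and its methods are illustrative names, and in-memory rows stand in for the Mongo/SQL/API pulls. The methods share state through `self.rows`, and two instances hold separate copies of the same state variables.

```python
import csv
import io


class Report:
    """Hypothetical report job whose steps share state via self.rows."""

    def __init__(self, title, rows):
        self.title = title
        self.rows = list(rows)  # state shared by every method below

    def clean(self):
        # Drop rows without an amount, normalize names, convert amounts to float
        self.rows = [
            {"name": r["name"].strip().title(), "amount": float(r["amount"])}
            for r in self.rows
            if r.get("amount") not in (None, "")
        ]
        return self  # allow chaining: Report(...).clean()

    def total(self):
        return sum(r["amount"] for r in self.rows)

    def summary(self):
        return f"{self.title}: {len(self.rows)} rows, total {self.total():.2f}"

    def to_csv(self):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["name", "amount"])
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()


sales = Report("Sales", [{"name": " alice ", "amount": "10.5"},
                         {"name": "bob", "amount": ""}]).clean()
refunds = Report("Refunds", [{"name": "carol", "amount": "2"}]).clean()
print(sales.summary())    # Sales: 1 rows, total 10.50
print(refunds.summary())  # Refunds: 1 rows, total 2.00
```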

I also suggest watching this classic video.

valignatev
  • There is no need to use a class when a callback function needs a persistent state in Python. Using Python's yield instead of return makes a function re-entrant. – Dmitry Rubanovich Oct 12 '15 at 09:25

dantiston gives a great answer on why OOP can be useful. However, it is worth noting that OOP is not necessarily the better choice in most of the cases where it is used. OOP has the advantage of combining data and methods. In terms of application, I would say: use OOP only if all the functions/methods deal exclusively with one particular set of data and nothing else.

Consider a functional programming refactoring of dantiston's example:

```python
import math


def dictMean(nums):
    return sum(nums.values()) / len(nums)

# It's good to include automatic tests for production code, to ensure that updates don't break old code
# (math.isclose avoids relying on exact floating-point equality)
assert math.isclose(dictMean({'math': 3.3, 'science': 3.5}), 3.4)

john = {'name': 'John', 'age': 12, 'gender': 'male', 'level': 6, 'grades': {'math': 3.3}}

# setGrade
john['grades']['science'] = 3.5

# getGrade
print(john['grades']['math'])

# getGPA
print(dictMean(john['grades']))
```

At first glance, it seems that all three methods deal exclusively with GPA, until you realize that Student.getGPA() can be generalized into a function that computes the mean of a dict's values and reused for other problems, and that the other two methods reinvent what a dict can already do.

The functional implementation gains:

  1. Simplicity. No boilerplate class or selfs.
  2. Easily add automatic test code right after each function for easy maintenance.
  3. Easily split into several programs as your code scales.
  4. Reusability for purposes other than computing GPA.

The functional implementation loses:

  1. Typing 'name', 'age', 'gender' as dict keys each time is not very DRY (don't repeat yourself). It's possible to avoid that by changing the dict to a list. Sure, a list is less clear than a dict, but this is a non-issue if you include automatic test code below it anyway.

Issues this example doesn't cover:

  1. OOP inheritance can be supplanted by function callbacks.
  2. To use an OOP class, you have to create an instance of it first. This can be tedious when you have no data to put in __init__(self).
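A minimal sketch of the first point: the behavior that would otherwise be overridden in a subclass (how each row is formatted) is passed in as a function instead. `make_report`, `plain`, and `csv_like` are hypothetical names.

```python
def make_report(rows, format_row):
    """The varying behavior is a callback argument, not a subclass override."""
    return "\n".join(format_row(row) for row in rows)


def plain(row):
    return f"{row['name']}: {row['amount']}"


def csv_like(row):
    return f"{row['name']},{row['amount']}"


rows = [{"name": "Alice", "amount": 3}, {"name": "Bob", "amount": 5}]
print(make_report(rows, plain))
print(make_report(rows, csv_like))
print(make_report(rows, lambda r: r["name"].upper()))  # one-off behavior, no new class
```

The OOP equivalent would be a base class with a `format_row` method that each subclass overrides.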
zyc

A class defines a real-world entity. If you are working on something that exists individually and has its own logic, separate from everything else, you should create a class for it. For example, a class that encapsulates database connectivity.

If this is not the case, there is no need to create a class.
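A minimal sketch of such a database-connectivity class, using the standard-library `sqlite3` module with an in-memory database as a stand-in for a real server. The `Database` class and its method names are assumptions for illustration; the connection's own context manager (`with self._conn:`) commits on success and rolls back on an exception, but does not close the connection.

```python
import sqlite3


class Database:
    """Hypothetical wrapper that owns one connection and exposes a small API."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    def execute(self, sql, params=()):
        # Commit on success, roll back if the statement raises
        with self._conn:
            return self._conn.execute(sql, params)

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()

    def close(self):
        self._conn.close()


db = Database()
db.execute("CREATE TABLE sales (name TEXT, amount REAL)")
db.execute("INSERT INTO sales VALUES (?, ?)", ("Alice", 10.5))
print(db.query("SELECT name, amount FROM sales"))  # [('Alice', 10.5)]
db.close()
```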

Ashutosh

It depends on your idea and design. If you are a good designer, then OOP will come out naturally in the form of various design patterns.

For simple script-level processing, OOP can be overhead.

Simply consider the basic benefits of OOP, like reusability and extensibility, and check whether you actually need them.

OOP makes complex things simpler and simple things complex.

Keep things simple either way, with or without OOP. Whichever is simpler, use that.

aschultz
Mohit Thakur