116

I have a main class that has a ton of different functions in it. It's getting hard to manage. I'd like to be able to separate those functions into a separate file, but I'm finding it hard to come up with a good way to do so.

Here's what I've done so far:

File main.py

import separate

class MainClass(object):
    self.global_var_1 = ...
    self.global_var_2 = ...

    def func_1(self, x, y):
        ...
    def func_2(self, z):
        ...
    # tons of similar functions, and then the ones I moved out:

    def long_func_1(self, a, b):
        return separate.long_func_1(self, a, b)

File separate.py

def long_func_1(obj, a, b):
    if obj.global_var_1:
        ...
    obj.func_2(z)
    ...
    return ...
# Lots of other similar functions that use info from MainClass

I do this because if I do:

obj_1 = MainClass()

I want to be able to do:

obj_1.long_func_1(a, b)

instead of:

separate.long_func_1(obj_1, a, b)

I know this seems kind of nit-picky, but I want just about all of the code to start with obj_1., so there isn't confusion.

Is there a better solution than what I'm currently doing? The only issues that I have with my current setup are:

  1. I have to change arguments for both instances of the function
  2. It seems needlessly repetitive

I know this has been asked a couple of times, but I couldn't quite understand the previous answers and/or I don't think the solution quite represents what I'm shooting for. I'm still pretty new to Python, so I'm having a tough time figuring this out.

Peter Mortensen
  • 13
    If you are new to Python, **just stick to the conventions** and keep all methods for a class in the same file. – Martijn Pieters Nov 29 '17 at 21:09
  • 5
    If you must group your methods into separate modules, use inheritance; create a base class in one module, import it and subclass it in the other. – Martijn Pieters Nov 29 '17 at 21:11
  • 4
    @MartijnPieters I know I could do that, but none of the functions within the class are finalized, so I find myself scrolling a lot to find the appropriate one, which takes more time than I'd like simply because there's so many. –  Nov 29 '17 at 21:11
  • 4
    That's not a problem to be solved by changing the code; that's a problem to be solved by using [an IDE](https://wiki.python.org/moin/IntegratedDevelopmentEnvironments) which allows you to jump to the location of a function. (Or use your text editor's "find" functionality.) – David Z Nov 29 '17 at 21:31
  • 3
    If a file is not enough for all the methods, then likely you have a problem with the design. The class is too `heavy` and probably splitting it into two or three classes (and files) is the solution. – trinchet Nov 29 '17 at 21:33
  • 1
    Ya, you have to ask yourself why you have a class with so many big methods. – Robert Moskal Aug 19 '19 at 18:45
  • I split a big class into a core class and a library module that takes the class as an argument. It works and it is an approved pattern, but it does not look very natural at the call points; I would rather have a big file. How bad is 2k lines in one .py file? It is easy to break this limit with explicit module names, type annotations and a low limit on line length. – uuu777 Oct 14 '20 at 16:12

4 Answers

165

Here is how I do it:

  1. The class (or group of classes) is actually a full module (a directory with an __init__.py). You don't have to do it this way, but if you're splitting a class across multiple files I think this is 'cleanest' (opinion).

  2. The definition is in __init__.py, methods are split into files by a meaningful grouping.

  3. A method file is just a regular Python file with functions, except you can't forget 'self' as a first argument. You can have auxiliary methods here, both taking self and not.

  4. Methods are imported directly into the class definition.

Suppose my class is some fitting GUI (this is actually what I first did this for). So my file hierarchy may look something like

mymodule/
     __init__.py
     _plotstuff.py
     _fitstuff.py
     _datastuff.py

So plot stuff will have plotting methods, fit stuff contains fitting methods, and data stuff contains methods for loading and handling of data - you get the point. By convention I mark the files with a _ to indicate these really aren't meant to be imported directly anywhere outside the module. So _plotstuff.py for example may look like:

def plot(self, x, y):
     ...  # body
def clear(self):
     ...  # body

etc. Now the important thing is file __init__.py:

class Fitter(object):
     def __init__(self,whatever):
         self.field1 = 0
         self.field2 = whatever

     # Imported methods
     from ._plotstuff import plot, clear
     from ._fitstuff  import fit
     from ._datastuff import load

     # static methods need to be set
     from ._static_example import something
     something = staticmethod(something)

     # Some more small functions
     def printHi(self):
         print("Hello world")

Tom Sawyer mentions PEP-8 recommends putting all imports at the top, so you may wish to put them before __init__, but I prefer it this way. I have to say, my Flake8 checker does not complain, so likely this is PEP-8 compliant.

Note that from ... import ... is particularly useful for hiding 'helper' functions used by your methods that you don't want accessible through objects of the class: only the names you import become attributes of the class. I usually also place the custom exceptions for the class in the different files, but import them directly so they can be accessed as Fitter.myexception.
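To make the static method and helper-hiding points concrete, here is a minimal sketch. It is not the answer's actual package: the module is built in memory so the snippet runs as one script, the import is absolute rather than relative for the same reason, and the helper name `_normalise` and the body of `something` are made up for illustration.

```python
import sys
import textwrap
import types

# Stand-in for a hypothetical _static_example.py, built in memory so that
# this sketch is a single runnable script.
source = textwrap.dedent('''
    def _normalise(values):
        # Helper: never imported into the class, so not reachable via instances.
        total = sum(values)
        return [v / total for v in values]

    def something(values):
        # No 'self' parameter: meant to be wrapped in staticmethod.
        return _normalise(values)
''')
module = types.ModuleType("_static_example")
exec(source, module.__dict__)
sys.modules["_static_example"] = module


class Fitter:
    # Only 'something' is bound in the class namespace.
    from _static_example import something
    something = staticmethod(something)


f = Fitter()
weights = f.something([1, 1, 2])
print(weights)                   # [0.25, 0.25, 0.5]
print(hasattr(f, "_normalise"))  # False: the helper stays hidden
```

Without the staticmethod wrapper, `f.something([1, 1, 2])` would pass `f` as the first argument and fail.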

If this module is in your path then you can access your class with

from mymodule import Fitter
f = Fitter('whatever')  # __init__ takes one argument
f.load('somefile')      # Imported method
f.plot(x, y)            # Imported method; x and y are your data

It is not completely intuitive, but not too difficult either. The short version for your specific problem was you were close - just move the import into the class, and use

from separate import long_func_1

and don't forget your self!
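A minimal sketch of that short version, assuming nothing beyond the standard library. The throwaway separate.py is written to a temporary directory only so the snippet runs as a single script, and the `scale` attribute and the function body are invented for illustration.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Write a throwaway separate.py so the sketch runs as a single script.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "separate.py").write_text(textwrap.dedent('''
    def long_func_1(self, a, b):
        # 'self' is the MainClass instance, exactly as in a normal method.
        return self.scale * (a + b)
'''))
sys.path.insert(0, tmp_dir)


class MainClass:
    def __init__(self, scale):
        self.scale = scale  # hypothetical attribute, for illustration only

    # The import runs once, while the class body executes, and binds
    # long_func_1 in the class namespace, i.e. as an ordinary method.
    from separate import long_func_1


obj_1 = MainClass(scale=2)
result = obj_1.long_func_1(3, 4)
print(result)  # 14
```

There is no wrapper method to keep in sync any more: the signature lives only in separate.py.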

How to use super addendum

super() is a nifty function that gives a child object access to parent methods in a simple and readable manner. These kinds of classes are big to begin with, so inheritance does not always make sense, but if it does come up:

  1. For methods defined in the class itself, within __init__.py, you can use super() normally, as is.

  2. If you define your method in another module (which is kind of the point here), you can't use super as is, since the function is not defined in the context of your class, and the call will fail. The way to handle this is to use the self argument, and add the context yourself:

    def print_super(self):
      print('Super is:', super(type(self), self))
    

    Note you cannot omit the second argument, since out of context super does not bind the object method (which you usually want for calls like super(...).__init__()).

  3. If this is something you want to do in many methods in different modules, you may want to provide a super method in the __init__.py file for use:

    def MySuper(self):
        return super()
    

usable by self in all methods.
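Here is a small sketch comparing options 2 and 3, kept in one file for brevity. The names `Base`, `SubFitter`, `describe` and `my_super` are hypothetical, and the two module-level functions stand in for functions that would live in a separate method file. One caveat worth knowing: `super(type(self), self)` resolves relative to the runtime type of the object, so it recurses forever as soon as the class is subclassed, whereas a helper defined in the class body does not have that problem.

```python
class Base:
    def describe(self):
        return "Base"


# Stand-ins for functions that would live in a separate method file.
def _describe(self):
    # Zero-argument super() is not available here, so pass the context explicitly.
    return "Fitter -> " + super(type(self), self).describe()


def _describe_via_helper(self):
    # Uses the helper defined inside the class body instead.
    return "Fitter -> " + self.my_super().describe()


class Fitter(Base):
    describe = _describe
    describe_via_helper = _describe_via_helper

    def my_super(self):
        # Defined in the class body, so zero-argument super() is tied to Fitter.
        return super()


class SubFitter(Fitter):
    pass


print(Fitter().describe())               # Fitter -> Base
print(SubFitter().describe_via_helper()) # Fitter -> Base

# With a subclass, type(self) is SubFitter, so super(type(self), self)
# lands back on Fitter.describe and never reaches Base.
try:
    SubFitter().describe()
    recursed = False
except RecursionError:
    recursed = True
print(recursed)  # True
```

So if the split class might ever be subclassed, the helper method looks like the safer of the two.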

kabanus
  • Do the private imports occur in `__init__` or outside of it? (Hint: check your indentation...) – cowbert Nov 29 '17 at 21:54
  • 1
    @cowbert Outside, see tabbing. I'll add some example code to make it clear. You only need it to compile once with the class, not every object. – kabanus Nov 29 '17 at 21:55
  • 3
    I'd like to add as a general comment that sometimes it doesn't make sense to split a class into sub-classes, and refactoring long class code is something you may run into even in Python. – kabanus Nov 29 '17 at 22:02
  • 1
    Splitting a "class" into multiple files is also common ECMAScript pattern (using the `var PseudoClass = PseudoClass || {}` idiom), so if you are doing some fullstack development with Python middleware, it might make sense to split a Python class for various reasons. – cowbert Nov 29 '17 at 22:44
  • 1
    using import in middle of the file is bad practice – TomSawyer Apr 12 '19 at 19:11
  • @TomSawyer Is that so? I never heard or had reason to believe what I did here is problematic. Can you explain what are the consequences? – kabanus Apr 12 '19 at 19:14
  • @kabanus https://stackoverflow.com/questions/1188640/good-or-bad-practice-in-python-import-in-the-middle-of-a-file – TomSawyer Apr 13 '19 at 02:41
  • @TomSawyer Thanks, I am familiar with the guidelines and the question, I thought you had something else in mind. My only 2 counterpoints are: 1. I usually put all in-class imports below init, which is not really "middle", and you could put them before - which would be top. This happens often when there is an initial non-import line, such as conditional imports. Matter of preference and strictness I suppose. 2. This use-case is unique. The imports are not of some library, but of the defined class itself and by definition should be scoped. I have my own rule of thumb - it's good practice - – kabanus Apr 13 '19 at 08:56
  • - to consider applying guidelines on a case by case basis. This is of course, completely a matter of opinion. You are welcome to add your alternative solution, it would be good to have as many different useful answers as possible. – kabanus Apr 13 '19 at 08:57
  • Is there a way to put the __init__() in separate script just like the methods? I am keen to 'plug and play' different initializations while maintaining everything else – Tian Oct 09 '19 at 06:18
  • 1
    @Tian Yup, same as any method. Make a file, and import it from there (`from .init1 import __init__`, where init1.py contains `def __init__(self...):...`) etc. Next time I suggest trying out these things yourself (probably faster). – kabanus Oct 09 '19 at 08:29
  • A way to create static methods is by calling `plot = staticmethod(plot)` after the import. You could also automate this process by writing a metaclass... – Nearoo Feb 21 '20 at 07:40
  • @Nearoo Thanks, I added that. – kabanus Feb 21 '20 at 11:21
  • This may create `circular (or cyclic) imports` issue if there is `if name == __main__ ` on the imported module which tests itself – alper May 05 '20 at 11:12
  • @alper I'm not sure I understand what you say. Which module tests for `__name__ == __main__` (I'm guessing you had a typo?)? The modules imported into the class are by the way I defined above not supposed to be imported standalone, as they are an integral part of the class. That would be like importing methods from a class without a class. As such they really shouldn't have that `if` line at all. Regardless, it is also not clear to me how that `if` statement can create a circular import? – kabanus May 05 '20 at 11:24
  • For example in `_plotstuff.py` file, lets say I write following for testing: `f = Fitter(); f.plot() #Imported method` under a `if __name__ == "__main__":`. When I add `from mymodule import Fitter` to top of the file `circular import` error shows up. So I had to carry `from mymodule import Fitter`into the `if __name__ == __main__:` – alper May 05 '20 at 11:31
  • @alper I see - that would be a complete anti-pattern to this. `_plotstuff` cannot import `Fitter` by definition. In fact, it should never (the way this design is imagined) be invoked directly. That is like a method in your class trying to import your parsed class into an inner method (pre-parsing), after trying to invoke it without parsing the class - which does not make sense to me. To me it seems the circularity is made by hand here - a bad design which causes this is not related specifically to whether we are splitting a class or just have two modules which have a circular dependence. – kabanus May 05 '20 at 11:38
  • 1
    @alper The only place I can imagine something remotely close needed is if you have in a method in `_plotstuff` something like `Fitter.method(...)`. I think that is the only thing that is truly missing compared to a one-file class, but of course, you can always use `self` instead, which my opinion is better design (though up for debate I guess). – kabanus May 05 '20 at 11:40
  • I was converting my code into your explained pattern, but all the files like `_plotstuff.py` have their `if __name__ == __main__: ` to test themselves. Basically when I call the `_plotstuff.py` it was testing `plot()` function. Along with this design I will carry my testing into a completely different file, which would be a better design I guess – alper May 05 '20 at 11:45
  • 1
    @alper I'm afraid that won't work for the above reasons. You have to test the class as a whole, and as such test `plot` only as part of a complete object, just like you would any class. I think your conclusion is correct - testing should be separated or at least only exist in the `__init__` file, where the class is actually made. That would fit better with the whole concept – kabanus May 05 '20 at 11:48
  • Is there a way to import necessary third-party modules only inside the top-level file (__init__.py) and not inside every single file of this class (_plotstuff.py, _fitstuff.py, _datastuff.py) ? I have many repetitive imports. – tevang Jul 05 '20 at 11:03
  • @tevang If you have a dependency that is relevant to every implementation file (as in you call it from there), you will have to import it there. That is the dependency structure, the implementations do not see the `__init__` file at all. This also conforms to how Python works. Perhaps if you have a specific use-case it has a better solution? What do you need to import everywhere? – kabanus Jul 05 '20 at 12:22
  • @kabanus in the simplest scenario "import copy", in the most complicated other necessary functions from my code. When you have all functions within a class placed within a single file, you import the modules only once at the top of the file. However, when you split the functions into separate files you have to import the same modules in every file individually, is this right? – tevang Jul 05 '20 at 12:28
  • @tevang That's still too general. The short answer to your question is yes, you must import individually. What I meant is that sometimes it makes sense to save these methods within the object itself. For example, if you are `deepcopy`ing fields a lot, then it might make sense for the fields to have their own copy method. `numpy` arrays for example have their own copy methods (or function arguments). Another option is for the object itself to have a copy method (so you can use `self.my_copy(...)`), but again, this is very dependent on your exact use case whether this makes sense. – kabanus Jul 05 '20 at 12:50
  • @tevang This is too much for comments though, if you want you can post a question with specific examples of which libraries you need everywhere, so it will be clear if there are better solutions (within the split class framework) to your specific example. – kabanus Jul 05 '20 at 12:52
  • There is an error in the example: when calling Fitter.__init__ it expects a parameter – Amir Katz Sep 23 '20 at 09:53
  • 1
    Excellent work! Here lies the difference between responsible choices and bureaucracy. (The otherwise suggested) subclassing is intended to show inheritance, not as a workaround to group parts of the same entity. Sometimes even mixins don't fit well for this. – dawid Oct 22 '20 at 23:25
  • 2
    Additionally, if the tools (linter, IDE etc.) complain or cannot handle our sensible choices, the problem is with the tools. Software architecture should be served by the tools not the opposite. BTW, the Python built-in libs themselves are plenty of examples of better choices, including name convention, than that *recommended* by PEP8 (which is intended for the language development, not as a sacred book for everything else). – dawid Oct 22 '20 at 23:32
  • 1
    No PEPper could predict all scenarios/hardware/software advances (very far from that, they were mostly worried with the language development itself), and no rule is perfect, and they knew that, as it is clearly stated as guidelines. – dawid Oct 22 '20 at 23:37
  • How do you handle `super()` calls, as they fail with `Zero-argument form of "super" call is valid only within a class - Pylance` ? – Dirk Schiller Feb 14 '23 at 16:54
  • @DirkSchiller Interesting question. Since this kind of class is big to begin with, I don't think it came up often. There are twoish options - (1) Have the method defined in the class itself, not a separate module (i.e., __init__.py). Whether this makes sense depends on your design. (2) Use the explicit super instantiation, `super(type(MyObject), MyObject)`. (3) Have super method of the class, doing (1) but returning the super so you can use it in other modules. – kabanus Feb 15 '23 at 06:12
  • It is not possible to import the `MyObject` class into the method file as it fails with `ImportError: cannot import name 'GraphicsLine' from partially initialized module 'QALibs.Recorder.GraphicsLine' (most likely due to a circular import)`. I solved it by your (1) option and implemented the method in the class only with the most needed stuff like `def hoverLeaveEvent(self, event): [LINEBREAK] self._hoverLeaveEvent(event) [LINEBREAK] return super().hoverLeaveEvent(event)` and before that code I import the method file via `from .hover_leave_event import _hoverLeaveEvent`. – Dirk Schiller Feb 16 '23 at 09:35
  • And the method file itself then, only contains the code needed without `super()`. This solution is partly ok as with nested `super()` returns it may get complicated or class members need to be used to store the return value which can conflict with many methods, handling `super()` return values. – Dirk Schiller Feb 16 '23 at 09:41
  • And thank you for your answer it is super helpful and works very well with big projects! – Dirk Schiller Feb 16 '23 at 09:43
  • @DirkSchiller Regarding "It is not possible to import..." _ I did not write clearly enough, I meant when you define a method in another file (so something like `def func(self, args...)`, you can use the `self` argument to do that - `super(type(self), self)`. You don't have to import anything for that, and can't as you note. – kabanus Feb 16 '23 at 10:33
  • Another drawback is, that Autocompletion is not available any longer. For example accessing properties from the `__init__` method in the `__init__.py` file doesn't work - at least in VSCode. Also reimporting methods and classes leads again to circular import issues. – Dirk Schiller Mar 13 '23 at 11:33
  • And in case a pythonfile is importable, just the module docstring will be displayed but not the method docstring. – Dirk Schiller Mar 13 '23 at 11:42
22

I use the approach I found here. It shows many different approaches, but if you scroll down to the end, the preferred method is to basically go in the opposite direction of @Martijn Pieters's suggestion: have a base class that inherits from other classes, with your methods defined in those classes.

So the folder structure is something like:

_DataStore/
    __init__.py
    DataStore.py
    _DataStore.py

So your base class would be:

File DataStore.py

from . import _DataStore  # relative import: both files live in the same package

class DataStore(_DataStore.Mixin): # Could inherit many more mixins

    def __init__(self):
        self._a = 1
        self._b = 2
        self._c = 3

    def small_method(self):
        return self._a

Then your Mixin class:

File _DataStore.py

class Mixin:

    def big_method(self):
        return self._b

    def huge_method(self):
        return self._c

Your separate methods would be located in other appropriately named files, and in this example it is just _DataStore.
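For anyone who wants to try this without creating the files, here is the same structure collapsed into one script. The bare annotations on the mixin are an addition of mine, not part of the approach itself: they create nothing at runtime and are only there so that IDEs and type checkers know which attributes the host class is expected to provide.

```python
class Mixin:
    # Bare annotations create no attributes at runtime; they only tell
    # IDEs and type checkers what the host class is expected to provide.
    _b: int
    _c: int

    def big_method(self) -> int:
        return self._b

    def huge_method(self) -> int:
        return self._c


class DataStore(Mixin):  # Could inherit many more mixins
    def __init__(self) -> None:
        self._a = 1
        self._b = 2
        self._c = 3

    def small_method(self) -> int:
        return self._a


store = DataStore()
print(store.small_method(), store.big_method(), store.huge_method())  # 1 2 3
print(hasattr(Mixin, "_b"))  # False: the annotation alone defines nothing
```

In the real layout, `Mixin` would sit in _DataStore.py and `DataStore` in DataStore.py, exactly as above.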

I am interested to hear what others think about this approach. I showed it to someone at work and they were scared by it, but it seemed to be a clean and easy way to separate a class into multiple files.

Jeff Tilton
  • 2
    I think it's a valid approach. You might also raise an exception in `__init__` from the Mixin class to discourage users to instantiate the Mixin class. – mjspier Nov 18 '19 at 10:32
  • 1
    I have used mixins, probably too much. It is an easy way to keep file sizes down but lets class sizes be any size. The problem is you aren't refactoring the way you do with inheritance, rather, just making a mega-class. Whether using mixins or monkey patching, the downfall is your linter can't process it, so you have to find errors the hard way. For this one reason, subclassing is preferred, not to mention better encapsulation, et al. – Wyrmwood Aug 27 '20 at 22:58
  • 1
    This approach follows KISS (keep-it-simple-stupid) pattern, so for non-python programmers would be easier to understand what is going on. "working" does not always mean "maintainable". – xmantas Sep 25 '22 at 14:45
  • 1
    I'm using PyCharm, and it can't find any variables set in `__init__` on the main class. The way I get around this is to add a type references to the top of the class with all the variables needed. Eg add `company: Company` in the class scope. – Dolan Sep 29 '22 at 00:11
  • 1
    @Dolan I wish I knew how precisely to add a so called type reference for that, but if I just add such type hints in global scope right after the class definition, PyCharm winds down its error indications. I'm just not sure what those type hints mean to the python interpreter. – matanster Apr 25 '23 at 14:57
  • This works great and I'd just suggest naming the mixin class more semantically if it's adding a specific capability to the class – matanster Apr 25 '23 at 20:28
7

Here is an implementation of Martijn Pieters's comment to use subclasses:

File main.py

from separate import BaseClass

class MainClass(BaseClass):
    def long_func_1(self, a, b):
        if self.global_var_1:
            ...
        self.func_2(z)
        ...
        return ...
    # Lots of other similar functions that use info from BaseClass

File separate.py

class BaseClass(object):

    # You almost always want to initialize instance variables in the `__init__` method.
    def __init__(self):
        self.global_var_1 = ...
        self.global_var_2 = ...

    def func_1(self, x, y):
        ...
    def func_2(self, z):
        ...
    # tons of similar functions, and then the ones I moved out:
    #
    # Why are there "tons" of _similar_ functions?
    # Remember that functions can be defined to take a
    # variable number of/optional arguments, lists/tuples
    # as arguments, dicts as arguments, etc.

from main import MainClass
m = MainClass()
m.func_1(1, 2)
....
cowbert
0

I find it easier to work with multiple inheritance of Separate classes because it allows me to access all methods and variables from the Main class without the need to redefine them.

Each Separate class can be defined in a separate file, and will be designed to handle a specific purpose related to the Main class, rather than being applicable to other Separate classes.

There are a few tricks to handle typing properly and to prevent issues with circular imports. It is also important to use distinct names for the methods and variables defined in the Main and Separate classes. Since they are all inherited by Main, this prevents any conflicts.

This is how it works:

File shared.py

# The following classes can be imported by both main and separate files    
class Shared1: ...
class Shared2: ...

File main.py

from shared import Shared1, Shared2
from _separate_1 import Separate1
from _separate_2 import Separate2

# Main will inherit all Separate1 and Separate2 methods
class Main(Separate1, Separate2):
    def __init__(self):
        # Note that super().__init__() follows the MRO and, because the
        # Separate classes below do not call super().__init__() themselves,
        # it would stop after the first parent class. So here we explicitly
        # call the constructor of each parent class by name
        Separate1.__init__(self)
        Separate2.__init__(self)
        # global variables can also be used in the separate files
        self.global_var_1: Shared1 = ...
        self.global_var_2: Shared2 = ...
        # the following variables will be used only in this file
        self.var_1 = ...
        self.var_2 = ...
        ...
    
    # The following methods will only be used in this file, but they can
    # call any other methods or variables inherited from a Separate class
    def func_1(self, ...):
        ...
    def func_2(self, ...):
        ...

File _separate_1.py

from shared import Shared1, Shared2

class Separate1:
    # We can use type hints to refer to the global variables
    global_var_1: Shared1
    global_var_2: Shared2

    def __init__(self):
        # Variables defined here are mainly used in this file 
        self.var_3 = ...
        ...
    # The following functions will be inherited by the Main class
    def long_func_1(self, ...): ...
    def long_func_2(self, ...): ...

File _separate_2.py

from shared import Shared2

class Separate2:
    # global variables can be used by any Separate classes
    global_var_2: Shared2

    def __init__(self):
        # use distinct names for variables
        self.var_4 = ...

    # use distinct names for functions
    def long_func_3(self, ...): ...
    def long_func_4(self, ...): ...

And so on, hope it helps.
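As an alternative to naming each parent constructor, here is a sketch of the cooperative variant. This is a different technique from the explicit calls above, and the attribute values are placeholders: if every Separate class also calls super().__init__(), a single call in Main walks the whole MRO.

```python
class Separate1:
    def __init__(self):
        super().__init__()  # pass the call along the MRO
        self.var_3 = "set by Separate1"


class Separate2:
    def __init__(self):
        super().__init__()  # ends at object.__init__
        self.var_4 = "set by Separate2"


class Main(Separate1, Separate2):
    def __init__(self):
        # Reaches Separate1, then Separate2, then object.
        super().__init__()
        self.var_1 = "set by Main"


m = Main()
mro_names = [cls.__name__ for cls in Main.__mro__]
print(m.var_3, "/", m.var_4)
print(mro_names)  # ['Main', 'Separate1', 'Separate2', 'object']
```

This only works when all constructors accept compatible arguments; with differing signatures, the explicit calls may be the simpler choice.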

cubitico