
I'm writing an application for scientific data analysis and I'm wondering what's the best way to structure the code to avoid (or address) the circular import problem. Currently I'm using a mix of OO and procedural programming.

Other questions address this issue but in a more abstract way. Here I'm looking for a solution that is optimal in a more specific context.

I have a class Container defined in DataLib.py whose data consists of lists and/or arrays. With all its methods and supporting functions, DataLib.py is quite large (~1000 lines).

I have a second module, SelectionLib.py (~400 lines), that contains only functions to "filter" the data in Container according to different criteria. These functions return new Container objects (with filtered data), so SelectionLib.py needs to import Container from DataLib.py. Note that, logically, these functions are "methods" of Container; they are just implemented as plain Python functions.

Now I want to add some high-level methods to Container so that a complex analysis can be performed with a single function or method call. By "complex analysis" I mean an arbitrary number of calls to Container methods, local functions (defined in DataLib.py) and filter functions (defined in SelectionLib.py).

So the problem is that DataLib.py needs to import SelectionLib.py to use the filter functions, but SelectionLib.py already imports DataLib.py.
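To make the cycle concrete, here is a minimal sketch that reproduces it with stripped-down stand-ins for the two modules. The `values` attribute, the `select_positive` function, and the use of `from ... import ...` at the top of each file are assumptions for illustration only; the real modules may import each other differently.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical, stripped-down versions of the two modules.
FILES = {
    "DataLib.py": (
        "from SelectionLib import select_positive  # needs SelectionLib\n"
        "\n"
        "class Container:\n"
        "    def __init__(self, values):\n"
        "        self.values = list(values)\n"
    ),
    "SelectionLib.py": (
        "from DataLib import Container  # needs DataLib: this closes the cycle\n"
        "\n"
        "def select_positive(container):\n"
        "    return Container([x for x in container.values if x > 0])\n"
    ),
}

# Write both files to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
for name, source in FILES.items():
    Path(module_dir, name).write_text(source)
sys.path.insert(0, module_dir)

# DataLib starts loading, pulls in SelectionLib, which asks for
# DataLib.Container before the class statement has been executed.
try:
    importlib.import_module("DataLib")
    import_error = None
except ImportError as exc:
    import_error = exc

print(type(import_error).__name__)
```

With this layout the import fails because `Container` does not exist yet at the moment SelectionLib.py asks for it.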

Right now my hackish solution is to run the two files with run -i ... from IPython, so it is like having a single big file and I avoid the circular import. But at the same time, these scripts are difficult to integrate into, for example, a GUI.

How would you suggest solving this problem?

  1. Use pure OO and inheritance, and split the object in 3: CoreContainer -> SelectionContainer -> HighLevelContainer

  2. Restructure the code (everything in one file?)

  3. Some sort of import trickery (put imports at the end)
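For reference, option 3 could look roughly like the sketch below. It is only an illustration under assumptions: the `values` attribute, the `positive` method, and the `select_positive` function are hypothetical, and the module bodies are reduced to a few lines. The point is that DataLib.py imports SelectionLib only after `Container` has been defined.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents illustrating "imports at the end".
FILES = {
    "DataLib.py": (
        "class Container:\n"
        "    def __init__(self, values):\n"
        "        self.values = list(values)\n"
        "\n"
        "    def positive(self):\n"
        "        # SelectionLib is looked up when the method is called.\n"
        "        return SelectionLib.select_positive(self)\n"
        "\n"
        "# Deferred to the end: Container already exists at this point.\n"
        "import SelectionLib\n"
    ),
    "SelectionLib.py": (
        "from DataLib import Container\n"
        "\n"
        "def select_positive(container):\n"
        "    return Container([x for x in container.values if x > 0])\n"
    ),
}

# Write both files to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
for name, source in FILES.items():
    Path(module_dir, name).write_text(source)
sys.path.insert(0, module_dir)

DataLib = importlib.import_module("DataLib")
result = DataLib.Container([-1, 2, -3, 4]).positive()
print(result.values)  # [2, 4]
```

This works, but the correctness of the import now depends on statement order inside DataLib.py, which is why it counts as trickery.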

Any feedback is appreciated!

user2304916
  • Can't you avoid putting the high-level method on `Container` and let it be a function of its own (implemented in something like `data_tools.py`, which would contain the high-level API)? It would make your API more consistent (i.e. the `Container` class provides the "low-level" API, and the high-level API is implemented in functions) and it would avoid circular imports. – Bakuriu Jul 14 '13 at 20:59
  • Yes, this is a solution. I'm not sure the API would be more consistent, as the "user" would still use both high-level functions and low-level class methods. From the outside, without looking at the implementation, it is not clear whether a method is high- or low-level, and for execution it probably doesn't matter either. It only matters when the user wants to read the implementation and has to look in two different files to see where it's defined. – user2304916 Jul 14 '13 at 21:16
  • I'm not sure I understand the design. Having a single container instance select from a list of instances and perform group operations on those instances seems a bit wrong. It looks like a `Container` class should provide public methods, then a `Filter` class (selectionlib.py) should pick a list of the required Containers so an `Algorithm` class can manipulate the Containers through their public methods. (i.e. option 1! :) ) – will-hart Jul 14 '13 at 21:18
  • Thanks for the comment. However, you have probably misunderstood the implementation. `Container` already contains several lists and arrays representing the experimental data of an entire measurement. It would be inefficient to split every bit of information into a different object and use collections of objects as you suggested. `Container` has some public methods, but many operations are performed by directly accessing the attributes. For example, if .time is a numpy array the user will use its methods. In other words, `Container` is not an opaque object, and known attributes/objects are part of the API. – user2304916 Jul 14 '13 at 21:36
  • If functions in `SelectionLib` are, as you say, "methods" for `Container`, it seems reasonable that `DataLib` imports `SelectionLib`, not the other way around. – ev-br Jul 16 '13 at 11:24
  • @Zhenya, this is a very good suggestion. Thanks to Python's duck typing, it should work seamlessly even if functions in `SelectionLib` access methods and attributes of `Container`. Thanks. – user2304916 Jul 16 '13 at 18:06

1 Answer


If functions in SelectionLib are, as you say, "methods" for Container, it seems reasonable that DataLib imports SelectionLib, not the other way around.

Then the user code would just import DataLib. This would require some refactoring. One possibility to minimize the disruption to the user code would be to rename your existing DataLib and SelectionLib to _DataLib and _SelectionLib, and have a new DataLib to import the necessary bits from either (or both).
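A minimal sketch of that layout, under stated assumptions: the module names `_DataLib`, `_SelectionLib` and `DataLib` follow the renaming suggested above, while the `values` attribute, the `positive` method and the `select_positive` function are hypothetical placeholders. The filter function builds its result with `type(container)(...)`, so `_SelectionLib` never has to import `Container` and the imports only go in one direction.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents for the three-module layout.
FILES = {
    "_SelectionLib.py": (
        "# No import of _DataLib here.\n"
        "def select_positive(container):\n"
        "    # Build the result from the argument's own class.\n"
        "    return type(container)([x for x in container.values if x > 0])\n"
    ),
    "_DataLib.py": (
        "import _SelectionLib\n"
        "\n"
        "class Container:\n"
        "    def __init__(self, values):\n"
        "        self.values = list(values)\n"
        "\n"
        "    def positive(self):\n"
        "        return _SelectionLib.select_positive(self)\n"
    ),
    "DataLib.py": (
        "# Public module: re-exports what user code needs.\n"
        "from _DataLib import Container\n"
        "from _SelectionLib import select_positive\n"
    ),
}

# Write the files to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
for name, source in FILES.items():
    Path(module_dir, name).write_text(source)
sys.path.insert(0, module_dir)

# User code only ever imports DataLib.
DataLib = importlib.import_module("DataLib")
data = DataLib.Container([-1, 2, -3, 4])
print(data.positive().values)                # [2, 4]
print(DataLib.select_positive(data).values)  # [2, 4]
```

User code keeps writing `import DataLib` and can reach both the class and the filter functions through it.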

As an aside, it's better to follow the PEP 8 conventions and name your modules in lowercase_with_underscores.

ev-br
  • I changed the import order and no other change was needed. The code just worked! That's thanks to Python's duck typing: even though `SelectionLib` uses methods and attributes of the `Container` object, it doesn't need to import the `Container` class. – user2304916 Jul 17 '13 at 22:23