
I am currently creating a genetic fuzzy learning system and its corresponding training simulation environment. This collection of functions and classes is controlled by a master script where the user defines things such as the simulation scenario, controller characteristics, etc.

The result is about 50 different NumPy arrays and lists as arguments for my particular problem. These arguments must be given to the controller-generating function(s) and to the simulation that determines the effectiveness of each controller. This process currently uses the `Pool.starmap_async` method from `multiprocessing` to parallelize the fitness evaluation of each controller. So my master script calls the controller generation with about 50 arguments, the pool calls its workers with about 55 arguments, and the workers call all the simulation files with 57 inputs. (It is my current understanding that using this many input arguments for the worker processes does not increase the overhead, as they are just names pointing to data rather than copies or re-initializations of it... if I am wrong, please let me know!)
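One way to check that parenthetical assumption: `multiprocessing.Pool` ships each task's argument tuple to a worker by pickling it, so large arrays are serialized and copied rather than passed as names. The sketch below mimics that round trip with `pickle` directly; the array sizes and the names `big` and `task` are made up for illustration.

```python
import pickle

import numpy as np

# Hypothetical stand-ins: one large, unchanging input (8 MB of float64)
# and a small results array like the question's `fitness`.
big = np.zeros((1000, 1000))
fitness = np.zeros(10)

# Roughly what the pool does for one starmap task: pickle the whole tuple.
task = (0, 5, big, fitness)
payload = pickle.dumps(task)
print(f"bytes sent for one task: {len(payload):,}")

# The worker unpickles its own copy; writing to it leaves the parent's data alone.
start, stop, big_copy, fitness_copy = pickle.loads(payload)
fitness_copy[start:stop] = 1.0
print(fitness[:5])       # parent's array is still all zeros
print(big_copy is big)   # a separate object, not another name for the same data
```

Because every task carries the full argument tuple, each large array is pickled roughly once per task, and any in-place writes made by a worker stay in that worker's copy.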

I understand that I can replace my 50 arguments with one list that contains all of them, and that I could use global variables in my master script to avoid all of this bookkeeping. A lot of these variables do not change, but they are large data structures that I don't want to compute more than once. Are there other approaches, and is one of them considered the most acceptable? I want to avoid having 10 lines of arguments in every call in my project.

Be as brutal as needed. Everything runs perfectly, but my simulations are only going to get more complex, with the number of (non-optional) arguments growing. I've removed all of the more specific variable names, but here is a call that my main script makes:

    (opt_str,opt_fit) = Trainer(map_size,Targets,SAMS,SAMS_stat,AIS,AIS_stat,
                          B_mpammo,B_sdammo,Route,vel,B_range,A_range, S_range,B_flight,
                          A_flight, S_flight,... lots more)

Inside my GA I have:

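        step = max(1, pop_size // 8)  # plain int; np.int8 can only hold values up to 127
        pol = Pool(processes=8)

        # one task per chunk of the population; the last chunk is clipped to pop_size
        res = pol.starmap_async(SimWorker, ((i, min(i + step, pop_size), map_size, Targets,
                                SAMS, SAMS_stat, AIS, AIS_stat,
                                B_mpammo, B_sdammo, Route, vel, B_range,
                                A_range, S_range, B_flight,
                                A_flight, S_flight, fitness,
                                ttr, ttb, ttcr, ttcb, pos, times, pop, ... lots more args) for i in range(0, pop_size, step)))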

And SimWorker:

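    for p in range(start, stop):
        # if `fitness` is a plain list or ndarray, this writes to the worker's own
        # unpickled copy; the master process only sees the values if SimWorker returns them
        fitness[p] = Sim_T(map_size, Targets, SAMS, SAMS_stat, AIS, AIS_stat,
                           B_mpammo, B_sdammo, Route, vel, B_range, A_range, S_range, B_flight,
                           A_flight, S_flight,
                           ttr, ttb, ttcr, ttcb, pos, times, pop[p], ... lots more)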
  • It would be very helpful if you could provide some code snippets or diagrams. Usually a function with a large number of parameters is a sign that modularity in your design could be improved. – Mike Vella Jul 18 '13 at 12:43

2 Answers


You should definitely encapsulate all these arguments into an object, in the object-oriented programming sense. Whether this object is a bare dictionary or a more advanced object is a design question that should be considered with care. We certainly need more info to give a more specific answer.
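A minimal sketch of the "more advanced object" option, assuming a frozen `dataclass` is acceptable; `SimConfig`, `sim_t`, `trainer` and the placeholder scoring are hypothetical, and only four of the question's ~50 inputs are shown.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SimConfig:
    """Hypothetical bundle of the unchanging simulation inputs.

    frozen=True stops fields from being reassigned, but the arrays
    themselves are still mutable.
    """
    map_size: tuple
    targets: np.ndarray
    sams: np.ndarray
    vel: float


def sim_t(cfg: SimConfig, controller: np.ndarray) -> float:
    """Hypothetical fitness function that reads what it needs from cfg."""
    # Placeholder scoring so the sketch runs; the real simulation goes here.
    return float(controller.sum() * cfg.vel + len(cfg.targets))


def trainer(cfg: SimConfig, pop: np.ndarray) -> np.ndarray:
    """Evaluate every controller with one config argument instead of ~50."""
    return np.array([sim_t(cfg, pop[p]) for p in range(len(pop))])


cfg = SimConfig(map_size=(100, 100),
                targets=np.zeros((3, 2)),
                sams=np.zeros((4, 2)),
                vel=2.0)
pop = np.ones((5, 4))
print(trainer(cfg, pop))
```

A bare dict (`cfg["vel"]`) would work the same way; the dataclass additionally raises `TypeError` at construction time if a field is missing or misspelled, which suits inputs that are all mandatory.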

hivert
  • This is not a bad approach but in a way it is just shifting the problem to a class with too many parameters: http://stackoverflow.com/questions/5899185/class-with-too-many-parameters-better-design-strategy – Mike Vella Jul 18 '13 at 12:50
  • @MikeVella There's [a good comment on an answer there by S.Lott about why this can still be good](http://stackoverflow.com/questions/5899185/class-with-too-many-parameters-better-design-strategy#comment6789817_5899909) – Izkata Jul 18 '13 at 13:07
  • @Izkata I would tend to agree that it can still be good – Mike Vella Jul 18 '13 at 13:19

If you use keyword arguments, aka `**kwargs`, then you can pass in only as many as you need and not require the ones you don't need. Your function can then check for and take the ones it wants.
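A minimal sketch of a function that accepts `**kwargs` and takes only what it wants; `sim_t` and its keys are hypothetical. Note that inside the function each value still has to be pulled out of the `kwargs` dict by name.

```python
def sim_t(**kwargs):
    """Hypothetical simulation step that pulls only the inputs it uses."""
    map_size = kwargs["map_size"]   # required: raises KeyError if missing
    vel = kwargs.get("vel", 1.0)    # optional: falls back to a default
    # Any other keys (e.g. "targets") are accepted and simply ignored here.
    return map_size[0] * map_size[1] * vel


print(sim_t(map_size=(10, 20), vel=0.5, targets=[(0, 0), (1, 1)]))
```

The trade-off is that a missing required input only shows up as a `KeyError` when that line runs, rather than as a `TypeError` at the call.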

Alternatively, create an object (I suggest a class) that represents the state, build it up, and pass it into the function.

Joe
  • `**kwargs` has its design problems: http://ivory.idyll.org/blog/on-kwargs.html – Mike Vella Jul 18 '13 at 12:45
  • Coming from MATLAB, where calling a function is all about the position of the arguments rather than their names, do all of my objects have to use these arguments under the same name? Currently some of these functions take an argument, say map_info, and use it in their script as temp_map_info before returning some other data. – Dergs McGreggin Jul 18 '13 at 12:45
  • There are three kinds of arguments in a Python function. You should look at the documentation for examples. – Joe Jul 18 '13 at 14:06
  • @Joe I'm aware of the different kinds, though perhaps not knowledgeable enough about their use =) Every argument in these calls is necessary for each object; they will not run if any of them is missing. I'd love to just define all of these in my master script and have all of my functions, classes, etc. have access to them. I will do more research into `**`... if I can toss 50 arrays into `**kwargs` and have all functions and classes simply be `def function(**kwargs)`, that is exactly what I want. Or will I then have to pull each item out of kwargs inside the function before using it? – Dergs McGreggin Jul 18 '13 at 14:51
  • You can use `**kwargs` and `*args` in reverse, by passing in a list or dictionary, with `*` and `**` in front. Note that `args` and `kwargs` are only naming conventions, the important bits are the asterisks. – Joe Jul 18 '13 at 17:26
  • Thanks! That is serving basically what I had in mind. – Dergs McGreggin Jul 18 '13 at 20:18
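A small sketch of the "in reverse" use: `*` unpacks a list into positional parameters and `**` unpacks a dict into keyword parameters, so the function keeps ordinary named parameters and nothing has to be pulled out of `kwargs` by hand. `sim_t` and the values are hypothetical. Names only matter at the call boundary: dict keys must match the parameter names, but inside the function you are free to rebind them (e.g. `temp_map_info = map_info`).

```python
def sim_t(map_size, targets, vel):
    """Hypothetical simulation entry point with ordinary named parameters."""
    return map_size * len(targets) * vel


args = [10, [1, 2, 3], 0.5]                                  # order matters
kwargs = {"vel": 0.5, "map_size": 10, "targets": [1, 2, 3]}  # order does not

print(sim_t(*args))     # same as sim_t(10, [1, 2, 3], 0.5)
print(sim_t(**kwargs))  # same as sim_t(map_size=10, targets=[1, 2, 3], vel=0.5)
```

A missing key, or a key that does not match any parameter name, raises `TypeError` at the call, which fits a situation where every input is mandatory.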