Porting an old fortran program to work with python+numpy

Question

I am supposed to be doing research with this huge Fortran 77 program (which I recently ported to Fortran 90 superficially). It is a very old piece of software used for modeling using finite element methods.

It is a monstrosity. It is roughly 240,000 lines.
Since it began its life in Fortran 77, it uses some really dirty hacks for dynamic memory allocation; basically it uses the functions from the C standard library, mixed programming with C and Fortran. I am yet to fully grasp how allocation works. The program is built to be easily extendable by the user, and the user generally needs to allocate some globally accessible arrays for later use. This is done by having an array of memory addresses, which point to the beginning addresses of dynamically allocable arrays. Of course, which element of the address array pointing to which information all depends on conventions which has to be learned by the user, before one can start to really program. There are two address arrays, one for integers, and the other for floating points.
By dirty hacks, I mean inconsistent ones. For example an update in the optimization algorithm of the GNU compilers caused the program to exit with random memory leaks.
The program is far from elegant. Global variable names are generally short (3-4 characters) and cryptic. Passing data across routines is of course accomplished by using common blocks, which include all program switches, and the aforementioned arrays.
The usage of the program is roughly like that of an interactive shell, albeit a stupid one. First, an input file is read by the program itself, then per choice, the user is dropped into a pseudo-shell, in which the user has to type 4 character wide commands, followed by the parameters. The parser then parses the command, and corresponding subroutine is called with the parameters. You would guess that there is a loop structure in this pseudo-parser (a goto bonanza, rather) which wraps the subroutine behavior in a manner more complex than it should be in the 21st century.
The format of the input file is the same (commands, then parameters), since it is the same parser. But the syntax is not really consistent (by that, I mean it lacks control structures, and some commands cause the finite state machine to do behavior that contradict with other commands; it lacks definite grammar), time to time causing the end user to discover pitfalls. The user must learn these pitfalls by experience; I did not see them in any documentation of the program. This is a problem that can easily be avoided with python, and it is not even necessary to implement a parser.

What I want to do:

Port parts of the program into python, namely the parts that don't have anything to do with numerical computation. This includes
- cleaning up and abstracting the API with an OOP approach in python,
- giving meaningful variable names,
- migrating dynamic allocation to either numpy or Fortran 90 and losing the C part,
- migrating non-numerical execution to python, and wrap the numerical objects using f2py, so there is no loss in performance. Have I told that the program is damn fast in its current state? Hopefully porting the calls to numerical subroutines and I/O to python will not slow it down to an impractical level (or will it?).
- Making use of python's interactive shell as a replacement for the pseudo-shell. This way, there will not be any inconsistencies for the end user. The aforementioned commands will be simply replaced by functions defined in python. This will allow the user to actually access the data. Plus, the user will be able to extend the program without going to deep.

What I wonder:

Is f2py suitable and up-to this task of wrapping numerous subroutines and common blocks without any confusion? I have only seen single-file examples on the net for f2py; I know that numpy has used it to wrap LAPACK and stuff, but I need reassurance that f2py is a tool consistent enough for this task.
Whether there are any suggestions on the general strategy that I should follow, or pitfalls I should avoid.
How can & should I implement a system in this python-wrapped Fortran 90 environment, so that I will be able to modify (allocate and assign) globally accessible arrays and variables inside fortran routines. This should preferably omit address arrays and I should preferably be able to inject verbal representations into the namespaces. These variables should preferably be accessible inside both python and fortran.

Notes:

I may have been asking for too much, something beyond the boundaries of the possible realm. In this case, please forgive me for I am a beginner with this aspect of programming; and don't hesitate to correct me.
The "program" I have been talking about is open source but it is commercial and the license does not allow its distribution, so I decided not to mention its name. However, you could deduce it from the 2nd sentence and the description I gave throughout.

This sounds to me like a HUGE task based on the scope of the changes and the lines of code: many months or more? I normally recommend revising code by making small changes at a time and regression testing. Some of the improvements, such as memory allocation, may be hard to do in small steps, and thus risk that you break the code when you make the change, but have difficulty finding a new bug because the change is so big. Are the improvements worth the work to you? — M. S. B., May 09 '14 at 23:18
@M.S.B. Indeed. I intend to undertake it for self improvement primarily, and I'm not concerned with achieving 100% success right now. I will have to use this software for 2 years or more, so I think it is worth trying. First, I must know exactly what the problem is, and define a strategy. So at this point, I am concerned with the tools I am going to use, namely numpy and f2py. — osolmaz, May 09 '14 at 23:44
*'The "program" I have been talking about is open source but it is commercial and the license does not allow its distribution'* - eh? It's either open source or it isn't. If it's released under an open source license then by definition the source code must be redistributable, including any subsequent modifications even if they were made by a commercial entity. — ali_m, May 10 '14 at 18:39
I don't see why you can't mention the name of the program... — steabert, May 11 '14 at 10:32
Well for starters, I could not find the license because I wasn't the one who acquired the software from the seller, so I don't know the terms for what I am able to do. I could now host the source code on a website, but I am academically obliged not to do so, and people I know would not be very happy with it. It would take only one person to upload it to the internet. Nevertheless the program is not very popular and is not for the layman, I think that is the reason it hasn't been pirated yet. Here is a link to the program: http://www.ce.berkeley.edu/projects/feap/ — osolmaz, May 11 '14 at 12:48
There is a free and downloadable version with less features. It basically has the same structure, but lacks some of the features that make it popular in the field. Here is the link to that one: http://www.ce.berkeley.edu/projects/feap/feappv/ If you peek at some parts of the code (allocation parts for example), what I described might become more clear. — osolmaz, May 11 '14 at 12:50
OK, this thing is pretty unequivocally *not* open source. Which is a shame, since what you're proposing a) sounds like it would be very useful for the engineering/scientific community and b) sounds like a huge amount of work for one person. — ali_m, May 11 '14 at 12:58
I agree totally, and have a good feeling that if I achieve this and make it more user friendly, they would be more cooperative in making it free, since what is better than free? Sometimes people just need a little bit of persuasion or perspective, since I doubt that the license fee is their only source of income. — osolmaz, May 11 '14 at 13:06
I would be very wary of investing lots of time in this endeavour in the hope that the copyright holders will eventually see the light and make it open source. At the end of the day they are under absolutely no obligation to allow you to redistribute any of the changes you make, so all of your hard work may never be seen by anyone other than yourself, or alternatively they could just quietly incorporate your changes into their proprietary code base without crediting you. — ali_m, May 11 '14 at 13:26
Yes, that would be a problem. Nonetheless all my technical experience were gained through similar occasions. My aim is primarily gaining experience. And I would be able to distribute it in my close circles (for research) no matter what. — osolmaz, May 11 '14 at 14:34

score 3 · Answer 1 · answered Jun 10 '14 at 16:47

I'm doing something depressingly similar. Instead of dynamic memory allocation via C we have a single global array with integer indices (also at global scope), but otherwise it's much the same. Weird, inconsistent input file and all.

I'd advise against trying to rewrite the majority of the program, whether in python or anything else. It's time consuming, unpleasant and largely unnecessary. As an alternative, get the F77 code base to the point whether it compiles cleanly enough that you're willing to trust it, then write an interface routine.

I now have a big, ugly F77 code base which sits behind an interface. The program requires input as a text file so a large part of the interface's job is to produce that text file. Beyond that, the legacy code is reduced to a single gateway routine which takes a few arguments (including a means of identifying the text file) and returns the answer. If you use the iso_c_binding of Fortran 2003 you can expose the interface in a format C understands, at which point you can link it to whatever you wish.

As far as the modern code (mostly optimisation routines) is concerned, the legacy code base is the single subroutine behind the C interface. This is much nicer than trying to modify the old code further and probably a valid strategy for your case as well.

I think I get the idea, but unfortunately thats not an option for me. I need a more substantial change for future extensibility. — osolmaz, Jun 11 '14 at 08:26

score 2 · Answer 2 · edited May 23 '17 at 12:30

For an example how to generate the f2py interface library using multiple fortran files see this post.

f2py might be suitable for your task, but there are some pitfalls that might cause some problems. Some pitfalls concerning f2py are listed here and summarized below:

Concerning your specific problem you might run into problems with your allocatable arrays, because f2py was writen for Fortran77 and does not support many of the Fortran90+ features (such as allocatable arrays).
I also encountered a problem with an undocumented maximum array size (arround 400 x 200 x 20 x 20). If I used arrays bigger then that f2py would not be able to generate the python library. Especially the large matrices being passed arround in finitie element codes might be too big for interfacing. Therefore you would not have access to those in the Python part of the program.
Beneficial for you is that f2py should have no Problems with COMMON Blocks, etc. because it was especially written for Fortran77.
After passing the data through the interface to the Fortran routines, there should be no (or only minimal) slowdown if you do it right. The key is to minimize calculations in the Python part of the program per run. This includes the manipulation of the data arrays (shift, rotate, copy, etc.) but not passing of them (because the interface is pass-by-reference).

As an alternative you should have a look at Cython (also see the Link above and the linked working example therein). I think this might serve you better in the long run.

Implementation Suggestion

This suggestion is how I would do it incorporating my experiences with having done something similar (see Background below). It should largely be independent of how you interface the Python and Fortran code (f2py, Cython, ...).

Of course you should be very careful to not change the behaviour and therefore possibly the results of the program. Therefore generation of some tests and their corresponding reference in- & output files and test documentation including all steps, keystrokes, commands, etc. necessary to reproduce those results should be your first step.

In your case I would try to change the least amount possible of the Fortran program. I would try to wedge the "pseudo-shell" from the Fortran code, e.g. making it its own module, and build an interface to that module. Like that you can use all of the original Fortran code and the modifications, bugfixes and updates from your peers, even in the future. The key is to not distance your code to far from the original/ mainstream because in scientific communities usually not everybody will agree with major changes to the source code and update their workflow or source code accordingly. Therefore future work from your peers might not be made in your version, but in the original source code and it would be your own responsibility to merge those changes into your version, which gets easier the less you change.

Using that interface you can work on your python shell and maybe even build a GUI for it without having to worry about changing anything in the original progam. This reduces the risk to introduce bugs or change the results of the original. Your Shell/ GUI would therefore work as a wrapper around the original program to simplify the workflow and remove inconsistencies. All the "intelligence" and utilities, like error & cross checking of the user-input, help pages, tutorials/ howto, etc. would be implemented in the Python wrapper, which would parse these inputs, translate them to the corresponding commands for your Fortran program, send them and wait for the results.

After you have simplified the usage of the program I would write some automatisation for the tests (setup + evaluation) to complete your utilities suite. Like that even somebody new to the program would be able to make changes to the code without having to worry about unknowingly changing the results. This should enable your tools to benefit the community which will attract new users and therefore encourage further development within the community.

Only as the last step I would replace the parts of the code using C with Fortran90+ methods to simplify the code. This is an extensive change of the codebase and needs a lot of tests to ensure EVERY possible combination of commands is checked and verified before and after the changes.

This method also has the benefit, that you could possibly make your interface/ GUI open source (you have to check the licence of your program of course) as long as it is seperable from the source code of the Fortran program. The Fortran - Python interface would have to be provided, or installed/ generated from source files when your interface is loaded using some simple build skript as seen in the first link of this post.

For the manipulation of internal data I would write a seperate wrapper routine, that only handles the data interface. This should be done in Cython though to enable you to use allocatable arrays, etc. Because this interface would work with "pass-by-reference" you should be able to use the full collection of Python (numpy) tools to manipulate the arrays and data.

Background

I did something similar using our research code for helicopter rotordynamics. This is also a very old and large program written in Fortran77 (e.g. goto bonanza). The newer additions and modifications to the code are usually done in Fortran90/2003.

Using parts of this code (several subroutines & module files) I generated a python library to connect our GUI (Python & Qt) to the Fortran program; mainly for postprocessing of Fortran binary output files.

Porting an old fortran program to work with python+numpy

2 Answers2

Implementation Suggestion

Background