
I am faced with the task of building a new component to be integrated into a large existing C codebase. The component is essentially a kind of compiler, and will be complicated enough that I would like to write it in OCaml (for reasons along the lines of those given here). I know that OCaml-C interaction is possible (as per the manual and this tutorial), but it looks somewhat painful.

What I'd like to know is whether others here have attempted large-scale integration of OCaml and C code, what were some of the unexpected gotchas they found, and whether at the end of the day they concluded that they would have been better off just writing the new code in C.

Note, I'm not trying to start a debate about the merits of functional versus imperative programming: let's just say we assume that OCaml happens to be the right tool for the job I have in mind, and the potential difficulty in integration is the only issue. I also don't have the option of rewriting the rest of the codebase.

To give a little more detail about the task: the component I need to implement is a certain kind of query optimizer that incorporates some research ideas my group at UC Davis is working on, and will be integrated into PostgreSQL so that we can run experiments. (A query optimizer is, essentially, a compiler.) The component would be invoked from C code, would function mostly independently but would make a certain number of calls to other PostgreSQL components to retrieve things like system catalog information, and would construct a complex C data structure (representing a physical query plan) as output.

Apologies for the somewhat open-ended question, but I'm hoping the community might be able to save me a little trouble :)

Thanks,

TJ

tjgreen

4 Answers


Great question. You should be using the better tool for the job.

If your intention really is to use the better tool for the job (and you are sure lex and yacc are going to be a pain), then I have something to share with you: it's not painful at all to call OCaml from C, and vice versa. Most of the time I've been writing OCaml that calls C, but I have written a few bindings the other way; those have mostly been debug functions that don't return a result. Either way, calling back and forth is really about packing and unpacking the OCaml value type on the C side. The tutorial you mention covers all of that, and very well.

I disagree with Ron Savage's remark that you have to be an expert in the language. When I started out where I work, within a few months, and without knowing what a "functor" was, I was calling C and had written a thousand lines of C for numerical recipes and abstract data types. There were some hiccups (not with unpacking types, but with garbage collection of abstract data types), but it wasn't bad at all. Most of the inner loops in the project are written in C, taking advantage of SSE, external libraries (LAPACK), tighter optimized loops, and some inlined, hand-optimized assembly.

I think you may need experience designing a large project and demarcating its functional and imperative sections. I would carefully assess how much OCaml you are going to write and what kinds of values you want to pass to C. I say this because I'd be wary of recommending that anyone pass a recursive data structure from OCaml to C: it would mean a lot of unpacking of tuples and their contents, and thus a lot of room for confusion and bugs.

nlucaroni
  • Very good answer. I should point out that this strongly depends on the amount of OCaml work to be done compared with the pain of C integration (the API size). This applies to integrating any new language into any existing project. – Hynek -Pichi- Vychodil Aug 11 '10 at 05:06
  • Not only is this a good answer, it's even the one I wanted to hear :) – tjgreen Aug 11 '10 at 15:42

I once wrote a reasonably complex OCaml-C hybrid program. I was frustrated by what I found to be inadequate documentation, and I ended up spending too much time dealing with garbage collection issues. However, the resulting program worked and was fast.

I think there is a place for OCaml-C integration, but make sure it is worth the hassle. It might be simpler to have the programs communicate over a socket (assuming such IO operations won't eliminate the performance you want). It might also be more sane to just write the whole thing in C.

Diogenes Creosote

Interoperability is the Achilles' heel of standalone implementations of statically typed languages, particularly those without JIT compilation, like OCaml. In my experience of using OCaml for over 5 years, the only reliable bindings are across simple APIs that do little more than pass large arrays, e.g. LAPACK. Even slightly more complicated bindings, like those to FFTW, took years to stabilize, and others, like OpenGL and GLU, remain an unsolved problem. In particular, I found major bugs in binding code written by two of the authors of the OCaml compiler. If they cannot get it right, then there is little hope for the rest of us...

However, all is not lost. The solution is simply to use looser bindings. Rather than handling interoperability at the C level with a low-level type-unsafe interface, use a high-level interface like XML-RPC with string passing or even over sockets. This is much easier to get right and, as you say, will let you leverage the enormous practical benefits offered by OCaml for this application.
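To make "looser bindings" concrete, here is a minimal sketch of the C side of a string-passing interface. It uses a plain S-expression rather than XML-RPC to keep it short, and it assumes the OCaml optimizer prints its plan as text such as (join (scan a) (scan b)), which C then rebuilds into a tree. The plan_node struct, parse_plan and free_plan are hypothetical names for illustration, not PostgreSQL's actual plan structures.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plan node (not PostgreSQL's Plan): an operator name
   plus up to MAX_CHILDREN child nodes. */
#define MAX_CHILDREN 4
#define MAX_NAME 32

typedef struct plan_node {
    char name[MAX_NAME];
    int nchildren;
    struct plan_node *children[MAX_CHILDREN];
} plan_node;

static void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

/* Recursively free a node and all of its children. */
void free_plan(plan_node *n) {
    if (!n) return;
    for (int i = 0; i < n->nchildren; i++) free_plan(n->children[i]);
    free(n);
}

/* Parse either an atom ("a") or a list "(op child ...)".
   Advances *p past the consumed text; returns NULL on malformed input. */
static plan_node *parse_node(const char **p) {
    skip_spaces(p);
    int is_list = (**p == '(');
    if (is_list) (*p)++;
    skip_spaces(p);

    plan_node *n = calloc(1, sizeof *n);
    if (!n) return NULL;

    /* Read the operator or atom name, leaving room for the terminator. */
    size_t len = 0;
    while (**p && !isspace((unsigned char)**p) && **p != '(' && **p != ')') {
        if (len + 1 >= MAX_NAME) { free(n); return NULL; }
        n->name[len++] = *(*p)++;
    }
    if (len == 0) { free(n); return NULL; }
    n->name[len] = '\0'; /* calloc zeroed it already; made explicit here */

    if (!is_list) return n;

    /* Read children until the matching closing parenthesis. */
    for (;;) {
        skip_spaces(p);
        if (**p == ')') { (*p)++; return n; }
        if (**p == '\0' || n->nchildren == MAX_CHILDREN) {
            free_plan(n);
            return NULL;
        }
        plan_node *child = parse_node(p);
        if (!child) { free_plan(n); return NULL; }
        n->children[n->nchildren++] = child;
    }
}

/* Parse a complete plan string; rejects trailing garbage. */
plan_node *parse_plan(const char *text) {
    const char *p = text;
    plan_node *root = parse_node(&p);
    if (!root) return NULL;
    skip_spaces(&p);
    if (*p != '\0') { free_plan(root); return NULL; }
    return root;
}
```

With this split, the OCaml side only needs a printer for its plan type, and the text can travel over a pipe, a socket, or a temporary file. The cost is an extra parse per call, which should matter little for a coarse-grained call such as optimizing one query.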

J D
  • Using IPC instead of writing bindings is an idea that is overlooked incredibly often. Why do people always look for complicated solutions? – Michaël Le Barbier Mar 15 '16 at 09:24
  • Also often overlooked: using plain command-line invocation + pipes instead of IPC. There is no need to manage yet another long-running process if invoking a C or C++ program has "no" overhead in any practical sense. Of course, this is only feasible for coarse-grained tasks, i.e. ones called once per second or minute, as opposed to tight loops that call it once per millisecond. But you shouldn't do FFI within tight loops either, so this is not really a constraint; it merely enforces best practice: always push the tight loops to the lower level. – vog Apr 25 '18 at 10:50
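A minimal sketch of the command-line + pipes approach from the C side, using POSIX popen()/pclose(). It assumes the OCaml component is built as a separate executable that writes its result to stdout; run_and_capture is a hypothetical helper name, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Sketch: run `cmd` through the shell and copy up to bufsize-1 bytes of
   its stdout into buf (always NUL-terminated). Returns the wait status
   from pclose (0 for a normal exit with code 0), or -1 on failure. */
int run_and_capture(const char *cmd, char *buf, size_t bufsize) {
    if (bufsize == 0) return -1;
    FILE *fp = popen(cmd, "r");
    if (!fp) return -1;

    size_t used = 0;
    size_t n;
    while (used < bufsize - 1 &&
           (n = fread(buf + used, 1, bufsize - 1 - used, fp)) > 0)
        used += n;
    buf[used] = '\0';

    /* Drain any remaining output so the child never blocks on a full pipe. */
    char scratch[256];
    while (fread(scratch, 1, sizeof scratch, fp) > 0) {
    }

    return pclose(fp);
}
```

A call might look like run_and_capture("./optimizer query.sql", out, sizeof out), where ./optimizer stands for a hypothetical OCaml executable. Since the command string goes through the shell, it should never be built from untrusted input.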

My rule of thumb is to stick with the language / model / style used in the existing code-base, so that future maintenance developers inherit a consistent and understandable set of application code.

The only way I could justify something like what you are suggesting would be if:

  1. You are an Expert at OCaml AND a Novice at C (so you'll be 20x as productive)
  2. You have successfully integrated it with a C library before (apparently not)

If you are at all more familiar with C than OCaml, you've just lost any "theoretical" gain from OCaml being easier to use when writing a compiler; plus, it seems as though you will have more peers around you who are familiar with C than with OCaml.

That's my "grumpy old coder" 2 cents (which used to only cost a penny!).

Ron Savage
  • I don't understand why you make the assumption that one must be an expert at OCaml and a novice at C in order to be more productive using OCaml in an area that is precisely its strong suit. If there were a Web component, would you write all of that in C rather than, say, Python+HTML+Javascript? – Chuck Aug 11 '10 at 01:36
  • Question: Is Ron Savage's answer actually based on any knowledge/specifics regarding OCaml or even C? – Domingo Ignacio Aug 11 '10 at 03:34
  • @Chuck - Because getting the feature done in 1 month vs 5 months might make it worth the additional complexity, maintenance headaches and risk of introducing a completely new language into an existing project and team of C developers. 1 month vs 2 months? Not worth it. – Ron Savage Aug 11 '10 at 05:05
  • @Domingo - No knowledge of OCaml, knowledge of C and many other languages, plus way too much experience inheriting ramshackle conglomerate code bases with no architectural or design discipline. – Ron Savage Aug 11 '10 at 05:07
  • @Ron: I think the general rule of thumb you state is right on for production code, but my project has to do with writing research prototype code, where the maintenance and personnel issues are somewhat less relevant. (My question wasn't super clear about this.) In the best scenario, the prototype goes great, and someone would have to rewrite the code in C to put it in the real codebase. But even if we'd written it in C to begin with, it would probably be better to rewrite it from scratch at that point anyway. ("Plan to throw one away" ...) – tjgreen Aug 11 '10 at 15:40
  • @Tjgreen - Yep, that's true - my comments were more geared towards a production app. team. :-) – Ron Savage Aug 11 '10 at 16:21
  • @tjgreen; Good logic there. I will often prototype in OCaml, make sure I get the correct results, then work on a C implementation if I plan on doing one. That way, I have a method to verify results when bugs do come up in the C version. – nlucaroni Aug 11 '10 at 17:16
  • @Ron Understood; a very good point on the topic of project management. It just seemed pretty off-topic to me when the question was about OCaml/C specifics. – Domingo Ignacio Aug 11 '10 at 19:05
  • @tjgreen: I would be careful about saying "It's prototype code, not production code." Prototype code can turn into production code with very little warning. – TwentyMiles Aug 20 '10 at 20:51
  • @tjgreen Well, if that means that there will be OCaml code instead of C code in production, this might not be a bad ending at all ... ;-) – vog Apr 25 '18 at 10:56