What are advantages and disadvantages of code generation?

Question

There are probably different kinds of code generation. In RoR for example, Rails can create skeletons for models, controllers, etc. But the developer has to complete those skeletons.

Sometimes, though, there are projects where many core artifacts are generated in their entirety from a set of definitions or models.

I am mainly interested in the advantages and disadvantages of this latter type of code generation.

This looks like a duplicate of http://stackoverflow.com/questions/113286/do-you-create-your-own-code-generators — Lukas Eder, Jan 11 '11 at 11:13

score 7 · Accepted Answer · answered Dec 23 '10 at 13:41

The main advantage is that it does the work for you, it's repeatable, and the code will most likely work (that depends, of course, on whether the person who wrote the generator knew what they were doing). It can remove a lot of the time otherwise spent on menial coding tasks. For example, is it really worth your time to write objects which are nothing more than containers for data from the database, or is it better to have some program automatically create these for you?
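To make the "containers for data from the database" example concrete, here is a minimal sketch in Python. The table definition `CUSTOMER_COLUMNS` and the helper `generate_container` are hypothetical names invented for illustration; a real generator would read the schema from the database or a model file.

```python
from typing import Dict


def generate_container(class_name: str, columns: Dict[str, str]) -> str:
    """Return Python source for a class that only holds the given columns."""
    args = ", ".join(f"{name}: {type_name}" for name, type_name in columns.items())
    lines = [f"class {class_name}:", f"    def __init__(self, {args}):"]
    lines += [f"        self.{name} = {name}" for name in columns]
    return "\n".join(lines) + "\n"


# Hypothetical table definition: column name -> Python type name.
CUSTOMER_COLUMNS = {"id": "int", "name": "str", "email": "str"}

source = generate_container("Customer", CUSTOMER_COLUMNS)
print(source)

# Load the generated source to show that it is usable as-is.
namespace: dict = {}
exec(source, namespace)
customer = namespace["Customer"](1, "Ada", "ada@example.com")
print(customer.name)
```

Rerunning the generator after a schema change reproduces the class, which is the repeatability mentioned above.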

The big disadvantage is that it forces you to write code that is compatible with the generated code. Most of the time this isn't a problem, but it can be a real hassle when someone comes up to you and says "Hey, can we do X?" and X conflicts with the generated code. If the generator is good, it will let you change functionality, but that almost always increases the complexity of the generated code. This complexity has a price: the code is more difficult to understand, and it can be less efficient than code you write yourself. This of course varies by situation.

kemiller2002


There is an easy way to tackle the growing complexity of code generation: staged code generation. If you transform your source DSL into target code in a sequence of trivial steps, the whole thing remains simple and maintainable, whatever the complexity of the source DSL is. – SK-logic Dec 23 '10 at 14:26
In line with @SK-logic comment, AtomWeaver (www.atomweaver.com) is a free (freemium actually) code generator that generates code in four steps, and combines discrete mini-generators (think Lego) into one large, maintainable generator. – Rui Curado Dec 23 '10 at 14:32
@SK-logic: One problem that I personally have experienced with custom code generators built in-house is that they have an extra learning curve. Another one is that they increase the time needed for a so-called `clean compile`. Plus, the generated code, to varying degrees, does not conform to the coding style of the team (e.g. indentation, etc.). I think the conclusion is that most of the time it's a matter of personal preference... :) – Behrang Feb 12 '11 at 13:49
I was more interested in hearing facts rather than personal observations, say academic research about the advantages and disadvantages of code generation. But anyway, I'll accept this answer as it is the most voted. – Behrang Feb 12 '11 at 13:51
@Behrang: why would you ever want to look into generated code? Coding style does not matter at all. In practice, a DSL compiler which generates code, plus all the code implemented in this DSL, is much more compact than the equivalent code in a general-purpose programming language. Same for the learning curve: understanding a DSL is much easier than grasping a pile of spaghetti code. Try to apply this staging trick, and you'll see how much simpler it is. – SK-logic Feb 12 '11 at 14:02
@SK-logic: Last year I was working in a legacy project in which almost everything (mainly Java code) was generated using a DSL written in Ruby. It was a very painful experience. However, one thing that right now I can remember is that in the DSL, obviously, we didn't have static type checking. Another painful problem was that people used to add new stuff to be generated, and on every checkout or pull from the repo, we had to regenerate everything otherwise we would get a lot of cryptic errors. Overall it had made it a very painful and unproductive environment. – Behrang Oct 26 '11 at 03:45
@Behrang Saeedzadeh, you can write a Fortran-style code in Haskell, if you try hard. You can build an unmaintainable code generation system as well, if you're really, really dedicated to do so. From your description it is obvious that someone was dedicated enough: no proper build system in place to check the dependencies, checking intermediate code into version control, no proper type system in DSL, single stage code generation. There is no single thing done right in your list. – SK-logic Oct 26 '11 at 08:23
@SK-logic you're right that the code generation system in that project was not done right, but even in general I think code generation is helpful only when one needs to use badly designed libraries and frameworks. For example in EJB 2, due to the big number of artifacts per EJB, programmers used to use XDoclet to reduce the pain to some extent. But in EJB 3, XDoclet is no longer necessary as the framework has a better design that makes code generation unnecessary. Again, in that project, we used to generate DAOs. But had we used Spring Data JPA, it would become unnecessary to do so. – Behrang Oct 26 '11 at 15:06
@SK-logic So far I have only been happy with lexer and parser code generators. Maybe that's because they generate artifacts that are algorithmically complex. – Behrang Oct 26 '11 at 15:08
@Behrang Saeedzadeh, code generation is more useful when there is a poorly designed language (e.g., Java) used for a problem domain which is better described in a more high level DSL. It is not about complexity - it is more about readability and maintainability of a source code. The closer it is to a problem domain terminology, the easier it is to follow. I'm using DSLs for virtually all my development work - but, I rarely use a plain text code generation, this approach works much better with meta-languages. – SK-logic Oct 26 '11 at 15:54
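
The staged generation SK-logic describes in the comments above can be sketched roughly as follows, in Python. This is only an illustration under assumptions: the tiny `entity` DSL, the `TYPE_MAP` table, and the three stage functions are hypothetical and not taken from any tool mentioned in the thread.

```python
from typing import List, Tuple

Model = Tuple[str, List[Tuple[str, str]]]

# Hypothetical mapping from DSL type names to target-language type names.
TYPE_MAP = {"integer": "int", "text": "str"}


def parse(dsl: str) -> Model:
    """Stage 1: turn the DSL text into a raw model (entity name, fields)."""
    header, *body = [line.strip() for line in dsl.strip().splitlines() if line.strip()]
    entity = header.split()[1]  # "entity Customer" -> "Customer"
    fields = [tuple(part.strip() for part in line.split(":")) for line in body]
    return entity, fields


def lower(model: Model) -> Model:
    """Stage 2: replace DSL types with target-language types."""
    entity, fields = model
    return entity, [(name, TYPE_MAP[dsl_type]) for name, dsl_type in fields]


def emit(model: Model) -> str:
    """Stage 3: print the lowered model as target source text."""
    entity, fields = model
    return "\n".join([f"class {entity}:"] + [f"    {name}: {t}" for name, t in fields])


DSL = """
entity Customer
  id: integer
  name: text
"""

code = emit(lower(parse(DSL)))
print(code)
```

Each stage does one trivial thing, so a new DSL feature usually touches one small function instead of one large template.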

Stephan Eggermont · Answer 2 · 2010-12-23T14:34:11.150

1

The main problem with this style of programming is that it contaminates the view of your project. It no longer allows you to practice DRY. It is useful to have a clean separation between what is generated automatically and what is written by a human. Most systems, especially file-based ones, do not support such a separation well. In systems that have good introspection capabilities (e.g. Smalltalk images), building a dynamic object structure by walking the definition/model is preferable.
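
As a rough analogue of "building a dynamic object structure by walking the definition/model", here is a sketch in Python rather than Smalltalk. It uses the standard library's `dataclasses.make_dataclass`; the `MODEL` dictionary and `build_classes` helper are hypothetical names chosen for illustration. No source files are generated, so there is nothing to keep separate from handwritten code.

```python
from dataclasses import fields, make_dataclass
from typing import Dict

# Hypothetical model: entity name -> {field name: field type}.
MODEL: Dict[str, Dict[str, type]] = {
    "Customer": {"id": int, "name": str},
    "Order": {"id": int, "customer_id": int},
}


def build_classes(model: Dict[str, Dict[str, type]]) -> Dict[str, type]:
    """Walk the model and create one class per entity at runtime."""
    return {
        name: make_dataclass(name, list(columns.items()))
        for name, columns in model.items()
    }


classes = build_classes(MODEL)
customer = classes["Customer"](id=1, name="Ada")

# The resulting objects can still be inspected like ordinary ones.
field_names = [f.name for f in fields(customer)]
print(field_names)
```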

In illusion-based programming (as practiced in large companies and government agencies) it is very useful, because it allows you to generate very impressive stacks of documentation and to show impressive implementation performance as measured in lines of code per man-month. There, your most important skill is of course timing your disappearing act.

edited Dec 23 '10 at 14:34

answered Dec 23 '10 at 14:25

Stephan Eggermont


I can't see how, for example, .inc files in the LLVM and Clang source tree contaminate the view of a project. And they're all generated from .td files. Visual Studio with T4 displays generated files as sub-nodes of their source templates in a project tree. And you do not have to generate intermediate files at all, as with Common Lisp macro metaprogramming. – SK-logic Dec 23 '10 at 14:49
Depends on the quality of the separation. Sounds like LLVM does ok. I tend to find more naive implementations a lot more in the wild. – Stephan Eggermont Dec 23 '10 at 15:18
So, after all, it's not a problem of a code generation itself, but it's just a technique being misused widely. Well, this is a common problem for all the programming techniques that ever existed. – SK-logic Dec 23 '10 at 17:52
The OP asks for disadvantages. This is one. It is an inherent problem of source-to-source code generation. You can do a lot to make sure it is not a large problem, but in most contexts there are better solutions. Keeping high-level information available allows for better decisions to be made. – Stephan Eggermont Dec 25 '10 at 10:21

score 1 · Answer 3 · edited May 23 '17 at 12:09

I think the most important thing to keep in mind is WHY you want to generate source code. Is it, for instance, because you are more fluent with UML than any programming language and hence want to generate object-oriented classes from that graphical model?

Is it because you expressed a schema definition in some language (for example SQL DDL with jOOQ, or XSD with JAXB code generation) and want to generate a model from that?

The advantage of code generation is always the fact that you express something only once (as in DRY, like Stephan stated). This is a very good practice that made it deep into extreme programming (among other processes). When you keep things DRY, you will not run the risk that the model differs from its glue code. On the other hand, you might bloat your glue code because it will exactly match its underlying model. Typically, you have one class/type/object per RDBMS table or per XML element.

If, however, you use code generation because you're more at ease with a modelling language (as in MDA, or model-driven architecture), you might run the risk that your generated code is not good enough (lack of detail) or too complicated (lack of simplicity) because - for instance - UML is not suited for solving problems in detail.

In any case: code generation can be very helpful if the generated code can be used AS-IS and does not need any customisation. As soon as you start customising generated code, it may become a maintenance nightmare.
