Thinking about Code Generation

Published 06 June 07 10:01 AM | jons 
I'm starting a new project for a client. It is to be a CSLA 2.1 application portfolio consisting of several WinForms applications and several web applications, all of which are tied to a central SQL Server database.

Since CSLA places most of the "heavy lifting" in a set of common base classes, the remaining logic in the business classes that derive from these base classes tends to be quite routine. Not that I'm complaining about this: routine is good. One of the benefits of "routine" is that it is possible to generate a lot of the code.

I have used code generation on a lot of projects in the past. I have some fairly specific "technical requirements" that I think code generation has to meet:

  • Code generation is controlled through templates that define what code is generated. That is, I want to see and control the contents of the templates that are used. No “black box” code generation for me, thanks. Note that this project will need around a dozen templates: one for each of the CSLA-defined classes (root, child, read-only, etc.) and another for each list of classes.
  • Code generation is done during the build process. There are several implications that spring from this point:
    • Code generation from a “virtual table” (see above) is “just not done”.
    • The extension or modification of the generated code should be “separate” from the corresponding generated code; otherwise, the re-generation of the code will clobber the extensions.
  • Code generation is done from an intermediate format, not directly from the database schema. My preference is to extract the relevant parts of the schema into an XML file, annotate or edit that XML file, and generate from the altered XML file.
  • The intermediate format is under source code control with versioning.
  • The code generation templates are also under source code control.
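To make these requirements a little more concrete, here is a minimal sketch of template-driven generation from an annotated XML intermediate file. It is written in Python only to keep it short. The XML element and attribute names, the template text, and the `generate` function are all assumptions made up for illustration, and the emitted text is a C#-like skeleton rather than real CSLA code.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical intermediate format: extracted from the database schema,
# then annotated by hand and kept under source control.
INTERMEDIATE_XML = """
<entities>
  <entity name="Customer" stereotype="root">
    <property name="Id" type="int"/>
    <property name="Name" type="string"/>
  </entity>
</entities>
"""

# Visible, editable templates (also under source control).
# One class template per stereotype: root, child, read-only, and so on.
ROOT_TEMPLATE = Template(
    "public class ${name}Base\n"
    "{\n"
    "${fields}\n"
    "}\n"
)
FIELD_TEMPLATE = Template("    private ${type} _${name};")


def generate(xml_text, templates):
    """Return {entity name: generated source text} for every entity."""
    root = ET.fromstring(xml_text)
    output = {}
    for entity in root.findall("entity"):
        fields = "\n".join(
            FIELD_TEMPLATE.substitute(type=p.get("type"), name=p.get("name"))
            for p in entity.findall("property")
        )
        # Pick the template that matches the entity's stereotype.
        template = templates[entity.get("stereotype")]
        output[entity.get("name")] = template.substitute(
            name=entity.get("name"), fields=fields
        )
    return output


generated = generate(INTERMEDIATE_XML, {"root": ROOT_TEMPLATE})
print(generated["Customer"])
```

Because both the XML file and the templates are plain text, a build step could run something like this on every build and both inputs could be versioned like any other source file.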

 

One of the problems with code generation is that it is a "one size fits all" solution. 

While it is technically possible to generate every line of code in an application, it is rarely possible to justify such an effort economically. The typical approach is to separate the code for each generated class into two parts: the generated code and the hand-crafted code. There are two objectives here:

  • One objective is to be able to re-generate the classes (to reflect changes in the templates and the underlying database) without affecting the hand-crafted code.
  • A second objective is to be able to easily extend or modify the generated code when that code does not reflect the full needs of the application.

There are four general approaches that can be used to accomplish these objectives:

  • The code generator recognizes special regions in the generated class code. These special regions hold code that is to be preserved when (and if) the class is regenerated. Each time that the code is regenerated, the code generator reads the existing source code, extracts the contents of the preserved regions, regenerates the code, and re-inserts the preserved code.
  • The code generator templates generate the classes using the “partial” keyword. The developer can create a separate file containing source code that is also marked with the class name and the “partial” keyword. At compile time, the compiler merges the contents of the “partial” classes to produce a single integrated executable class.
  • The code generator templates generate “base” classes that must be inherited. The developer can/must create a separate file containing source code that inherits from the “base” class.
  • The code generator templates generate full classes that accept some form of “plug-in” code for the extensible parts. While this is very powerful, it does require a lot of design and re-design to get a plug-in structure that allows the right amount of effort to override the functionality of the class. I am not going to take this approach any further than mentioning that it is a possibility.
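The first approach, “special regions”, is the one whose mechanics are least obvious, so here is a rough sketch of the read, extract, regenerate, re-insert cycle. It is in Python for brevity. The `// PRESERVE-BEGIN` and `// PRESERVE-END` marker syntax and the function names are hypothetical; a real generator would define its own markers.

```python
import re

# Hypothetical marker syntax for a preserved region in generated source.
REGION = re.compile(
    r"// PRESERVE-BEGIN (?P<name>\w+)\n(?P<body>.*?)// PRESERVE-END\n",
    re.DOTALL,
)


def extract_regions(existing_source):
    """Return {region name: hand-written body} from the current file."""
    return {
        m.group("name"): m.group("body")
        for m in REGION.finditer(existing_source)
    }


def regenerate(existing_source, fresh_output):
    """Re-insert the preserved bodies into freshly generated code."""
    saved = extract_regions(existing_source)

    def restore(match):
        name = match.group("name")
        # Keep the fresh (usually empty) body if nothing was saved.
        body = saved.get(name, match.group("body"))
        return f"// PRESERVE-BEGIN {name}\n{body}// PRESERVE-END\n"

    return REGION.sub(restore, fresh_output)


old = "class A {\n// PRESERVE-BEGIN rules\n  int x = 1;\n// PRESERVE-END\n}\n"
new = "class A { // v2\n// PRESERVE-BEGIN rules\n// PRESERVE-END\n}\n"
print(regenerate(old, new))
```

Note that a region which exists in the old file but not in the freshly generated output is silently dropped in this sketch, which is one concrete form of the risk that source control helps to mitigate.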

The advantage of the “special regions” approach is that if there are no extensions, there is no need to write any additional code. The same is true of the “partial” approach. Both of these approaches permit the class to be extended with a minimum of fuss, although the “special regions” approach would seem to be more risky (but that risk is greatly mitigated with the careful use of a source code control system). The disadvantage of both of these methods is that it is difficult to replace generated code with hand-crafted code. The more of the overall source code of each class that is generated, the more significant this limitation becomes. For example, if the database queries for the CRUD operations are generated in the class, there is no way to override this in the hand-crafted part of the class.

The advantage of the base/child class approach is that it is relatively easy to arrange things so that it is possible to override virtually anything in the base class. The primary disadvantage of this approach is that it is necessary to create a child class for each base/child pair even when the child has no need of overriding code. Note that in the general case, it is possible to use a factory pattern that returns a child class only when the child exists; however, the use of the factory pattern in CSLA would require a double factory pattern and that just does not “smell” right.
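As a rough illustration of the base/child split and of the general-case factory idea, here is a small Python sketch. The class names, the SQL strings, and the `REGISTRY` lookup are all made up for the example, and it does not attempt to model CSLA's own factory methods.

```python
class CustomerBase:
    """Stands in for a generated base class; overwritten on regeneration."""

    def fetch_sql(self):
        return "SELECT Id, Name FROM Customer WHERE Id = @id"

    def fetch(self):
        # Generated logic calls the overridable method.
        return "running: " + self.fetch_sql()


class Customer(CustomerBase):
    """Hand-written child; it lives in its own file and survives regeneration."""

    def fetch_sql(self):
        # Replaces the generated query with a hand-crafted one.
        return "EXEC GetCustomer @id"


class OrderBase:
    """A generated base class for which no child has been written."""

    def fetch_sql(self):
        return "SELECT Id FROM [Order] WHERE Id = @id"


# Hypothetical registry of the classes known to the application.
REGISTRY = {
    "CustomerBase": CustomerBase,
    "Customer": Customer,
    "OrderBase": OrderBase,
}


def create(entity_name, registry):
    """Return the hand-written child if one exists, else the generated base."""
    cls = registry.get(entity_name) or registry[entity_name + "Base"]
    return cls()


print(create("Customer", REGISTRY).fetch())
print(type(create("Order", REGISTRY)).__name__)
```

The child overrides exactly the piece it cares about, and the factory falls back to the base class when no child has been written, which is what would remove the need for an empty child per pair.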

My preference is to use the base/child pattern. The initial versions of the child classes can be generated using a simple template.

Well, I am off to see the client. I'll dump out some more of my thoughts later as the project evolves.


About jons

Jon Stonecash is a technology consultant and has been designing, developing, and testing various kinds of software for such a long time that he has had the opportunity to make most of the serious software development mistakes at least once. His long term interests center about databases and the aspects of the application that handle data access and business logic. He is also interested in the tools that assist the development process, particularly code generation.