Thursday, February 1, 2007

The Code-Generation Mindset

I'm a lazy guy.

I hate doing the same kind of thing over and over again. That's what computers are good at. So the cool thing to do is to try to get the computer to do the grunge work of writing a program, leaving me to do interesting things like designing the application properly...

There is nothing new about Code-Generation. Compilers have been doing it for years, consuming simpler syntax and higher-level abstractions and churning out instructions native to the execution platform. What's more, programming languages and their compilers have become more intelligent and automatically perform significant optimizations so that an expert assembly programmer (a dying, if not extinct, breed) would find it hard to improve upon the code generated at the instruction-set level.

And this was old news in 1998...

However, almost ten years on, we're still writing code...

Granted, a lot of the code written in the industry now is in scripting languages, which arguably sit at a higher level of abstraction than systems or even application programming. Even so, we are still hand-writing repetitive code, as is evidenced by the kind of program required to connect to a database table and display its contents as a table on a web page.

"Hang on", you say, "that was a big deal in 2004 - before tools like Visual Studio.NET 2005 came along with their point-and-click wizards which generated all the code for that kind of thing"...

...and you'd be right - that's exactly what I'm saying too. In fact, by the time I'm done with this topic, we'll build the exact pattern (down to tabs, semicolons and spelling mistakes) that Visual Studio.NET 2005 uses in its TypedDataset generator tool. (If that, by itself, isn't remarkable, consider the fact that it took me less than 2 days to write my pattern!)

What would be nice is if I could apply the code-generation paradigm to any repetitive, semi-mindless pattern of code. Currently, we deal with such patterns in one of four ways:

  1. Cut/Paste/Edit the pattern to fit the requirement, then compile/deploy and pray that the editing was comprehensively correct. There are too many problems with this approach to list.
  2. Make the pattern into a soulless library which boils everything down to the lowest common denominator and introduces a new pattern for calling it. Generally not a huge gain there, plus added complexity.
  3. Use a functional programming language, passing functions as arguments for the variable bits of the pattern. Maps, Filters and Folds work really well for some kinds of patterns that iterate over a set of items. More on this much later.
  4. Code-Generate. If the pattern is repeated a lot, this is the most sensible approach, provided you have the tools to do it. I'm going to expand on this in a little detail now.
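Option 3 can be approximated even outside a functional language, as long as the language can pass functions around. Here is a minimal C# sketch, assuming a hypothetical Guarded helper (not a library API): the invariant try/log/fall-back skeleton is written once, and the variant bits (the work itself and the recovery policy) are passed in as lambdas.

```csharp
using System;

// Hypothetical helper, for illustration only: the invariant "try, log, fall back"
// skeleton lives here once; the variant bits arrive as delegates.
static class PatternAsFunction
{
    public static void Guarded(string label, Action body, Action<Exception> onFailure)
    {
        try
        {
            body(); // variant: the actual work
        }
        catch (Exception ex)
        {
            // invariant: every use of the pattern logs in the same way
            Console.WriteLine(String.Format("{0} threw Exception [ {1} ].", label, ex.Message));
            onFailure(ex); // variant: the recovery policy
        }
    }
}

// Example call (hypothetical):
//   PatternAsFunction.Guarded("Customer.IsActive",
//       () => { throw new InvalidOperationException("no value"); },
//       ex => Console.WriteLine("falling back to DBNull"));
```

Note that every call site still has to spell out its own Guarded(...) invocation, which is exactly the "pattern to calling it" that option 2 suffers from.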

By the way, let me plug my company here - I founded, and currently work in, BrightSword Technologies Private Limited, and I am based in Bangalore, India. We recognised the value of this approach a long time ago, and built a code-generation framework which we have successfully used in a variety of real-world applications. We also took the data-driven web-application as a pattern, and wrote and sold a standalone code-generation tool called BrightSword Designer way back in 2002. We even had the point-and-click wizards for ASP, JSP, PHP and ASP.NET long before Visual Studio.NET 2005, and the ASP pattern implemented inheritance and polymorphism using explicit delegation even though VBScript has neither concept fully implemented!

Anyway, back to the mindset...

To successfully apply the Code-Generation approach, two prerequisites must be fulfilled:

  1. There must be a recognizable pattern. A pattern of code is basically a chunk of code with some invariant parts, which stay the same, and some variant parts, which change between instances of the pattern.
  2. We must be able to describe the variant portion meaningfully with respect to the pattern.
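Reduced to its bare bones, those two prerequisites describe a string with named holes (the invariant part) and a set of values for those holes (the variant part). A minimal C# sketch, using a hypothetical Fill helper and the [name] placeholder style that the pseudocode later in this post also uses:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: fills each "[key]" hole in an invariant pattern
// with the corresponding variant value for one instance of the pattern.
static class PatternFill
{
    public static string Fill(string pattern, IDictionary<string, string> variant)
    {
        string result = pattern;
        foreach (KeyValuePair<string, string> hole in variant)
        {
            result = result.Replace("[" + hole.Key + "]", hole.Value);
        }
        return result;
    }
}
```

Filling "cmd.Parameters.Add(\"@p_[name]\", [type]);" with name = ID and type = System.Data.SqlDbType.UniqueIdentifier yields one line of the ADO.NET example that follows. Plain substitution like this cannot express the conditional parts of that example, which is where a richer template notation earns its keep.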

Let me try to clarify this with an example. Consider this snippet of code you might write when calling a stored procedure using ADO.NET:

do
{
    try
    {
        cmd.Parameters.Add("@p_ID", System.Data.SqlDbType.UniqueIdentifier);

        cmd.Parameters["@p_ID"].Value = this.ID;
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(String.Format("Customer.Persist - Persisting Customer.ID threw Exception [ {0} ]. Bailing.", ex.Message));
        throw ex;
    }
}
while(false);

do
{
    try
    {
        cmd.Parameters.Add("@p_IsActive", System.Data.SqlDbType.Bit);

        cmd.Parameters["@p_IsActive"].Value = this.IsActive;
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(String.Format("Customer.Persist - Persisting Customer.IsActive threw Exception [ {0} ]. Filling Null Value.", ex.Message));
        cmd.Parameters["@p_IsActive"].Value = System.DBNull.Value;

        break;
    }
}
while(false);

do
{
    try
    {
        cmd.Parameters.Add("@p_Name", System.Data.SqlDbType.NVarChar);

        if (this.Name == null)
        {
            cmd.Parameters["@p_Name"].Value = System.DBNull.Value;
        }
        else
        {
            cmd.Parameters["@p_Name"].Value = this.Name;
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(String.Format("Customer.Persist - Persisting Customer.Name threw Exception [ {0} ]. Filling Null Value.", ex.Message));
        cmd.Parameters["@p_Name"].Value = System.DBNull.Value;

        break;
    }
}
while(false);

// mumble the incantation to execute the cmd on the connection

As a digression, notice that this code contains the kind of bullet-proofing and optimization that you would likely want in your code, but can't or won't write because it's too tedious. It's far simpler to write the usual code given in all the demos, and leave things to chance and the gods:

cmd.Parameters.AddWithValue("@p_ID", this.ID);
cmd.Parameters.AddWithValue("@p_IsActive", this.IsActive);
cmd.Parameters.AddWithValue("@p_Name", this.Name);

// mumble the incantation to execute the cmd on the connection

Anyway - let's identify the pattern in the code.

It's basically the piece of code between the "do" and the "while(false);". (In reality, the entire class is the pattern, but let's keep things simple for the example.)

Identifying the variant portion of the pattern takes a little more thought. We recognise the obvious:

  • the name of the parameter
  • the type of the parameter

Not so obvious are the following:

  • whether the parameter is mandatory or optional
  • whether the property of the class is a reference or value type

So, more formally, we can say that the metadata (or "type") of the variant is:

property_metadata ::= {name : string, type : string, is_mandatory : bool, is_reference : bool}  
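In C#, the language of the examples here, property_metadata might be transliterated as a small immutable class. This is a sketch: the class name PropertyMetadata is hypothetical, and the type field is renamed SqlType to avoid confusion with System.Type.

```csharp
using System;

// Hypothetical C# counterpart of property_metadata.
sealed class PropertyMetadata
{
    public string Name { get; }        // property (and parameter) name, e.g. "ID"
    public string SqlType { get; }     // SqlDbType member as source text
    public bool IsMandatory { get; }   // mandatory: re-throw on failure
    public bool IsReference { get; }   // reference type: may be null

    public PropertyMetadata(string name, string sqlType, bool isMandatory, bool isReference)
    {
        Name = name;
        SqlType = sqlType;
        IsMandatory = isMandatory;
        IsReference = isReference;
    }
}
```

Two flags suffice for this pattern: an optional (non-nullable) value-type property cannot be null and is assigned directly, while an optional reference-type property needs the null check.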

Using a kind of pseudocode template notation, we could write:

template :=
[PATTERN_BEGIN(property : property_metadata)]
do
{
    try
    {
        cmd.Parameters.Add("@p_[property.name]", [property.type]);
[IF property.is_mandatory]
        cmd.Parameters["@p_[property.name]"].Value = this.[property.name];
[ELSE]
    [IF property.is_reference]
        if (this.[property.name] == null)
        {
            cmd.Parameters["@p_[property.name]"].Value = System.DBNull.Value;
        }
        else
        {
            cmd.Parameters["@p_[property.name]"].Value = this.[property.name];
        }
    [ELSE]
        cmd.Parameters["@p_[property.name]"].Value = this.[property.name];
    [ENDIF]
[ENDIF]
    }
    catch (Exception ex)
    {
[IF property.is_mandatory]
        System.Diagnostics.Debug.WriteLine(String.Format("Customer.Persist - Persisting Customer.[property.name] threw Exception [ {0} ]. Bailing.", ex.Message));
        throw ex;
[ELSE]
        System.Diagnostics.Debug.WriteLine(String.Format("Customer.Persist - Persisting Customer.[property.name] threw Exception [ {0} ]. Filling Null Value.", ex.Message));
        cmd.Parameters["@p_[property.name]"].Value = System.DBNull.Value;

        break;
[ENDIF]
    }
}
while(false);
[PATTERN_END]

Now, we can enumerate the parametrizing entities:

data := {
{"ID", "System.Data.SqlDbType.UniqueIdentifier", true, false},
{"IsActive", "System.Data.SqlDbType.Bit", false, false},
{"Name", "System.Data.SqlDbType.NVarChar", false, true}
}

With this kind of setup, it's fairly clear that we could generate the desired block of code by somehow applying the data to the template. We simply need some magic wand to perform the apply operation. In pseudo-functional notation:

code := apply(template, data)
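The post saves the real apply operation for a later installment. In the meantime, here is a minimal C# sketch of what apply could compute for this one template. It assumes hypothetical names (PropertyMetadata, CustomerMetadata, Generator.Apply) and hard-codes the template as C# branching logic instead of interpreting the [IF]/[ENDIF] notation, so it is a stand-in, not a description of the author's framework. The indentation it emits is cosmetic.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical C# counterpart of property_metadata.
sealed class PropertyMetadata
{
    public string Name { get; }
    public string SqlType { get; }
    public bool IsMandatory { get; }
    public bool IsReference { get; }

    public PropertyMetadata(string name, string sqlType, bool isMandatory, bool isReference)
    {
        Name = name;
        SqlType = sqlType;
        IsMandatory = isMandatory;
        IsReference = isReference;
    }
}

static class CustomerMetadata
{
    // The "data" table from the text, one entry per Customer property.
    public static readonly PropertyMetadata[] Data =
    {
        new PropertyMetadata("ID", "System.Data.SqlDbType.UniqueIdentifier", true, false),
        new PropertyMetadata("IsActive", "System.Data.SqlDbType.Bit", false, false),
        new PropertyMetadata("Name", "System.Data.SqlDbType.NVarChar", false, true),
    };
}

static class Generator
{
    // apply(template, data) with the template hard-coded: each [IF ...]
    // in the pseudocode becomes an ordinary if/else around AppendLine calls.
    public static string Apply(IEnumerable<PropertyMetadata> data)
    {
        var sb = new StringBuilder();
        foreach (PropertyMetadata p in data)
        {
            string value = "cmd.Parameters[\"@p_" + p.Name + "\"].Value";
            string log = "System.Diagnostics.Debug.WriteLine(String.Format(\"Customer.Persist - Persisting Customer."
                         + p.Name + " threw Exception [ {0} ]. ";

            sb.AppendLine("do");
            sb.AppendLine("{");
            sb.AppendLine("    try");
            sb.AppendLine("    {");
            sb.AppendLine("        cmd.Parameters.Add(\"@p_" + p.Name + "\", " + p.SqlType + ");");
            if (!p.IsMandatory && p.IsReference)
            {
                // Optional reference-type property: map null to DBNull.
                sb.AppendLine("        if (this." + p.Name + " == null)");
                sb.AppendLine("        {");
                sb.AppendLine("            " + value + " = System.DBNull.Value;");
                sb.AppendLine("        }");
                sb.AppendLine("        else");
                sb.AppendLine("        {");
                sb.AppendLine("            " + value + " = this." + p.Name + ";");
                sb.AppendLine("        }");
            }
            else
            {
                sb.AppendLine("        " + value + " = this." + p.Name + ";");
            }
            sb.AppendLine("    }");
            sb.AppendLine("    catch (Exception ex)");
            sb.AppendLine("    {");
            if (p.IsMandatory)
            {
                // Mandatory property: log and re-throw ("Bailing").
                sb.AppendLine("        " + log + "Bailing.\", ex.Message));");
                sb.AppendLine("        throw ex;");
            }
            else
            {
                // Optional property: log and fall back to DBNull.
                sb.AppendLine("        " + log + "Filling Null Value.\", ex.Message));");
                sb.AppendLine("        " + value + " = System.DBNull.Value;");
                sb.AppendLine();
                sb.AppendLine("        break;");
            }
            sb.AppendLine("    }");
            sb.AppendLine("}");
            sb.AppendLine("while(false);");
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

Calling Generator.Apply(CustomerMetadata.Data) emits the three do/while(false) blocks for ID, IsActive and Name. A change made once in Apply, for instance emitting `throw;` instead of `throw ex;` so that the original stack trace is preserved, would land in every generated block.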

My primary assertion in this article is that recognizing patterns and following the metadata-driven approach will lead to cleaner and better code. Thinking this way frees us to worry about bullet-proofing and optimizing our code, because the time invested in strengthening the template of the pattern is minuscule compared to the time required to find and change every occurrence of the pattern by hand. We can extend and strengthen the code by simply modifying the template, or more radically by extending the metadata and the template together. One bug fixed or one optimization baked into the template is multiplied across every instance of the generated code. Big Hammer!

Of course, there is the small matter of actually implementing the apply operation, which takes the template and the data and gives us code - but you'll have to stay tuned for that!

Food for thought, no?

Well! Bon Appetit!!
