Monday, July 21, 2008

Part 0: Building a Basic Expression Evaluator

Quite often in our work, we come across situations where we would have to customize some business-logic quickly and on-the-fly. One such situation is when we have a workflow coded up, but the logic for transitioning between states is not fully known or is likely to change when the customers finally decide what they want.

In some of such situations, we would really love it if we could store the state transition-logic in a form that is easily editable at customization-time and dynamically evaluable to allow the workflow's behaviour to evolve and refine over time.

Being a classic computer scientist at heart, I decided to see if we could embed a little LISP evaluation engine and perhaps write and store the customization logic as a LISP expression, which would be evaluated at run-time and allow for the flexibility we needed. Then, being a classic masochist, I decided to spend a few days and see if I could build the evaluation engine itself. (Actually, I did google up for an embeddable LISP engine, but found that the gymnastics involved in integrating the engine into our environment wouldn't be worth the trouble!)

The evaluator we built takes a string s-expression value, and reduces it to a single scalar value. We built it in stages, and I'll describe the building process similarly. Unlike a full blown LISP implementation, all our functions return a single scalar - so there isn't support for the cons primitive.

The lexical structure for the little language we're building is:

  • Program ::= Atom | SExpr
  • Atom ::= number | boolean | string 
  • SExpr ::= "(" Atom | SExpr ")"
  • number is a pattern conforming to the following IEEE regular expression for a floating point: ^[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]*\\.?[0-9]+)?$.
  • boolean is one of the following strings : t, true, false, nil and f. Each corresponding to its commonly known, self-explanatory value
  • string is a pattern conforming to an unquoted sequence of non-white-space characters, or a quoted sequence of characters including white-space

Semantically, we will use the general LISP convention where the car or first element of the SExpr is the function we want to apply to the cdr or the remainder of the SExpr. We will also pre-define some of the well-known functions we want to process.

So for example, we might want to evaluate the following strings and expect the associated results:

  • (+ 1 2 3 4) => 10
  • (+ 1 2 (* 3 2)) => 8
  • (+ "hello" " " "world") => "hello world"
  • (and (> 4 2 1) (<= 3 4 4 5)) => true

The cool thing to notice is that we want our engine to deal with dynamically typing the values into either numbers, boolean values or strings. The well-known functions also have to be intelligent and decide, for example, when to sum numbers and when to concatenate strings.

Of course, as the engine evolves, we want to pay special attention to integrating it into an environment from which it could be called, and introduce support for custom functions to supplement the well-known ones. Further improvements would include support for recursive and mutually recursive functions.

The posts following will outline the approach to building the evaluation engine and providing the support we desire.

No comments: