Monday, August 18, 2008

Almost a full-blown LISP interpreter :)

[9th August 2008]

So I've been playing with the Eden library again, and I finished putting in some very cool functionality while watching China put on a phenomenal show for the Olympic Games Opening Ceremony...

Eden now supports cons! So we can build and return lists now, such as the following examples:

(cons 'maple' (list 'birch' 'oak' 'larch')) => ('maple' 'birch' 'oak' 'larch')
(list 'maple' (list 'birch' 'oak' 'larch')) => ('maple' ('birch' 'oak' 'larch'))

Date is now a first class atom-type, so we can do things like:

(defun next_sunday (x? f?)
    (prog
        (setq d (if (nil-p x) today x))
        (date (+ d (- 7 (day_of_week d))) f)))

To make the programmer's life easier, we also supports optional arguments to defun! To designate the formal parameter as optional, append it with a question mark. All formal parameters after the first optional one are implicitly optional. If no argument is bound to an optional parameter, it is initialised with nil.

Of course, + now handles dates and time-spans :)

[Ten days later]

OK. cons works, as do car, cdr and all the test functions like listp (which has an alias, a-la Scheme, of list?), atom? and nil?.

I've spent the last few days re-implementing the evaluation engine to dynamically support new native functions using reflection and cleaning up various odd-bits (yes, I was watching the Olympic Games and it did reduce my productivity a bit).

(defun make-list (x)
    (cond
        ((list? x) x)
        (t (cons x nil))))

(defun append (l x)
    (cond
        ((atom? l) (cons l (make-list x)))
        ((nil? l) (make-list x))
        (t (cons (car l) (append (cdr l) x)))
    ))

(defun reverse (l)
    (cond
        ((nil? l) l)
        ((atom? l) nil)
        ((list? (car l)) (append (reverse (cdr l)) (list (reverse (car l)))))
        (t
            (append (reverse (cdr l)) (car l))
        )
    ))

allows us to call

(reverse (list 'a' 'b' (list 'c' 'd') 'e' 'f')), which returns with (f e (d c) b a) as we expect!

In addition to optional parameters, easy-to-integrate native functions, properly stacked evaluation of (mutually) recursive functions, scoped parameters and variables, and multiple-statement defun definitions, we have all the makings of a full-blown, yet neat and compact LISP interpreter!

Next, I think I'll get it to support lambdas and mapcar!

While I'm at it, let me stress on the importance of writing unit tests, specially while developing foundational pieces like this one.

I've been using the native testing framework in Visual Studio 2008, and it's a breeze to be able to run off a whole suite of tests to make sure that a fix I put in for one bug didn't break something else somewhere.

Furthermore, as the tests get more and more complex and involved, it's comforting to know that the system is just growing stronger every time a feature gets added without anything being broken! I now have over 200 test expressions which are run after every major recompile, assuring that all the functionality is working as it should be...

Stay tuned!

Thursday, August 7, 2008

Post 6: Expression Evaluator - Finale

We have, in the previous posts, seen how to build an expression evaluator and integrate it into our calling environment.

The concepts we have discussed provide a solid basis for an evaluation engine that can be extended further.

For example, we could support custom functions with a defun function:

(defun <function name> <list of formal parameters> <function body>)

It's easy to see how we could implement the defun function as a known function, which would store the body and formal parameter list in a hash table indexed by the function name. We could then modify the Apply function to bind values to the formal parameters and evaluate the body.

We could support variable binding with the setq function:

(setq <var_name> <expr>)

We could do this by storing the value of the expression in a global symbol table indexed by the var_name, and modifying the Apply function to do look for the value of a variable first in the scoped symbol table and then the global one.

We could support conditional operations with the cond function, which returns the result associated with the first test expression that evaluates to 'true':

(cond ((test_1) (result_1))  ... ((test_n) (result_n)))

We could even expand our processing to perform evaluation of the function name:

((cond ((> w 20) "square") (t "cube")) p)

so that we get the value of (square p) if w > 20 and (cube p) otherwise!

With a few more modifications, we can support recursive functions such as the classic expression for a factorial

(defun decr (x) (- x 1))
(defun fact (x) (cond ((> x 0) (* x (fact (decr x)))) (t 1)))

Executing the above strings in order would allow us to compute factorials of any integer from that point on!

double fact10 = 10.ToString().Evaluate(null);

Cool?

Contact me if you would like to get your hands on this library!

One last thing - in honour of my newest niece who was born just as the first full implementation of this project was completed, I'm calling this library Eden. We'll be likely releasing BrightSword.Eden as a free download from the company home page soon.

Saturday, August 2, 2008

Post 5: Expression Evaluator - Integration

So far, we've used "constant" s-expressions as examples for evaluation, and that's pretty cool, but it is, of course, a great deal more interesting to be able to compute expression values when the expression contains variables bound to values at run-time.

For example, the formula for computing the Body Mass Index of a healthy human is given by the following formula:

BMI = (Weight in Kilograms / (Height in Meters) x (Height in Meters))

or as an s-expression

(div weight (prod height height))

If we were able to pass in the values for weight and height for various people, we could get the expression evaluator to compute the BMI.

To accomplish this, we make a few modifications to the Evaluate function, and pass in an addition symbol table with values for the variables.

public static Expression Evaluate(this Expression _this, Dictionary<string, object> Symbols)
{
    switch (_this.Children.Count)
    {
        case 0:
            /* atomic value */
            {
                switch (_this.Token)
                {
                    ...

                    case E_TOKEN.ValidName:
                        {
                            switch (_this.TokenValue.ToLower())
                            {
                                ... other "known" tokens

                                default:
                                    {
                                        try
                                        {
                                            _this.SetValue(Symbols[_this.TokenValue]);
                                        }
                                        catch (KeyNotFoundException)
                                        {
                                            _this.RegisterMissingSymbol(_this.TokenValue);
                                        }
                                    }
                                    break;
                            }
                        }
                        break;

                    case E_TOKEN.Text:
                    case E_TOKEN.ValidNumber:
                    case E_TOKEN.SingleQuotedText:
                    case E_TOKEN.DoubleQuotedText:
                        {
                            _this.SetValue(_this.TokenValue);
                        }
                        break;

                    default:
                        {
                            throw new EvaluatorException(_this, "Evaluate Error: Unknown Token!");
                        }
                }

                return _this;
            }

        ...
    }
}

To call our evaluator from some other code, we could do the following:

    string strExpression = ... get the string from a resource or database

    Dictionary<string, object> Symbols = new Dictionary<string, object>();
    Symbols["height"] = ... input from a form or database;
    Symbols["weight"] = ... input from a form or database;

    double BMI = strExpression.Evaluate(Symbols).ValueAsNumber;

Now if the customer wanted the formula changed on the fly to the non metric version, all we need to do is change the expression to be evaluated to

BMI = ( Weight in Pounds / ( Height in inches ) x ( Height in inches ) ) x 703

or

(prod (div weight (prod height height) 703)

and the compiled version of the code doesn't change at all!

Friday, August 1, 2008

Post 4: Expression Evaluator - Object Typing

We have seen how the evaluation engine recursively evaluates an expression tree and associates a value with each Expression object. We have also seen how the values associated with the Expression object are examined only by the individual function processors. Let's look at the function processor for the "+" function again

    case "+":  
        { 
            try 
            { 
                return Functor.SetValue(Arguments.Aggregate(0.0d, (result, x) => result += x.Evaluate().ValueAsNumber); 
            } 
            catch 
            { }

            return Functor.SetNil(); 
        }

We notice the generic Aggregate function on Arguments returns a double, and consequently the lambda expression returns a double, using the native += operator to operate on double values. This means that we have to somehow express the value of the Expression x as a double. Further, we need to ensure that the value associated with the Functor is henceforth interpretable as a double.

We also need to handle the case where it cannot be expressed as a number, and we'll deal with that in a moment.

We can also imagine other scalar types - like strings and boolean values - of immense importance when we want to do comparison expressions like:

(> a b)

We therefore create a Variant class (again a throwback to my C/C++ days) where a class can encapsulate a given value and represent it, if possible, in various types. In our case, we are going to support strings, numbers (doubles) and booleans, so our variant class looks something like:

public class VariantValue
{
    public E_VALUETYPE ValueType { get; protected internal set; }

       public object UntypedValue { get; protected internal set; }

    public virtual bool IsNumber
    {
        get
        {
            try
            {
                CastToNumber(this.UntypedValue);
                return true;
            }
            catch
            {
                return false;
            }
        }
    }

    protected static double CastToNumber(object value)
    {
        try
        {
            return System.Convert.ToDouble(value);
        }
        catch
        {
            throw new InvalidCastException("Cannot express (" + value.ToString() + ") as a Number");
        }
    }

    public virtual double ValueAsNumber
    {
        get
        {
            try
            {
                return CastToNumber(this.UntypedValue);
            }
            catch
            {
                return double.NaN;
            }
        }

        set
        {
            this.UntypedValue = value;
            this.ValueType = E_VALUETYPE.Number;
        }
    }

    #region Boolean ...

    #region String ...

    protected internal virtual VariantValue SetValue(bool value) { this.ValueAsBoolean = value; return this; }
    protected internal virtual VariantValue SetValue(double value) { this.ValueAsNumber = value; return this; }
    protected internal virtual VariantValue SetValue(string value) { this.ValueAsString = value; return this; }

    protected internal virtual VariantValue SetNil() ...

    protected internal virtual VariantValue SetValue(VariantValue value) ...

    protected internal virtual VariantValue SetValue(object value) ...
}

We can see how the VariantValue class stores the raw value in the UntypedValue property and tries to represent it as a Number, Boolean or String when required. This is important because a value can have more than one representation.

The Is... accessor properties allow callers to query if the value can be represented in the appropriate type.

The ValueAs... accessor properties return the value as the appropriate type if possible. They will never throw any exceptions. They are helped by the internal CastTo... functions, which do throw exceptions. When we examine the SetValue(object) function, we'll see how the exceptions play a crucial role in helping us decide how best to represent a generic object.

The ValueAs... mutator properties save the raw value in the UntypedValue property. Since the type of the value is known to them, they also store the type information in the ValueType property.

The code below is the SetValue(object) function. It is called when the type-specific version of SetValue cannot be called (for example when the value is obtained from a symbol table - look at the section on integration).

We successively attempt to coerce the value into one of the three known types, using the exceptions thrown from the CastTo... functions to signal that coercion was unsuccessful.

    protected internal virtual VariantValue SetValue(object value)
    {
        if (value == null) { return SetNil(); }

        // [true] can be coerced to a number if we don't do this
        if ((value is long) || (value is double))
        {
            this.ValueAsNumber = CastToNumber(value); return this;
        }

        if (value is bool)
        {
            this.ValueAsBoolean = CastToBoolean(value); return this;
        }

        try
        {
            this.ValueAsNumber = CastToNumber(value);
            return this;
        }
        catch
        { }

        try
        {
            this.ValueAsBoolean = CastToBoolean(value);
            return this;
        }
        catch
        { }

        try
        {
            this.ValueAsString = CastToString(value);
            return this;
        }
        catch
        { }

        this.UntypedValue = value;
        this.ValueType = E_VALUETYPE.Uninitialized;

        return this;
    }

Since every Expression object is associated with a value, we can make life easy and simply derive the Expression class from the VariantValue class. We are effectively saying that an Expression is a VariantValue.

public class Expression : VariantValue 
   ...

We're close to putting the finishing touches on our expression evaluator. The last post in this series will deal with integration with the calling environment, allowing us to use the evaluator within the context of any other application.

Thursday, July 31, 2008

Post 3: Expression Evaluator - The Expression Class and Evaluation

The Expression Class is a n-ary tree representation of a given S-Expression.

In its simplest form, the class would look something like:

public class Expression
{
    protected Expression m_Parent = null;
    public virtual Expression Parent { get { return m_Parent; } }

    protected List<Expression> m_Children = new List<Expression>();
    public virtual List<Expression> Children { get { return m_Children; } }

    protected string m_TokenValue = String.Empty;
    public string TokenValue { get { return m_TokenValue; } }

    ... other housekeeping properties ...

At this point in the evaluation cycle, we have simply created a different representation of the S-Expression.

We actually perform the evaluation of each node by recursively evaluating each of the children and merging their results. The obvious questions here are:

  • How exactly do we merge the results?
  • How do we evaluate the children?

It would also be important to discuss the issue of the type of the result.

Let's go back to a simple S-Expression

(+ 1 2 3 4)

The Expression tree generated would be:

  • Root (1 Child)
    • >> (5 Children)
      • "+" (0 Children)
      • "1" (0 Children)
      • "2" (0 Children)
      • "3" (0 Children)
      • "4" (0 Children)

In order to apply meaning to the expression, we define the convention that the first child represents some operation which has to be performed on all its siblings in order, which when applied, results in the value associated with the first child. Further, we define that the value of a node is the value of the first child.

In our example case, we define that the value of the Root node is the value of the anonymous ">>" node, which in turn is the value of the "+" node, which in turn is the value resulting from applying the "+" function on "1", "2", "3" and "4".

We notice that we have now answered the first question "How do we merge the results".

To answer the second question, let's take a closer look at the Evaluate function of the Expression class, shown here in its simplified form:

public Expression Evaluate()
{
    switch (this.Children.Count)
    {
        case 0:
            /* atomic value */
            {
                switch (this.Token)
                {
                    case E_TOKEN.SExprStart:
                        break;

                    case E_TOKEN.ValidName:
                    case E_TOKEN.Text:
                    case E_TOKEN.ValidNumber:
                    case E_TOKEN.SingleQuotedText:
                    case E_TOKEN.DoubleQuotedText:
                        {
                            this.SetValue(this.TokenValue);
                        }
                        break;

                    default:
                        {
                            throw new EvaluatorException(this, "Evaluate Error: Unknown Token!");
                        }
                }

                return _this;
            }

        case 1:
            /* s-expr or root node */
            {
                return this.SetValue(this.Children[0].Evaluate());
            }

        default:
            /* function call */
            {
                // apply first child to the remaining children
                return this.SetValue(Apply(this.Children[0], this.Children.Skip(1).ToList()));
            }
    }
}

We can see the straightforward processing of the Root and Atom cases. The default case deals with the situation where the first child needs to be "applied" on the other children. Apply is the most elaborate function in this project, but it's quite simple really. If all we did was addition, this is what Apply() would look like:

private static Expression Apply(Expression Functor, List<Expression> Arguments)
{
    string strFunc = Functor.TokenValue;

    switch (strFunc.ToLower())
    {
        #region SUM/CONCAT
        case "+":
        case "add":
        case "sum":
            {
                try
                {
                    return Functor.SetValue(Arguments.Aggregate(0.0d, (result, x) => result += x.Evaluate().ValueAsNumber);
                }
                catch
                { }

                return Functor.SetNil();
            }
        #endregion

    }
}

The lambda expressions and List extension functions in C# make writing the processor sections a breeze. It's easy to see how the first child's value (the Functor argument to Apply)  is set to the sum of the other children (the Arguments argument to Apply).

We are only left with one major issue to discuss - that of the value type.

Wednesday, July 30, 2008

Post 2: Expression Evaluator - Building an Expression Tree

The Parse function described in the previous posts allows us to specify a ParseEventHandler delegated function which will be called-back by the parser.

We use this function to monitor the parsing of an expression string, and build a corresponding expression tree, which can then be evaluated. This is one of several approaches we could take for the evaluation process - alternatively, we could proceed with the evaluation in-situ, with a makeshift stack. I took this approach because it allows us to re-use the expression strings without having the re-parse - which is something we'll need to support user-defined functions.

The building of the expression tree is straightforward. We require a simple class to keep track of the Root node of the expression, and the Current node - which represents the S-Expression being parsed currently.

We write a simple ParseEventHandler delegate, which hooks the SExprStart state to signify the creation of a sub-tree, and the Atom state to signify leaf nodes of the expression tree. We pass along a ParserContext object to keep a reference to the Root and the Current nodes of the expression tree.

public class ParserContext
{
    public Expression Root { get; private set; }
    public Expression Current { get; set; }

    public ParserContext(Expression Root)
    {
        this.Root = Root;
        this.Current = Root;
    }
}

The ParseEventHandler function is simple and elegant. All we're doing here is listening for three events:

  • SExprStart : Signifying a new S-Expression child. Our handler creates a child and sets it as the current parent, so future nodes will get bound to it.
  • SExprFinish: Signifying the completion of the current S-Expression. Our handler "pops the stack", as it were, and marks the parent node as the current parent.
  • Atom: Signifying a leaf-node child of the current parent. Simply creates and attaches a leaf-node, leaving the current parent intact.

protected static void ParseEventHandler(ParseEventArgs e)
{
    ParserContext pctx = e.ParserContext as ParserContext;
    if (pctx == null) return;

    switch (e.Token)
    {
        case E_TOKEN.SExprStart:
            {
                pctx.Current = new Expression(pctx.Current, ">>", e.State, e.Token);
            }
            return;

        case E_TOKEN.SExprFinish:
            {
                pctx.Current = pctx.Current.Parent;
            }
            return;
    }

    switch (e.State)
    {
        case E_PARSESTATE.Atom:
            {
                Expression exIgnore = pctx.Current.NewChild(e.TokenValue, e.State, e.Token);
            }
            return;

        case E_PARSESTATE.Failure:
            {
                throw new ParseException(String.Format("*** FAILURE {0}] : [ {1} ], {2}, {3}", e.ErrorDescription, e.TokenValue, e.State, e.Token));
            }
    }
}

To actually call the parser, all we do is pass in a reference to the handler and context instance:

Expression Root = new Expression();

Parser.Parse(pszExpression, new ParseEventHandler(ParseEventHandler), new ParserContext(Root));

// ... at this point, Root has been built up, or parsing has failed and thrown an exception

We'll discuss the Expression class in detail in the next post.

Eden is Born!

Eden Trinity Azariah was born today in Moncton, Canada to my brother and sister-in-law!

IMG_4346

Mom and baby are doing well!

IMG_4358

Big sister Tabitha couldn't be happier!

Welcome, baby Eden!