Monday, August 18, 2008

Almost a full-blown LISP interpreter :)

[9th August 2008]

So I've been playing with the Eden library again, and I finished putting in some very cool functionality while watching China put on a phenomenal show for the Olympic Games Opening Ceremony...

Eden now supports cons! So we can build and return lists now, such as the following examples:

(cons 'maple' (list 'birch' 'oak' 'larch')) => ('maple' 'birch' 'oak' 'larch')
(list 'maple' (list 'birch' 'oak' 'larch')) => ('maple' ('birch' 'oak' 'larch'))

Date is now a first class atom-type, so we can do things like:

(defun next_sunday (x? f?)
    (prog
        (setq d (if (nil-p x) today x))
        (date (+ d (- 7 (day_of_week d))) f)))

To make the programmer's life easier, we also supports optional arguments to defun! To designate the formal parameter as optional, append it with a question mark. All formal parameters after the first optional one are implicitly optional. If no argument is bound to an optional parameter, it is initialised with nil.

Of course, + now handles dates and time-spans :)

[Ten days later]

OK. cons works, as do car, cdr and all the test functions like listp (which has an alias, a-la Scheme, of list?), atom? and nil?.

I've spent the last few days re-implementing the evaluation engine to dynamically support new native functions using reflection and cleaning up various odd-bits (yes, I was watching the Olympic Games and it did reduce my productivity a bit).

(defun make-list (x)
    (cond
        ((list? x) x)
        (t (cons x nil))))

(defun append (l x)
    (cond
        ((atom? l) (cons l (make-list x)))
        ((nil? l) (make-list x))
        (t (cons (car l) (append (cdr l) x)))
    ))

(defun reverse (l)
    (cond
        ((nil? l) l)
        ((atom? l) nil)
        ((list? (car l)) (append (reverse (cdr l)) (list (reverse (car l)))))
        (t
            (append (reverse (cdr l)) (car l))
        )
    ))

allows us to call

(reverse (list 'a' 'b' (list 'c' 'd') 'e' 'f')), which returns with (f e (d c) b a) as we expect!

In addition to optional parameters, easy-to-integrate native functions, properly stacked evaluation of (mutually) recursive functions, scoped parameters and variables, and multiple-statement defun definitions, we have all the makings of a full-blown, yet neat and compact LISP interpreter!

Next, I think I'll get it to support lambdas and mapcar!

While I'm at it, let me stress on the importance of writing unit tests, specially while developing foundational pieces like this one.

I've been using the native testing framework in Visual Studio 2008, and it's a breeze to be able to run off a whole suite of tests to make sure that a fix I put in for one bug didn't break something else somewhere.

Furthermore, as the tests get more and more complex and involved, it's comforting to know that the system is just growing stronger every time a feature gets added without anything being broken! I now have over 200 test expressions which are run after every major recompile, assuring that all the functionality is working as it should be...

Stay tuned!

Thursday, August 7, 2008

Post 6: Expression Evaluator - Finale

We have, in the previous posts, seen how to build an expression evaluator and integrate it into our calling environment.

The concepts we have discussed provide a solid basis for an evaluation engine that can be extended further.

For example, we could support custom functions with a defun function:

(defun <function name> <list of formal parameters> <function body>)

It's easy to see how we could implement the defun function as a known function, which would store the body and formal parameter list in a hash table indexed by the function name. We could then modify the Apply function to bind values to the formal parameters and evaluate the body.

We could support variable binding with the setq function:

(setq <var_name> <expr>)

We could do this by storing the value of the expression in a global symbol table indexed by the var_name, and modifying the Apply function to do look for the value of a variable first in the scoped symbol table and then the global one.

We could support conditional operations with the cond function, which returns the result associated with the first test expression that evaluates to 'true':

(cond ((test_1) (result_1))  ... ((test_n) (result_n)))

We could even expand our processing to perform evaluation of the function name:

((cond ((> w 20) "square") (t "cube")) p)

so that we get the value of (square p) if w > 20 and (cube p) otherwise!

With a few more modifications, we can support recursive functions such as the classic expression for a factorial

(defun decr (x) (- x 1))
(defun fact (x) (cond ((> x 0) (* x (fact (decr x)))) (t 1)))

Executing the above strings in order would allow us to compute factorials of any integer from that point on!

double fact10 = 10.ToString().Evaluate(null);

Cool?

Contact me if you would like to get your hands on this library!

One last thing - in honour of my newest niece who was born just as the first full implementation of this project was completed, I'm calling this library Eden. We'll be likely releasing BrightSword.Eden as a free download from the company home page soon.

Saturday, August 2, 2008

Post 5: Expression Evaluator - Integration

So far, we've used "constant" s-expressions as examples for evaluation, and that's pretty cool, but it is, of course, a great deal more interesting to be able to compute expression values when the expression contains variables bound to values at run-time.

For example, the formula for computing the Body Mass Index of a healthy human is given by the following formula:

BMI = (Weight in Kilograms / (Height in Meters) x (Height in Meters))

or as an s-expression

(div weight (prod height height))

If we were able to pass in the values for weight and height for various people, we could get the expression evaluator to compute the BMI.

To accomplish this, we make a few modifications to the Evaluate function, and pass in an addition symbol table with values for the variables.

public static Expression Evaluate(this Expression _this, Dictionary<string, object> Symbols)
{
    switch (_this.Children.Count)
    {
        case 0:
            /* atomic value */
            {
                switch (_this.Token)
                {
                    ...

                    case E_TOKEN.ValidName:
                        {
                            switch (_this.TokenValue.ToLower())
                            {
                                ... other "known" tokens

                                default:
                                    {
                                        try
                                        {
                                            _this.SetValue(Symbols[_this.TokenValue]);
                                        }
                                        catch (KeyNotFoundException)
                                        {
                                            _this.RegisterMissingSymbol(_this.TokenValue);
                                        }
                                    }
                                    break;
                            }
                        }
                        break;

                    case E_TOKEN.Text:
                    case E_TOKEN.ValidNumber:
                    case E_TOKEN.SingleQuotedText:
                    case E_TOKEN.DoubleQuotedText:
                        {
                            _this.SetValue(_this.TokenValue);
                        }
                        break;

                    default:
                        {
                            throw new EvaluatorException(_this, "Evaluate Error: Unknown Token!");
                        }
                }

                return _this;
            }

        ...
    }
}

To call our evaluator from some other code, we could do the following:

    string strExpression = ... get the string from a resource or database

    Dictionary<string, object> Symbols = new Dictionary<string, object>();
    Symbols["height"] = ... input from a form or database;
    Symbols["weight"] = ... input from a form or database;

    double BMI = strExpression.Evaluate(Symbols).ValueAsNumber;

Now if the customer wanted the formula changed on the fly to the non metric version, all we need to do is change the expression to be evaluated to

BMI = ( Weight in Pounds / ( Height in inches ) x ( Height in inches ) ) x 703

or

(prod (div weight (prod height height) 703)

and the compiled version of the code doesn't change at all!

Friday, August 1, 2008

Post 4: Expression Evaluator - Object Typing

We have seen how the evaluation engine recursively evaluates an expression tree and associates a value with each Expression object. We have also seen how the values associated with the Expression object are examined only by the individual function processors. Let's look at the function processor for the "+" function again

    case "+":  
        { 
            try 
            { 
                return Functor.SetValue(Arguments.Aggregate(0.0d, (result, x) => result += x.Evaluate().ValueAsNumber); 
            } 
            catch 
            { }

            return Functor.SetNil(); 
        }

We notice the generic Aggregate function on Arguments returns a double, and consequently the lambda expression returns a double, using the native += operator to operate on double values. This means that we have to somehow express the value of the Expression x as a double. Further, we need to ensure that the value associated with the Functor is henceforth interpretable as a double.

We also need to handle the case where it cannot be expressed as a number, and we'll deal with that in a moment.

We can also imagine other scalar types - like strings and boolean values - of immense importance when we want to do comparison expressions like:

(> a b)

We therefore create a Variant class (again a throwback to my C/C++ days) where a class can encapsulate a given value and represent it, if possible, in various types. In our case, we are going to support strings, numbers (doubles) and booleans, so our variant class looks something like:

public class VariantValue
{
    public E_VALUETYPE ValueType { get; protected internal set; }

       public object UntypedValue { get; protected internal set; }

    public virtual bool IsNumber
    {
        get
        {
            try
            {
                CastToNumber(this.UntypedValue);
                return true;
            }
            catch
            {
                return false;
            }
        }
    }

    protected static double CastToNumber(object value)
    {
        try
        {
            return System.Convert.ToDouble(value);
        }
        catch
        {
            throw new InvalidCastException("Cannot express (" + value.ToString() + ") as a Number");
        }
    }

    public virtual double ValueAsNumber
    {
        get
        {
            try
            {
                return CastToNumber(this.UntypedValue);
            }
            catch
            {
                return double.NaN;
            }
        }

        set
        {
            this.UntypedValue = value;
            this.ValueType = E_VALUETYPE.Number;
        }
    }

    #region Boolean ...

    #region String ...

    protected internal virtual VariantValue SetValue(bool value) { this.ValueAsBoolean = value; return this; }
    protected internal virtual VariantValue SetValue(double value) { this.ValueAsNumber = value; return this; }
    protected internal virtual VariantValue SetValue(string value) { this.ValueAsString = value; return this; }

    protected internal virtual VariantValue SetNil() ...

    protected internal virtual VariantValue SetValue(VariantValue value) ...

    protected internal virtual VariantValue SetValue(object value) ...
}

We can see how the VariantValue class stores the raw value in the UntypedValue property and tries to represent it as a Number, Boolean or String when required. This is important because a value can have more than one representation.

The Is... accessor properties allow callers to query if the value can be represented in the appropriate type.

The ValueAs... accessor properties return the value as the appropriate type if possible. They will never throw any exceptions. They are helped by the internal CastTo... functions, which do throw exceptions. When we examine the SetValue(object) function, we'll see how the exceptions play a crucial role in helping us decide how best to represent a generic object.

The ValueAs... mutator properties save the raw value in the UntypedValue property. Since the type of the value is known to them, they also store the type information in the ValueType property.

The code below is the SetValue(object) function. It is called when the type-specific version of SetValue cannot be called (for example when the value is obtained from a symbol table - look at the section on integration).

We successively attempt to coerce the value into one of the three known types, using the exceptions thrown from the CastTo... functions to signal that coercion was unsuccessful.

    protected internal virtual VariantValue SetValue(object value)
    {
        if (value == null) { return SetNil(); }

        // [true] can be coerced to a number if we don't do this
        if ((value is long) || (value is double))
        {
            this.ValueAsNumber = CastToNumber(value); return this;
        }

        if (value is bool)
        {
            this.ValueAsBoolean = CastToBoolean(value); return this;
        }

        try
        {
            this.ValueAsNumber = CastToNumber(value);
            return this;
        }
        catch
        { }

        try
        {
            this.ValueAsBoolean = CastToBoolean(value);
            return this;
        }
        catch
        { }

        try
        {
            this.ValueAsString = CastToString(value);
            return this;
        }
        catch
        { }

        this.UntypedValue = value;
        this.ValueType = E_VALUETYPE.Uninitialized;

        return this;
    }

Since every Expression object is associated with a value, we can make life easy and simply derive the Expression class from the VariantValue class. We are effectively saying that an Expression is a VariantValue.

public class Expression : VariantValue 
   ...

We're close to putting the finishing touches on our expression evaluator. The last post in this series will deal with integration with the calling environment, allowing us to use the evaluator within the context of any other application.