Lecture 7

Functional programming (Sethi Ch 9)

Review of previous lecture

Datatypes

Writing an interpreter

Admin

Discussion of homework sheet 6

Any questions?

Functional programming

Review

Functional languages like SML support:

So far, we've used the built-in datatypes (e.g. lists).

We will now see how to define our own datatypes.

Case study: writing an interpreter.

Datatypes

Introduction

So far, the only high-level datatype we have seen is lists.

In ML, you can also define your own types using a datatype declaration.

ML (Meta Language) was designed for manipulating other languages, such as programming languages, so it is very good at handling abstract syntax trees.

Enumerated types

In SML you can declare enumerated types:

- datatype color = RED | GREEN | BLUE;
> datatype color
  con RED = RED : color
  con GREEN = GREEN : color
  con BLUE = BLUE : color

You can now use these constants in programs:

- rev [RED, GREEN, BLUE];
> val it = [BLUE, GREEN, RED] : color list

What about a person type, either student or lecturer?

Enumerated types cont

To write programs using enumerated types you use pattern-matching:

- fun colorToString (c : color) : string = (
    case c of RED => (
      "red"
    ) | GREEN => (
      "green"
    ) | BLUE => (
      "blue"
    )
  );
> val colorToString = fn : color -> string
- colorToString (RED);
> val it = "red" : string
- colorToString (GREEN);
> val it = "green" : string

What about an isStudent : person -> bool function?

Structured types

What if we want more colors than just red, green and blue?

- datatype color = 
    RGB of (int * int * int)
  | CMYK of (int * int * int * int);
> datatype color
  con RGB = fn : int * int * int -> color
  con CMYK = fn : int * int * int * int -> color

This says that a color is either an RGB value (with three numbers) or a CMYK value (with four).

- RGB (1,5,9);
> val it = RGB(1, 5, 9) : color
- CMYK (54,23,7,99);
> val it = CMYK(54, 23, 7, 99) : color

What about a person type that is either faculty or student, where faculty have courses to teach and students have SSNs?

Structured types cont

We can use pattern-matching to get data out of a structured type.

- fun yellowInk (c : color) : int = (
    case c of (CMYK (c,m,y,k)) => (
      y
    ) | (RGB (r,g,b)) => (
      255 - b
    )
  );
- yellowInk (RGB (1,5,9));
> val it = 246 : int
- yellowInk (CMYK (54,23,7,99));
> val it = 7 : int

What about a function teaches : person -> course list (where students don't teach courses)?
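
One possible answer to the person questions, as a sketch. The names course, Faculty and Student are assumptions made for illustration (a course is just a string here, and an SSN is kept as a string); they are not taken from the lecture.

```sml
(* Hypothetical: a course is represented by its name. *)
type course = string;

datatype person =
    Faculty of course list   (* the courses this person teaches *)
  | Student of string;       (* the student's SSN, kept as a string *)

fun isStudent (p : person) : bool = (
  case p of (Student _) => (
    true
  ) | (Faculty _) => (
    false
  )
);

(* Students don't teach, so they get the empty list. *)
fun teaches (p : person) : course list = (
  case p of (Faculty cs) => (
    cs
  ) | (Student _) => (
    []
  )
);
```

For example, teaches (Faculty ["Compilers"]) returns the one-element course list, while teaches applied to any Student returns [].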

Comparison with grammars

ML datatypes are very close to parse trees for grammars.

For example:

  <integer> ::= 
    { <digit> }
  | "-" { <digit> }

Becomes:

  datatype integer = 
      Positive of digit list
    | Negative of digit list;                  

We shall see later how to write parsers.
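
The integer datatype above leaves digit undefined. As a sketch, assuming a digit is simply an int between 0 and 9, a value of this type can be turned back into an ordinary int. The names digitsToInt and integerToInt are made up for this example.

```sml
(* Assumption: a digit is an int between 0 and 9. *)
type digit = int;

datatype integer =
    Positive of digit list
  | Negative of digit list;

(* Fold the digits into a number, most significant digit first. *)
fun digitsToInt (ds : digit list, acc : int) : int = (
  case ds of [] => (
    acc
  ) | (d :: rest) => (
    digitsToInt (rest, 10 * acc + d)
  )
);

fun integerToInt (i : integer) : int = (
  case i of (Positive ds) => (
    digitsToInt (ds, 0)
  ) | (Negative ds) => (
    ~ (digitsToInt (ds, 0))
  )
);
```

Each alternative of the grammar rule gets one constructor, and each case of the function handles one constructor.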

Comparison with imperative languages

In ML:

- datatype color = 
    RGB of (int * int * int)
  | CMYK of (int * int * int * int);

In an imperative language:

  enum ColorTag = (RGB, CMYK);
  struct Color {
    ColorTag tag;
    ColorUnion contents;
  };
  union ColorUnion {
    ColorRGB rgb;
    ColorCMYK cmyk;
  };
  struct ColorRGB {
    int r; int g; int b;
  };
  struct ColorCMYK {
    int c; int m; int y; int k;
  };

Comparison with imperative languages cont

In ML:

  RGB (1,2,3);

In an imperative language:

  new Color {
    tag = RGB;
    contents = new ColorUnion {
      rgb = new ColorRGB {
        r = 1; g = 2; b = 3;
      }
    }
  };

This is one place where SML wins!

Recursive types

Where SML really wins is handling tree structures, such as abstract syntax trees.

We have already seen a datatype for lists.

We can define our own tree types.

For example:

- datatype BTree =
    Leaf |
    Node of BTree * int * BTree;
> datatype BTree
  con Leaf = Leaf : BTree
  con Node = fn : BTree * int * BTree -> BTree

For example:

- Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf));
> val it = Node(Node(Leaf, 1, Leaf), 2, Node(Leaf, 3, Leaf)) : BTree

Functions on binary trees

The size of a binary tree:

- fun size (t : BTree) : int = (
    case t of Leaf => (
      0
    ) | (Node (left, root, right)) => (
      size (left) + 1 + size (right)
    )
  );
> val size = fn : BTree -> int

For example:

- size (it);
> val it = 3 : int

Functions on binary trees cont

Insert a number into a sorted binary tree:

- fun insert (t : BTree, x : int) : BTree = (
    case t of Leaf => (
      Node (Leaf, x, Leaf)
    ) | (Node (left, root, right)) => (
      if (x < root) then (
        Node (insert (left, x), root, right)
      ) else (
        Node (left, root, insert (right, x))
      )
    )
  );
> val insert = fn : BTree * int -> BTree

For example:

- insert (Leaf, 2);
> val it = Node(Leaf, 2, Leaf) : BTree
- insert (it, 1);
> val it = Node(Node(Leaf, 1, Leaf), 2, Leaf) : BTree
- insert (it, 3);
> val it = Node(Node(Leaf, 1, Leaf), 2, Node(Leaf, 3, Leaf)) : BTree

What about summing all the elements in a binary tree?

What about flattening a binary tree down to a list?
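
One way to answer both questions, following the same shape as size. This is a sketch; the names sum and flatten are chosen for this example.

```sml
datatype BTree =
    Leaf |
    Node of BTree * int * BTree;

(* Add up every number stored in the tree. *)
fun sum (t : BTree) : int = (
  case t of Leaf => (
    0
  ) | (Node (left, root, right)) => (
    sum (left) + root + sum (right)
  )
);

(* In-order traversal: left subtree, then the root, then the right subtree. *)
fun flatten (t : BTree) : int list = (
  case t of Leaf => (
    []
  ) | (Node (left, root, right)) => (
    flatten (left) @ (root :: flatten (right))
  )
);
```

Because flatten visits the tree in order, flattening a sorted binary tree built with insert gives a sorted list.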

Comparison with imperative languages

In an imperative language:

  struct BTree {
    left : *BTree;
    root : int;
    right : *BTree;
  }

and use a null pointer to represent Leaf nodes.

Result: null pointer exceptions!

This technique doesn't scale up to abstract syntax trees.

Abstract syntax trees

We could give a grammar for binary trees:

  <tree> ::= 
    "leaf"
  | "node (" <tree> "," <int> "," <tree> ")"

Parse trees for this grammar can be represented by the ML datatype:

- datatype BTree =
    Leaf |
    Node of BTree * int * BTree;

Abstract syntax trees cont

We can use ML datatypes for almost any syntax tree.

For example, take the grammar:

  <exp> ::= if <exp> then <exp> else <exp>
         | <var> "(" <exp> ")"
         | <var>
         | <int>
         | <exp> <binop> <exp>
  <binop> ::= "+" | "-" | "*" | "/" | "="

This allows expressions such as:

  if (x = 0) then (
    1
  ) else (
    x * fact (x - 1)
  )

Note that this grammar is ambiguous! For example, 1 - 2 - 3 can be parsed either as (1 - 2) - 3 or as 1 - (2 - 3).

Abstract syntax trees cont

Grammar:

  <exp> ::= if <exp> then <exp> else <exp>
         | <var> "(" <exp> ")"
         | <var>
         | <int>
         | <exp> <binop> <exp>
  <binop> ::= "+" | "-" | "*" | "/" | "="

Corresponding SML datatypes:

datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;  

datatype exp =
  If of exp * exp * exp |
  Apply of string * exp |
  Var of string |
  Int of int |
  Binop of exp * binop * exp;
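
As an illustration of how a piece of syntax maps onto these constructors, here is the expression if (x = 0) then 1 else x * fact (x - 1) written out by hand as a value of type exp. The name factBody is made up for this example.

```sml
datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype exp =
  If of exp * exp * exp |
  Apply of string * exp |
  Var of string |
  Int of int |
  Binop of exp * binop * exp;

(* if (x = 0) then 1 else x * fact (x - 1) *)
val factBody : exp =
  If (
    Binop (Var "x", EQUALS, Int 0),
    Int 1,
    Binop (Var "x", TIMES, Apply ("fact", Binop (Var "x", MINUS, Int 1)))
  );
```

Each grammar alternative corresponds to exactly one constructor, so the tree has the same shape as the parse of the expression.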

Abstract syntax trees cont

Grammar:

  <dec> ::= <empty>
         |  fun <var> "(" <var> ")" <exp> ; <dec>
  <prog> ::= <dec> <exp>

For example, a valid program is:

  fun fact (x) (
    if (x = 0) then (
      1
    ) else (
      x * fact (x - 1)
    )
  );
  fact (5)  

What is the corresponding SML datatype?
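
One possible answer, as a sketch. The constructor names Empty, Fun and Program match the ones used in the interpreter code later in this lecture, but the exact declarations in interp.sml may differ. The value factProgram is made up for this example.

```sml
datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype exp =
  If of exp * exp * exp |
  Apply of string * exp |
  Var of string |
  Int of int |
  Binop of exp * binop * exp;

(* Fun (name, parameter, body, remaining declarations) *)
datatype dec =
  Empty |
  Fun of string * string * exp * dec;

datatype prog =
  Program of dec * exp;

(* fun fact (x) (if (x = 0) then 1 else x * fact (x - 1)); fact (5) *)
val factProgram : prog =
  Program (
    Fun (
      "fact", "x",
      If (
        Binop (Var "x", EQUALS, Int 0),
        Int 1,
        Binop (Var "x", TIMES, Apply ("fact", Binop (Var "x", MINUS, Int 1)))
      ),
      Empty
    ),
    Apply ("fact", Int 5)
  );
```

Note that dec is recursive in the same way as a list: Empty plays the role of nil and Fun the role of cons.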

We shall now look at a case study of using these datatypes to write a small interpreter.

Case study: an interpreter

Introduction

To run the interpreter, load interp.sml and call go on the text of a program.

For example:

  use "interp.sml";
  val program : string = 
    "fun fact (x) ( if (x = 0) then 1 else x * fact (x-1) ); fact (5)";
  go (program);

This will produce a lot of debugging information, and finally the result: 120

Overview

There are four phases to the program: lexical analysis, parsing, single-stepping, and pretty-printing.

We can then plug these together to interpret a program.

Lexical analysis

We need a function to take a string and return the list of tokens in that string.

- lex ("x * (fact (x - 1))");
> val it =
    [VAR "x", BINOP TIMES, LPAREN, VAR "fact", LPAREN, VAR "x", 
     BINOP MINUS, INT 1, RPAREN, RPAREN]
    : token list

A token is given by:

datatype token = 
  FUN | SEMI | IF | THEN | ELSE | LPAREN | RPAREN |
  BINOP of binop | VAR of string | INT of int;

The code for lexing is in interp.sml.
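
The real lexer is in interp.sml and is not reproduced here. The following is only a minimal sketch of how such a lex function could be written, assuming variables are made of letters only and integers of digits only. The helpers span, symbol, wordToToken and lexChars and the exception LexError are made up for this example.

```sml
datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype token =
  FUN | SEMI | IF | THEN | ELSE | LPAREN | RPAREN |
  BINOP of binop | VAR of string | INT of int;

exception LexError;   (* hypothetical: raised on an unexpected character *)

(* Split cs into its longest prefix satisfying p, and what is left over. *)
fun span (p : char -> bool, cs : char list) : (char list * char list) = (
  case cs of [] => (
    ([], [])
  ) | (c :: rest) => (
    if p (c) then (
      case span (p, rest) of (taken, left) => (
        (c :: taken, left)
      )
    ) else (
      ([], cs)
    )
  )
);

(* Single-character tokens. *)
fun symbol (c : char) : token option = (
  case c of #"(" => SOME LPAREN
  | #")" => SOME RPAREN
  | #";" => SOME SEMI
  | #"+" => SOME (BINOP PLUS)
  | #"-" => SOME (BINOP MINUS)
  | #"*" => SOME (BINOP TIMES)
  | #"/" => SOME (BINOP DIVIDE)
  | #"=" => SOME (BINOP EQUALS)
  | _ => NONE
);

(* Keywords get their own tokens; any other word is a variable. *)
fun wordToToken (s : string) : token = (
  case s of "fun" => FUN
  | "if" => IF
  | "then" => THEN
  | "else" => ELSE
  | _ => VAR s
);

fun lexChars (cs : char list) : token list = (
  case cs of [] => (
    []
  ) | (c :: rest) => (
    if Char.isSpace (c) then (
      lexChars (rest)
    ) else if Char.isDigit (c) then (
      case span (Char.isDigit, cs) of (ds, left) => (
        case Int.fromString (implode (ds)) of (SOME n) => (
          INT n :: lexChars (left)
        ) | NONE => (
          raise LexError
        )
      )
    ) else if Char.isAlpha (c) then (
      case span (Char.isAlpha, cs) of (ws, left) => (
        wordToToken (implode (ws)) :: lexChars (left)
      )
    ) else (
      case symbol (c) of (SOME t) => (
        t :: lexChars (rest)
      ) | NONE => (
        raise LexError
      )
    )
  )
);

fun lex (text : string) : token list = (
  lexChars (explode (text))
);
```

The lexer works on a list of characters so that it can use the same pattern-matching style as the list functions from earlier lectures.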

Parsing

The first problem is that the grammar is ambiguous!

  <exp> ::= if <exp> then <exp> else <exp>
          | <var> "(" <exp> ")"
          | <var>
          | <int>
          | <exp> <binop> <exp>

We resolve the ambiguity:

  <exp> ::= <atom> <rest>

  <rest> ::= <empty>
          |  <binop> <exp>

  <atom> ::= if <exp> then <exp> else <exp>
          | "(" <exp> ")"
          | <var> "(" <exp> ")"
          | <var>
          | <int>

Parsing cont

We then write parsing functions for each of the non-terminals.

Each of the parsers looks like:

  fun parseExp (toks : token list) : (exp * token list) = ( ... )

that is, it takes a token list and returns the parsed expression plus the unused tokens.

For example:

  parseExp ([INT 1, BINOP PLUS, INT 2, RPAREN, RPAREN]);

produces:

  (
    Binop (Int 1, PLUS, Int 2),
    [RPAREN, RPAREN]
  )

The code for parsing is in interp.sml.
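
The real parser is in interp.sml and is not reproduced here. As a sketch of the idea, the three non-terminals of the unambiguous grammar can be turned into three mutually recursive functions. The helper expect and the exception ParseError are made up for this example, and parseRest is given the already-parsed atom as an extra argument, which is an assumption about how the pieces fit together.

```sml
datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype exp =
  If of exp * exp * exp |
  Apply of string * exp |
  Var of string |
  Int of int |
  Binop of exp * binop * exp;

datatype token =
  FUN | SEMI | IF | THEN | ELSE | LPAREN | RPAREN |
  BINOP of binop | VAR of string | INT of int;

exception ParseError;   (* hypothetical: raised on unexpected input *)

(* Check that the next token is t, and return the tokens after it. *)
fun expect (t : token, toks : token list) : token list = (
  case toks of (t1 :: rest) => (
    if (t1 = t) then rest else raise ParseError
  ) | [] => (
    raise ParseError
  )
);

(* <exp> ::= <atom> <rest> *)
fun parseExp (toks : token list) : (exp * token list) = (
  case parseAtom (toks) of (e1, rest) => (
    parseRest (e1, rest)
  )
)
(* <rest> ::= <empty> | <binop> <exp> *)
and parseRest (e1 : exp, toks : token list) : (exp * token list) = (
  case toks of (BINOP b :: rest) => (
    case parseExp (rest) of (e2, rest2) => (
      (Binop (e1, b, e2), rest2)
    )
  ) | _ => (
    (e1, toks)
  )
)
(* <atom> ::= if ... | "(" <exp> ")" | <var> "(" <exp> ")" | <var> | <int> *)
and parseAtom (toks : token list) : (exp * token list) = (
  case toks of (IF :: rest) => (
    case parseExp (rest) of (e1, rest1) => (
      case parseExp (expect (THEN, rest1)) of (e2, rest2) => (
        case parseExp (expect (ELSE, rest2)) of (e3, rest3) => (
          (If (e1, e2, e3), rest3)
        )
      )
    )
  ) | (LPAREN :: rest) => (
    case parseExp (rest) of (e, rest1) => (
      (e, expect (RPAREN, rest1))
    )
  ) | (VAR x :: LPAREN :: rest) => (
    case parseExp (rest) of (e, rest1) => (
      (Apply (x, e), expect (RPAREN, rest1))
    )
  ) | (VAR x :: rest) => (
    (Var x, rest)
  ) | (INT n :: rest) => (
    (Int n, rest)
  ) | _ => (
    raise ParseError
  )
);
```

One consequence of this grammar is that all binary operators group to the right with no precedence, so 1 - 2 - 3 parses as 1 - (2 - 3).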

Single-stepping

Once we have the AST representation of an expression and a declaration, we can single-step through the expression.

This uses a function singleStep : dec * exp -> exp. Given declarations d and an expression e, singleStep (d, e) interprets one step of e.

For example:

  singleStep (Empty, Binop (Int 1, PLUS, Int 2));

produces Int 3.

Single-stepping cont

The difficult bit is function application.

When we come to a node Apply (x, e) we need to look up the definition of x and apply it to e.

For example, the first step in evaluating:

fact(5)

produces:

if 5 = 0 then (
  1
) else (
  5 * (fact(5 - 1))
)

So we want:

  singleStep (d, Apply (x, e1)) =
    apply (d, x, e1)

so what does apply do?

Single-stepping cont

apply (d, x, e1) searches d until it finds the appropriate function declaration, then calls replace:

fun apply (d : dec, x : string, e : exp) : exp = (
  case d of Empty => (
    raise RuntimeError
  ) | (Fun (y, z, e1, d1)) => (
    if (x = y) then (
      replace (z, e, e1)
    ) else (
      apply (d1, x, e)
    )
  )
);

For example, if d contains the declaration:

  fun fact (x) (if x = 0 then 1 else x * (fact (x - 1)));

then calling apply (d, "fact", Int 5) produces:

  if 5 = 0 then 1 else 5 * (fact (5 - 1))

So what does replace do?

Single-stepping cont

We are implementing macro-expansion, so replace (z, e, e1) just syntactically replaces every occurrence of the variable z by e in e1.

For example:

- replace ("x", Int(5), Binop (Var "x", MINUS, Int 1));
> val it = Binop(Int 5, MINUS, Int 1) : exp

says that replacing x with 5 in x - 1 produces 5 - 1.

The code for single-stepping is in interp.sml.
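
Since interp.sml is not reproduced here, the following is only a sketch of how replace, apply and singleStep could fit together. It makes several assumptions that may not match the real code: EQUALS produces Int 1 for true and Int 0 for false, an if treats Int 0 as false and any other integer as true, operands are evaluated left to right, and arguments are substituted unevaluated. The helper arith is made up for this example.

```sml
datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype exp =
  If of exp * exp * exp |
  Apply of string * exp |
  Var of string |
  Int of int |
  Binop of exp * binop * exp;

datatype dec =
  Empty |
  Fun of string * string * exp * dec;

exception RuntimeError;

(* Syntactically replace every Var z in e1 by e. *)
fun replace (z : string, e : exp, e1 : exp) : exp = (
  case e1 of (Var y) => (
    if (y = z) then e else e1
  ) | (Int _) => (
    e1
  ) | (If (c, t, f)) => (
    If (replace (z, e, c), replace (z, e, t), replace (z, e, f))
  ) | (Apply (g, a)) => (
    Apply (g, replace (z, e, a))
  ) | (Binop (l, b, r)) => (
    Binop (replace (z, e, l), b, replace (z, e, r))
  )
);

(* Find the declaration of x in d and expand its body. *)
fun apply (d : dec, x : string, e : exp) : exp = (
  case d of Empty => (
    raise RuntimeError
  ) | (Fun (y, z, e1, d1)) => (
    if (x = y) then replace (z, e, e1) else apply (d1, x, e)
  )
);

(* Assumption: EQUALS yields Int 1 for true and Int 0 for false. *)
fun arith (b : binop, m : int, n : int) : exp = (
  case b of PLUS => Int (m + n)
  | MINUS => Int (m - n)
  | TIMES => Int (m * n)
  | DIVIDE => Int (m div n)
  | EQUALS => Int (if (m = n) then 1 else 0)
);

(* Perform exactly one step of evaluation on e. *)
fun singleStep (d : dec, e : exp) : exp = (
  case e of (Binop (Int m, b, Int n)) => (
    arith (b, m, n)
  ) | (Binop (Int m, b, e2)) => (
    Binop (Int m, b, singleStep (d, e2))
  ) | (Binop (e1, b, e2)) => (
    Binop (singleStep (d, e1), b, e2)
  ) | (If (Int n, e2, e3)) => (
    if (n = 0) then e3 else e2
  ) | (If (e1, e2, e3)) => (
    If (singleStep (d, e1), e2, e3)
  ) | (Apply (x, e1)) => (
    apply (d, x, e1)
  ) | _ => (
    raise RuntimeError   (* an Int cannot step; a free Var is an error *)
  )
);
```

Because Apply substitutes the argument without evaluating it first, an argument such as 5 - 1 is copied into the body and may be evaluated more than once.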

Pretty-printing

Finally we need to pretty-print an expression.

We do this with a function expToString : exp -> string.

The only tricky part is getting indentation and parentheses right!

The code for pretty-printing is in interp.sml.
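
The real pretty-printer is in interp.sml. As a much simpler sketch, the version below ignores indentation completely and avoids the parenthesis problem by bracketing every binary operation and every if. The helper binopToString is made up for this example.

```sml
datatype binop =
  PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype exp =
  If of exp * exp * exp |
  Apply of string * exp |
  Var of string |
  Int of int |
  Binop of exp * binop * exp;

fun binopToString (b : binop) : string = (
  case b of PLUS => "+"
  | MINUS => "-"
  | TIMES => "*"
  | DIVIDE => "/"
  | EQUALS => "="
);

(* Fully parenthesised, single-line output. *)
fun expToString (e : exp) : string = (
  case e of (Int n) => (
    Int.toString (n)
  ) | (Var x) => (
    x
  ) | (Apply (f, a)) => (
    f ^ " (" ^ expToString (a) ^ ")"
  ) | (Binop (l, b, r)) => (
    "(" ^ expToString (l) ^ " " ^ binopToString (b) ^ " " ^ expToString (r) ^ ")"
  ) | (If (c, t, f)) => (
    "(if " ^ expToString (c) ^ " then " ^ expToString (t) ^ " else " ^ expToString (f) ^ ")"
  )
);
```

Note that Int.toString prints negative numbers with SML's ~ sign, so a negative result would need extra handling before it could be read back in by the lexer.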

Putting it all together

fun multiStep (d : dec, e : exp) : exp = (  
  print (expToString (e));
  case e of (Int _) => (
    e
  ) | _ => (
    multiStep (d, singleStep (d, e))
  )
);

fun run (p : prog) : prog = (
  case p of (Program (d, e)) => (
    Program (d, multiStep (d, e))
  )
);

fun go (text : string) : prog = (
  run (parse (lex (text)))
);

And that's it!

Homework

Add < and > to the language.

Make the language call-by-value (by editing 2 lines of code and adding 2 more!)

Summary

SML is designed for manipulating abstract syntax trees.

It has good datatype support for ASTs, which correspond naturally to grammars.

Writing lexers, parsers, and AST manipulation functions is easy in SML (it took me about 6 hours to write this program).

Next week

Java!