Functional programming (Sethi Ch 9)
Review of previous lecture
Datatypes
Writing an interpreter
Discussion of homework sheet 6
Any questions?
Functional languages like SML support:
So far, we've used the built-in datatypes (e.g. lists).
We will now see how to define our own datatypes.
Case study: writing an interpreter.
So far, the only high-level datatype we have seen is lists.
In ML, you can also define your own types using a datatype
declaration.
ML (Meta Language) was designed for manipulating languages such as programming languages, so it's very good at Abstract Syntax Trees.
In SML you can declare enumerated types:
- datatype color = RED | GREEN | BLUE;
> datatype color
con RED = RED : color
con GREEN = GREEN : color
con BLUE = BLUE : color
You can now use these constants in programs:
- rev [RED, GREEN, BLUE];
> val it = [BLUE, GREEN, RED] : color list
What about a person type, either student or lecturer?
To write programs using enumerated types you use pattern-matching:
- fun colorToString (c : color) : string = (
case c of RED => (
"red"
) | GREEN => (
"green"
) | BLUE => (
"blue"
)
);
> val colorToString = fn : color -> string
- colorToString (RED);
> val it = "red" : string
- colorToString (GREEN);
> val it = "green" : string
What about an isStudent : person -> bool function?
What if we want more colors than just red, green and blue?
- datatype color =
RGB of (int * int * int)
| CMYK of (int * int * int * int);
> datatype color
con RGB = fn : int * int * int -> color
con CMYK = fn : int * int * int * int -> color
This says that a color is either an RGB value (with three numbers) or a CMYK value (with four).
- RGB (1,5,9);
> val it = RGB(1, 5, 9) : color
- CMYK (54,23,7,99);
> val it = CMYK(54, 23, 7, 99) : color
What about a person type, either faculty or student, faculty have courses to teach, students have SSNs?
We can use pattern-matching to get data out of a structured type.
- fun yellowInk (c : color) : int = (
case c of (CMYK (c,m,y,k)) => (
y
) | (RGB (r,g,b)) => (
255 - b
)
);
- yellowInk (RGB (1,5,9));
> val it = 246 : int
- yellowInk (CMYK (54,23,7,99));
> val it = 7 : int
What about a function teaches : person -> course list
(where students don't teach courses!)
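One possible answer to the person questions above, as a sketch only. The representations are assumptions: a course is taken to be a string, an SSN is stored as a string, and the constructor names Faculty and Student are made up for illustration.

```sml
(* Assumption: a course is just its name. *)
type course = string;

(* Assumption: faculty carry the courses they teach, students carry an SSN. *)
datatype person =
    Faculty of course list
  | Student of string;

(* True exactly for the Student constructor. *)
fun isStudent (p : person) : bool = (
  case p of (Student _) => (
    true
  ) | (Faculty _) => (
    false
  )
);

(* Students teach nothing, so they map to the empty list. *)
fun teaches (p : person) : course list = (
  case p of (Faculty cs) => (
    cs
  ) | (Student _) => (
    []
  )
);
```

Because every case of person has to be handled in the pattern match, there is no way to ask a student for a course list that does not exist.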
ML datatypes are very close to parse trees for grammars.
For example:
<integer> ::=
{ <digit> }
| "-" { <digit> }
Becomes:
datatype integer =
Positive of digit list
| Negative of digit list;
We shall see later how to write parsers.
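To make the integer datatype above concrete, here is a small sketch that turns such a parse tree into an int. The digit type is not defined in these notes, so the wrapper D of int is an assumption, as are the helper names digitsToInt and integerToInt.

```sml
(* Assumption: a digit wraps an int in the range 0..9. *)
datatype digit = D of int;

datatype integer =
    Positive of digit list
  | Negative of digit list;

(* Read the digits left to right, most significant first. *)
fun digitsToInt (ds : digit list) : int = (
  foldl (fn (D d, acc) => 10 * acc + d) 0 ds
);

(* The constructor records the sign; ~ is SML's negation. *)
fun integerToInt (i : integer) : int = (
  case i of (Positive ds) => (
    digitsToInt (ds)
  ) | (Negative ds) => (
    ~ (digitsToInt (ds))
  )
);
```

For instance, integerToInt (Negative [D 4, D 2]) evaluates to ~42.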
In ML:
- datatype color =
RGB of (int * int * int)
| CMYK of (int * int * int * int);
In an imperative language:
enum ColorTag = (RGB, CMYK);
struct Color {
ColorTag tag;
ColorUnion contents;
};
union ColorUnion {
ColorRGB rgb;
ColorCMYK cmyk;
};
struct ColorRGB {
int r; int g; int b;
};
struct ColorCMYK {
int c; int m; int y; int k;
};
In ML:
RGB (1,2,3);
In an imperative language:
new Color {
tag = RGB;
contents = new ColorUnion {
rgb = new ColorRGB {
r = 1; g = 2; b = 3;
}
}
};
This is one place where SML wins!
Where SML really wins is handling tree structures, such as abstract syntax trees.
We have already seen a datatype for lists.
We can define our own tree types.
For example:
- datatype BTree =
Leaf |
Node of BTree * int * BTree;
> datatype BTree
con Leaf = Leaf : BTree
con Node = fn : BTree * int * BTree -> BTree
For example:
- Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf));
> val it = Node(Node(Leaf, 1, Leaf), 2, Node(Leaf, 3, Leaf)) : BTree
The size of a binary tree:
- fun size (t : BTree) : int = (
case t of Leaf => (
0
) | (Node (left, root, right)) => (
size (left) + 1 + size (right)
)
);
> val size = fn : BTree -> int
For example:
- size (it);
> val it = 3 : int
Insert a number into a sorted binary tree:
- fun insert (t : BTree, x : int) : BTree = (
case t of Leaf => (
Node (Leaf, x, Leaf)
) | (Node (left, root, right)) => (
if (x < root) then (
Node (insert (left, x), root, right)
) else (
Node (left, root, insert (right, x))
)
)
);
> val insert = fn : BTree * int -> BTree
For example:
- insert (Leaf, 2);
> val it = Node(Leaf, 2, Leaf) : BTree
- insert (it, 1);
> val it = Node(Node(Leaf, 1, Leaf), 2, Leaf) : BTree
- insert (it, 3);
> val it = Node(Node(Leaf, 1, Leaf), 2, Node(Leaf, 3, Leaf)) : BTree
What about summing all the elements in a binary tree?
What about flattening a binary tree down to a list?
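Both questions follow the same recursion pattern as size. A possible sketch, where the function names sum and flatten are choices made here rather than names from the lecture code:

```sml
datatype BTree =
    Leaf
  | Node of BTree * int * BTree;

(* Add up every number stored in the tree. *)
fun sum (t : BTree) : int = (
  case t of Leaf => (
    0
  ) | (Node (left, root, right)) => (
    sum (left) + root + sum (right)
  )
);

(* In-order traversal: left subtree, then the root, then the right subtree. *)
fun flatten (t : BTree) : int list = (
  case t of Leaf => (
    []
  ) | (Node (left, root, right)) => (
    flatten (left) @ (root :: flatten (right))
  )
);
```

For a sorted tree built with insert, flatten returns the elements in ascending order.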
In an imperative language:
struct BTree {
left : *BTree;
root : int;
right : *BTree;
}
and use a null pointer to represent Leaf nodes.
Result: null pointer exceptions!
This technique doesn't scale up to abstract syntax trees.
We could give a grammar for binary trees:
<tree> ::=
"leaf"
| "node (" <tree> "," <int> "," <tree> ")"
Parse trees for this grammar can be represented by the ML datatype:
- datatype BTree =
Leaf |
Node of BTree * int * BTree;
We can use ML datatypes for almost any syntax tree.
For example, take the grammar:
<exp> ::= if <exp> then <exp> else <exp>
| <var> "(" <exp> ")"
| <var>
| <int>
| <exp> <binop> <exp>
<binop> ::= "+" | "-" | "*" | "/" | "="
This allows expressions such as:
if (x = 0) then (
1
) else (
x * fact (x - 1)
)
Note that this grammar is ambiguous!
Grammar:
<exp> ::= if <exp> then <exp> else <exp>
| <var> "(" <exp> ")"
| <var>
| <int>
| <exp> <binop> <exp>
<binop> ::= "+" | "-" | "*" | "/" | "="
Corresponding SML datatypes:
datatype binop = PLUS | MINUS | TIMES | DIVIDE | EQUALS;
datatype exp =
If of exp * exp * exp
| Apply of string * exp
| Var of string
| Int of int
| Binop of exp * binop * exp;
Grammar:
<dec> ::= <empty>
| fun <var> "(" <var> ")" <exp> ; <dec>
<prog> ::= <dec> <exp>
For example, a valid program is:
fun fact (x) (
if (x = 0) then (
1
) else (
x * fact (x - 1)
)
);
fact (5)
What is the corresponding SML datatype?
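One plausible answer, sketched here using the constructor names Empty, Fun and Program that appear in the interpreter code later in these notes. The order of the fields in Fun (function name, parameter name, body, remaining declarations) is inferred from how apply pattern-matches on it, and the value name factProgram is made up for this example.

```sml
datatype binop = PLUS | MINUS | TIMES | DIVIDE | EQUALS;

datatype exp =
    If of exp * exp * exp
  | Apply of string * exp
  | Var of string
  | Int of int
  | Binop of exp * binop * exp;

(* <dec> ::= <empty> | fun <var> "(" <var> ")" <exp> ; <dec> *)
datatype dec =
    Empty
  | Fun of string * string * exp * dec;

(* <prog> ::= <dec> <exp> *)
datatype prog =
    Program of dec * exp;

(* AST for a one-function factorial program followed by the call fact (5). *)
val factProgram : prog =
  Program (
    Fun (
      "fact", "x",
      If (
        Binop (Var "x", EQUALS, Int 0),
        Int 1,
        Binop (Var "x", TIMES, Apply ("fact", Binop (Var "x", MINUS, Int 1)))
      ),
      Empty
    ),
    Apply ("fact", Int 5)
  );
```

As with lists, a dec is either empty or a head (one function) followed by the rest of the declarations.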
We shall now look at a case study of using these datatypes to write a small interpreter.
To run the interpreter:
use "interp.sml";
go (program);
For example:
use "interp.sml";
val program : string =
"fun fact (x) ( if (x = 0) then 1 else x * fact (x-1) ); fact (5)";
go (program);
This will produce a lot of debugging information, and finally the
result: 120
There are four phases to the program: lexing, parsing, single-stepping and pretty-printing.
We can then plug these together to interpret a program.
We need a function to take a string and return the list of tokens in that string.
- lex ("x * (fact (x - 1))");
> val it =
[VAR "x", BINOP TIMES, LPAREN, VAR "fact", LPAREN, VAR "x",
BINOP MINUS, INT 1, RPAREN, RPAREN]
: token list
A token is given by:
datatype token = FUN | SEMI | IF | THEN | ELSE | LPAREN | RPAREN | BINOP of binop | VAR of string | INT of int;
The code for lexing is in interp.sml.
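The contents of interp.sml are not shown here, so the following is only a sketch of how such a lexer could be written, not the lecture's code. The helpers wordToToken, span and lexChars and the exception LexError are hypothetical names. It assumes variables are purely alphabetic and integers are unsigned runs of digits.

```sml
datatype binop = PLUS | MINUS | TIMES | DIVIDE | EQUALS;
datatype token = FUN | SEMI | IF | THEN | ELSE | LPAREN | RPAREN
               | BINOP of binop | VAR of string | INT of int;

exception LexError;

(* Keywords become their own tokens; any other word is a variable. *)
fun wordToToken (w : string) : token = (
  case w of "fun" => ( FUN )
  | "if" => ( IF )
  | "then" => ( THEN )
  | "else" => ( ELSE )
  | _ => ( VAR w )
);

(* Split cs into its longest prefix satisfying p and the remainder. *)
fun span (p : char -> bool, cs : char list) : (char list * char list) = (
  case cs of [] => (
    ([], [])
  ) | (c :: rest) => (
    if (p (c)) then (
      let val (yes, no) = span (p, rest) in (c :: yes, no) end
    ) else (
      ([], cs)
    )
  )
);

fun lexChars (cs : char list) : token list = (
  case cs of [] => ( [] )
  | (#"(" :: rest) => ( LPAREN :: lexChars (rest) )
  | (#")" :: rest) => ( RPAREN :: lexChars (rest) )
  | (#";" :: rest) => ( SEMI :: lexChars (rest) )
  | (#"+" :: rest) => ( BINOP PLUS :: lexChars (rest) )
  | (#"-" :: rest) => ( BINOP MINUS :: lexChars (rest) )
  | (#"*" :: rest) => ( BINOP TIMES :: lexChars (rest) )
  | (#"/" :: rest) => ( BINOP DIVIDE :: lexChars (rest) )
  | (#"=" :: rest) => ( BINOP EQUALS :: lexChars (rest) )
  | (c :: rest) => (
    if (Char.isSpace (c)) then (
      lexChars (rest)
    ) else if (Char.isDigit (c)) then (
      let val (ds, more) = span (Char.isDigit, cs)
      in INT (valOf (Int.fromString (implode (ds)))) :: lexChars (more) end
    ) else if (Char.isAlpha (c)) then (
      let val (ws, more) = span (Char.isAlpha, cs)
      in wordToToken (implode (ws)) :: lexChars (more) end
    ) else (
      raise LexError
    )
  )
);

fun lex (s : string) : token list = (
  lexChars (explode (s))
);
```

Working on a char list lets the lexer use the same pattern-matching style as the list functions from earlier lectures.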
The first problem is that the grammar is ambiguous!
<exp> ::= if <exp> then <exp> else <exp>
| <var> "(" <exp> ")"
| <var>
| <int>
| <exp> <binop> <exp>
We resolve the ambiguity:
<exp> ::= <atom> <rest>
<rest> ::= <empty>
| <binop> <exp>
<atom> ::= if <exp> then <exp> else <exp>
| "(" <exp> ")"
| <var> "(" <exp> ")"
| <var>
| <int>
We then write parsing functions for each of the non-terminals.
Each of the parsers looks like:
fun parseExp (toks : token list) : (exp * token list) = ( ... )
that is, it takes in a token list and returns the parsed expression together with the unused tokens.
For example:
parseExp ([INT 1, BINOP PLUS, INT 2, RPAREN, RPAREN]);
produces:
(
Binop (Int 1, PLUS, Int 2),
[RPAREN, RPAREN]
)
The code for parsing is in interp.sml.
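As an illustration of the scheme above, here is a sketch of expression parsers for the disambiguated grammar, with one function per non-terminal. It is not the code from interp.sml. The names parseRest, parseAtom and expect and the exception ParseError are assumptions introduced here.

```sml
datatype binop = PLUS | MINUS | TIMES | DIVIDE | EQUALS;
datatype exp = If of exp * exp * exp | Apply of string * exp
             | Var of string | Int of int | Binop of exp * binop * exp;
datatype token = FUN | SEMI | IF | THEN | ELSE | LPAREN | RPAREN
               | BINOP of binop | VAR of string | INT of int;

exception ParseError;

(* Consume the token t from the front of toks, or fail. *)
fun expect (t : token, toks : token list) : token list = (
  case toks of (t1 :: rest) => (
    if (t1 = t) then ( rest ) else ( raise ParseError )
  ) | [] => (
    raise ParseError
  )
);

(* <exp> ::= <atom> <rest> *)
fun parseExp (toks : token list) : (exp * token list) = (
  let val (a, rest) = parseAtom (toks) in parseRest (a, rest) end
)
(* <rest> ::= <empty> | <binop> <exp> *)
and parseRest (left : exp, toks : token list) : (exp * token list) = (
  case toks of (BINOP b :: rest) => (
    let val (right, rest1) = parseExp (rest)
    in (Binop (left, b, right), rest1) end
  ) | _ => (
    (left, toks)
  )
)
(* <atom> ::= if ... | "(" <exp> ")" | <var> "(" <exp> ")" | <var> | <int> *)
and parseAtom (toks : token list) : (exp * token list) = (
  case toks of (INT n :: rest) => (
    (Int n, rest)
  ) | (VAR f :: LPAREN :: rest) => (
    let val (arg, rest1) = parseExp (rest)
    in (Apply (f, arg), expect (RPAREN, rest1)) end
  ) | (VAR x :: rest) => (
    (Var x, rest)
  ) | (LPAREN :: rest) => (
    let val (e, rest1) = parseExp (rest)
    in (e, expect (RPAREN, rest1)) end
  ) | (IF :: rest) => (
    let val (c, rest1) = parseExp (rest)
        val (t, rest2) = parseExp (expect (THEN, rest1))
        val (f, rest3) = parseExp (expect (ELSE, rest2))
    in (If (c, t, f), rest3) end
  ) | _ => (
    raise ParseError
  )
);
```

Note that this grammar gives every binary operator the same precedence and groups to the right, so 1 - 2 - 3 parses as 1 - (2 - 3) unless parentheses are used.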
Once we have the AST representation of an expression and a declaration, we can single-step through the expression.
This uses a function singleStep (d : dec, e : exp) : exp
which interprets one step of the expression e.
For example:
singleStep (Empty, Binop (Int 1, PLUS, Int 2));
produces Int 3.
The difficult bit is function application.
When we come to a node Apply (x, e)
we need to look up the definition of x
and apply it to e.
For example, this is the first step in:
fact(5)
--> if 5 = 0 then ( 1 ) else ( 5 * (fact(5 - 1)) )
So we want:
singleStep (d, Apply (x, e1)) =
apply (d, x, e1)
so what does apply do?
apply (d, x, e1) searches d
until it finds the appropriate function declaration,
then calls replace:
fun apply (d : dec, x : string, e : exp) : exp = (
case d of Empty => (
raise RuntimeError
) | (Fun (y, z, e1, d1)) => (
if (x = y) then (
replace (z, e, e1)
) else (
apply (d1, x, e)
)
)
);
For example, if d contains the declaration:
fun fact (x) (if x = 0 then 1 else x * (fact (x - 1)));
then calling apply (d, "fact", Int 5) produces:
if 5 = 0 then 1 else 5 * (fact (5 - 1))
So what does replace do?
We are implementing macro-expansion so
replace (z, e, e1) just syntactically replaces
z by e in e1.
For example:
- replace ("x", Int(5), Binop (Var "x", MINUS, Int 1));
> val it = Binop(Int 5, MINUS, Int 1) : exp
says that replacing x with 5 in
x - 1 produces 5 - 1.
The code for single-stepping is in interp.sml.
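The pieces described above (replace, apply and singleStep) could fit together roughly as follows. This is a sketch under stated assumptions, not the contents of interp.sml: the helper evalBinop is a made-up name, truth values are assumed to be encoded as Int 1 and Int 0, and an if is assumed to take its else branch exactly when the condition is Int 0.

```sml
datatype binop = PLUS | MINUS | TIMES | DIVIDE | EQUALS;
datatype exp = If of exp * exp * exp | Apply of string * exp
             | Var of string | Int of int | Binop of exp * binop * exp;
datatype dec = Empty | Fun of string * string * exp * dec;

exception RuntimeError;

(* Syntactically replace the variable z by e everywhere in e1. *)
fun replace (z : string, e : exp, e1 : exp) : exp = (
  case e1 of (Var y) => (
    if (y = z) then ( e ) else ( e1 )
  ) | (Int _) => (
    e1
  ) | (If (c, t, f)) => (
    If (replace (z, e, c), replace (z, e, t), replace (z, e, f))
  ) | (Apply (f, arg)) => (
    (* Function names live in the declarations, so f is left alone. *)
    Apply (f, replace (z, e, arg))
  ) | (Binop (l, oper, r)) => (
    Binop (replace (z, e, l), oper, replace (z, e, r))
  )
);

(* Find the declaration of x, searching the remaining declarations d1. *)
fun apply (d : dec, x : string, e : exp) : exp = (
  case d of Empty => (
    raise RuntimeError
  ) | (Fun (y, z, e1, d1)) => (
    if (x = y) then ( replace (z, e, e1) ) else ( apply (d1, x, e) )
  )
);

(* Assumption: EQUALS yields Int 1 for true and Int 0 for false. *)
fun evalBinop (oper : binop, m : int, n : int) : exp = (
  case oper of PLUS => ( Int (m + n) )
  | MINUS => ( Int (m - n) )
  | TIMES => ( Int (m * n) )
  | DIVIDE => ( Int (m div n) )
  | EQUALS => ( if (m = n) then ( Int 1 ) else ( Int 0 ) )
);

(* Perform one step of evaluation, working on the leftmost reducible part. *)
fun singleStep (d : dec, e : exp) : exp = (
  case e of (Int _) => ( e )
  | (Var _) => ( raise RuntimeError )
  | (Apply (x, e1)) => ( apply (d, x, e1) )
  | (If (Int 0, _, f)) => ( f )
  | (If (Int _, t, _)) => ( t )
  | (If (c, t, f)) => ( If (singleStep (d, c), t, f) )
  | (Binop (Int m, oper, Int n)) => ( evalBinop (oper, m, n) )
  | (Binop (Int m, oper, r)) => ( Binop (Int m, oper, singleStep (d, r)) )
  | (Binop (l, oper, r)) => ( Binop (singleStep (d, l), oper, r) )
);
```

Because Apply expands the function body without first evaluating the argument, this sketch is call-by-name: the argument expression is copied into the body unevaluated.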
Finally we need to pretty-print an expression.
We do this with a function expToString : exp -> string.
The only tricky part is getting indentation and parentheses right!
The code for pretty-printing is in interp.sml.
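A deliberately simple sketch of such a pretty-printer is shown below. It is an assumption-laden stand-in for the real one: it prints everything on one line with no indentation, and it sidesteps the parenthesis problem by wrapping every binary operation in parentheses. The helper name binopToString is made up here.

```sml
datatype binop = PLUS | MINUS | TIMES | DIVIDE | EQUALS;
datatype exp = If of exp * exp * exp | Apply of string * exp
             | Var of string | Int of int | Binop of exp * binop * exp;

fun binopToString (b : binop) : string = (
  case b of PLUS => ( "+" )
  | MINUS => ( "-" )
  | TIMES => ( "*" )
  | DIVIDE => ( "/" )
  | EQUALS => ( "=" )
);

(* Always parenthesize binary operations, so no precedence rules are needed. *)
fun expToString (e : exp) : string = (
  case e of (Int n) => (
    Int.toString (n)
  ) | (Var x) => (
    x
  ) | (Apply (f, arg)) => (
    f ^ " (" ^ expToString (arg) ^ ")"
  ) | (If (c, t, f)) => (
    "if " ^ expToString (c) ^ " then " ^ expToString (t)
      ^ " else " ^ expToString (f)
  ) | (Binop (l, b, r)) => (
    "(" ^ expToString (l) ^ " " ^ binopToString (b) ^ " "
      ^ expToString (r) ^ ")"
  )
);
```

The price of this simplicity is redundant parentheses, for example around a binary operation used as a function argument. Note also that Int.toString writes negative numbers with SML's ~ sign.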
fun multiStep (d : dec, e : exp) : exp = (
print (expToString (e));
case e of (Int _) => (
e
) | _ => (
multiStep (d, singleStep (d, e))
)
);
fun run (p : prog) : prog = (
case p of (Program (d, e)) => (
Program (d, multiStep (d, e))
)
);
fun go (text : string) : prog = (
run (parse (lex (text)))
);
And that's it!
Add < and > to the language.
Make the language call-by-value (by editing 2 lines of code and adding 2 more!)
SML is designed for manipulating abstract syntax trees.
It has good datatype support for ASTs, which correspond naturally to grammars.
Writing lexers, parsers, and AST manipulation functions is easy in SML (it took me about 6 hours to write this program).
Java!