Project 2: Type-Checking SimpleC Programs
Table of Contents
Overview
In this, we will perform type checking on the abstract syntax tree (AST) produced by the parser. Part of the type checker is provided to you. typecheck.c
is the file to implement the rest of the type checker in. The type checker will maintain a symbol table for each static scope and annotate the tree with type information for each expression while traversing the tree. When finished the program will print the AST with these additional type annotations. If the input program violates SimpleC's type rules, it will exit with an error code.
Setup
- Get the new project source files
Add the two new C files to your Makefile's
SRC
variableSRC := \ lexer.c \ parser.tab.c \ typecheck.c \ table.c \ ast.c \ ast_printer.c \ util.c \ main.c
Modify
main.c
to includetypecheck.h
call the type checker on the output of the parser before printing the AST.#include "typecheck.h"
In
main()
:yyout = stderr; yyparse(); check_prog(program_ast); print_prog(program_ast, 0);
(optional) Use the precompiled parser if you haven't finished project 1.
- Modify your Makefile to use the precompiled parser instead:
- Remove
parser.tab.c
fromSRC
Add
parser.tab.o
to the rule forsimplec
simplec: $(OBJ) $(CC) $(CFLAGS) -o $@ $^ parser.tab.o
Recall that you need to use the
tab
character for indenting rules in Makefiles
- Remove
- Download parser.tab.o.project1
Copy it to
parser.tab.o
and update its timestamp, but keep the parser.tab.o.project1 file around just in case:cp parser.tab.o.project1 parser.tab.o
WARNING:
parser.tab.o
has been compiled for the Vagrant VM, so it will likely not work on your host OS.- Modify your Makefile to use the precompiled parser instead:
Submitting your project
To facilitate easier grading and regrading, please use git tags to identify which version of the project you'd like us to use for grading.
Error codes
- exit(1) means there is a failed invariant check, e.g., an unexpected NULL value, which is already handled for you in the template.
- exit(3) means typechecker error, which is produced by calling
type_error(char *msg)
, given in the template fortypecheck.c
.
Type specification
Static scoping
- SimpleC is a statically scoped language, that is, user-defined symbols are only accessible within a syntactically-defined region of the source file.
- Symbols declared at the top of the program are in the the global scope and accessible by the entire program.
- Function definitions may only appear in the global scope
- Each function definition's body creates a nested, static scope, identified syntactically by curly braces. Its parameters and any declarations are only accessible within the function body's scope.
- Compound statements also create their owned nested, static scope between their curly braces.
- Declarations bind symbol names to a type in whatever scope they appear. Declarations of function types are only permitted in the global scope. Any nested scopes must only declare non-function types.
- Function definitions (which are different from declarations) bind the function name to the function type in the global scope before creating a nested scope in which its parameters and locally-declared symbols are bound.
- Within the same scope, only one binding per unique symbol name is permitted.
Declarations
- Bind the declared symbol to its type
- It's a type error if the symbol has already been bound in the current scope
Function definitions
- Check that the function is declared with a function type
- Check for duplicate definitions in the current scope
- Add the function to the current (global) symbol table
- Create a new scope
- Add parameters to local scope
Check for an incorrect number of parameters (paramlist) compared to the type's list of parameters (paramtypes)
// type error because paramlist doesn't match paramtypes f(a, b) : function (int) -> char { ... }
- Add the local variable declarations to the local symtab
- Check the function body for type errors
- Check the return expression for type errors
- Check that the return expression's type match the function type
- Restore the parent scope
Statements
- Assignment statement
- Get the type of the left-hand side and of the right-hand side
- Check that the left-hand side is an L-value, i.e., its type represents a memory address that can take a value
- In SimpleC, three kinds of expression can be used as L-values
- An identexpr
- A unaryexpr only where the operator is a pointer dereference
- An arrayexpr
- In SimpleC, three kinds of expression can be used as L-values
- Check that the type of the left- and right-hand side expressions match.
- If, if-then-else, and while statements
- The conditional expression should be an int
- Recursively type check the if branch and the else branch, if present or the while body
- Compound statements
- These create a new, nested static scope. So create a new scope before type checking the body. Don't forget to restore the parent scope after.
Expression
- identexpr
- Lookup the symbol's binding and store the type in the
expr->type
field on the expression. - Return a type error if the symbol has not been declared in any of the nested scopes
- Lookup the symbol's binding and store the type in the
- callexpr
- Get the type from the symtab or call to undeclared function type error
- Check that the symbol is a function type
- Check that the number of types match or type error
- Compare the types of each argument's expression (in the list) match each corresponding parameter type from the function type or type error
- Set this expression to the function's return type
- int, char, and str expressions
- Set the expression to the corresponding type INTTYPE, CHARTYPE, or STRINGTYPE.
- arrayexpr
- Check that the expr being dereferenced is either a pointer or an array type
- A dereference unboxes the pointer type
- An array access unboxes the array type
- It's a type error if the expression is neither an array or pointer
- unaryexpr
- Recursively check the operand.
- Check the type of the operator:
& : (type) -> pointer<type>
- gets the pointer to a variable of a given type
* : (pointer<type>) -> type
- dereferences the pointer to a variable of a given type
- : (type) -> type
- arithmetic negation
! : (type) -> int
- logical negation
- Don't forget to the set the unaryexpr's type at the end
- binaryexpr
- Recursively check the operands.
- Check the type of the operator.
- arithmetic: allows any operands of the same type and returns the same type
* : (type, type) -> type
/ : (type, type) -> type
% : (type, type) -> type
+ : (type, type) -> type
- : (type, type) -> type
- comparison operators: takes either char or int, but not both, and returns an int to represent a boolean value
< : (type, type) -> int
> : (type, type) -> int
<= : (type, type) -> int
>= : (type, type) -> int
== : (type, type) -> int
!= : (type, type) -> int
- logical operators: takes int and returns an int to represent a boolean value
&& : (int, int) -> int
|| : (int, int) -> int
- arithmetic: allows any operands of the same type and returns the same type
- Don't forget to set the resulting type of the expression.
- castexpr
- Permit casting between any types (except functions, which expression can't be any way)
API
storing expression types in the
->type
fieldtype information about expressions is communicated to the rest of the type checker by assigning the
->type
field on expressions. see the template and in-class guides to see examples of this.current_scope
holds the current scopethe scope holds the symbol table
current_scope->table
. the current scope needs to be updated upon entering a new static scope by creating a new scope data structure. once the scope is over, thecurrent_scope
needs to be restored to the parent scope.static bool compare_types(T_type type1, T_type type2)
this compares two types and returns true if they match.
static void type_error(char *msg);
this should be called with a message when there is a type error. it will exit with the appropriate error code necessary for automatic grading.
static T_scope create_scope(T_scope parent);
this creates a new scope that points to the given parent scope.
static void destroy_scope(T_scope scope);
this frees a scope once it is no longer needed
void insert(T_table table, string key, void *binding);
bind a symbol name (
key
) to a type (binding
) in a given symbol table. access the current scope's symbol table withcurrent_scope->table
.void *lookup(T_table table, string key);
retrieve the binding for a symbol name (
key
) in a given symbol table. if the symbol is not bound, then this function returnsNULL
.static T_type lookup_in_all_scopes(T_scope scope, string ident);
search for a binding of a symbol name (
ident
) starting from the current scope and recursively checking each parent scope until either a symbol is found until reaching the global scope. if no binding is found, this function returnsNULL
.
Testing
- As you program, be sure to create small (unit) test cases for any changes you make.
- As an illustration of the input and output of the type checker, consider these two files:
- The test cases repo has example programs to try.
Tips
Need to keep open three files to reference simultaneously while writing the type checker
- The AST reference, in order to get the relevant language construct elements
- The Type AST reference, in order to pull out the relevant type information
The type rules described in this document
Keep this three references available to you while you implement the type checker