UP | HOME

Project 2: Type-Checking SimpleC Programs

Table of Contents

Overview

In this, we will perform type checking on the abstract syntax tree (AST) produced by the parser. Part of the type checker is provided to you. typecheck.c is the file to implement the rest of the type checker in. The type checker will maintain a symbol table for each static scope and annotate the tree with type information for each expression while traversing the tree. When finished the program will print the AST with these additional type annotations. If the input program violates SimpleC's type rules, it will exit with an error code.

Setup

  1. Get the new project source files
  2. Add the two new C files to your Makefile's SRC variable

    SRC := \
      lexer.c \
      parser.tab.c \
      typecheck.c \
      table.c \
      ast.c \
      ast_printer.c \
      util.c \
      main.c
    
  3. Modify main.c to include typecheck.h call the type checker on the output of the parser before printing the AST.

    #include "typecheck.h"
    

    In main():

    yyout = stderr;
    yyparse();
    check_prog(program_ast);
    print_prog(program_ast, 0);
    
  4. (optional) Use the precompiled parser if you haven't finished project 1.

    • Modify your Makefile to use the precompiled parser instead:
      1. Remove parser.tab.c from SRC
      2. Add parser.tab.o to the rule for simplec

        simplec: $(OBJ)
          $(CC) $(CFLAGS) -o $@ $^ parser.tab.o
        

        Recall that you need to use the tab character for indenting rules in Makefiles

    • Download parser.tab.o.project1
    • Copy it to parser.tab.o and update its timestamp, but keep the parser.tab.o.project1 file around just in case:

      cp parser.tab.o.project1 parser.tab.o
      

    WARNING: parser.tab.o has been compiled for the Vagrant VM, so it will likely not work on your host OS.

Submitting your project

To facilitate easier grading and regrading, please use git tags to identify which version of the project you'd like us to use for grading.

Error codes

  • exit(1) means there is a failed invariant check, e.g., an unexpected NULL value, which is already handled for you in the template.
  • exit(3) means typechecker error, which is produced by calling type_error(char *msg), given in the template for typecheck.c.

Type specification

Static scoping

  • SimpleC is a statically scoped language, that is, user-defined symbols are only accessible within a syntactically-defined region of the source file.
    • Symbols declared at the top of the program are in the the global scope and accessible by the entire program.
    • Function definitions may only appear in the global scope
    • Each function definition's body creates a nested, static scope, identified syntactically by curly braces. Its parameters and any declarations are only accessible within the function body's scope.
    • Compound statements also create their owned nested, static scope between their curly braces.
  • Declarations bind symbol names to a type in whatever scope they appear. Declarations of function types are only permitted in the global scope. Any nested scopes must only declare non-function types.
  • Function definitions (which are different from declarations) bind the function name to the function type in the global scope before creating a nested scope in which its parameters and locally-declared symbols are bound.
  • Within the same scope, only one binding per unique symbol name is permitted.

Declarations

  • Bind the declared symbol to its type
  • It's a type error if the symbol has already been bound in the current scope

Function definitions

  • Check that the function is declared with a function type
  • Check for duplicate definitions in the current scope
  • Add the function to the current (global) symbol table
  • Create a new scope
  • Add parameters to local scope
  • Check for an incorrect number of parameters (paramlist) compared to the type's list of parameters (paramtypes)

    // type error because paramlist doesn't match paramtypes
    f(a, b) : function (int) -> char { ... } 
    
  • Add the local variable declarations to the local symtab
  • Check the function body for type errors
  • Check the return expression for type errors
  • Check that the return expression's type match the function type
  • Restore the parent scope

Statements

  • Assignment statement
    • Get the type of the left-hand side and of the right-hand side
    • Check that the left-hand side is an L-value, i.e., its type represents a memory address that can take a value
      • In SimpleC, three kinds of expression can be used as L-values
        1. An identexpr
        2. A unaryexpr only where the operator is a pointer dereference
        3. An arrayexpr
    • Check that the type of the left- and right-hand side expressions match.
  • If, if-then-else, and while statements
    • The conditional expression should be an int
    • Recursively type check the if branch and the else branch, if present or the while body
  • Compound statements
    • These create a new, nested static scope. So create a new scope before type checking the body. Don't forget to restore the parent scope after.

Expression

  • identexpr
    • Lookup the symbol's binding and store the type in the expr->type field on the expression.
    • Return a type error if the symbol has not been declared in any of the nested scopes
  • callexpr
    • Get the type from the symtab or call to undeclared function type error
    • Check that the symbol is a function type
    • Check that the number of types match or type error
    • Compare the types of each argument's expression (in the list) match each corresponding parameter type from the function type or type error
    • Set this expression to the function's return type
  • int, char, and str expressions
    • Set the expression to the corresponding type INTTYPE, CHARTYPE, or STRINGTYPE.
  • arrayexpr
    • Check that the expr being dereferenced is either a pointer or an array type
    • A dereference unboxes the pointer type
    • An array access unboxes the array type
    • It's a type error if the expression is neither an array or pointer
  • unaryexpr
    • Recursively check the operand.
    • Check the type of the operator:
      • & : (type) -> pointer<type>
        • gets the pointer to a variable of a given type
      • * : (pointer<type>) -> type
        • dereferences the pointer to a variable of a given type
      • - : (type) -> type
        • arithmetic negation
      • ! : (type) -> int
        • logical negation
    • Don't forget to the set the unaryexpr's type at the end
  • binaryexpr
    • Recursively check the operands.
    • Check the type of the operator.
      • arithmetic: allows any operands of the same type and returns the same type
        • * : (type, type) -> type
        • / : (type, type) -> type
        • % : (type, type) -> type
        • + : (type, type) -> type
        • - : (type, type) -> type
      • comparison operators: takes either char or int, but not both, and returns an int to represent a boolean value
        • < : (type, type) -> int
        • > : (type, type) -> int
        • <= : (type, type) -> int
        • >= : (type, type) -> int
        • == : (type, type) -> int
        • != : (type, type) -> int
      • logical operators: takes int and returns an int to represent a boolean value
        • && : (int, int) -> int
        • || : (int, int) -> int
    • Don't forget to set the resulting type of the expression.
  • castexpr
    • Permit casting between any types (except functions, which expression can't be any way)

API

  • storing expression types in the ->type field

    type information about expressions is communicated to the rest of the type checker by assigning the ->type field on expressions. see the template and in-class guides to see examples of this.

  • current_scope holds the current scope

    the scope holds the symbol table current_scope->table. the current scope needs to be updated upon entering a new static scope by creating a new scope data structure. once the scope is over, the current_scope needs to be restored to the parent scope.

  • static bool compare_types(T_type type1, T_type type2)

    this compares two types and returns true if they match.

  • static void type_error(char *msg);

    this should be called with a message when there is a type error. it will exit with the appropriate error code necessary for automatic grading.

  • static T_scope create_scope(T_scope parent);

    this creates a new scope that points to the given parent scope.

  • static void destroy_scope(T_scope scope);

    this frees a scope once it is no longer needed

  • void insert(T_table table, string key, void *binding);

    bind a symbol name (key) to a type (binding) in a given symbol table. access the current scope's symbol table with current_scope->table.

  • void *lookup(T_table table, string key);

    retrieve the binding for a symbol name (key) in a given symbol table. if the symbol is not bound, then this function returns NULL.

  • static T_type lookup_in_all_scopes(T_scope scope, string ident);

    search for a binding of a symbol name (ident) starting from the current scope and recursively checking each parent scope until either a symbol is found until reaching the global scope. if no binding is found, this function returns NULL.

Testing

Tips

Need to keep open three files to reference simultaneously while writing the type checker

  1. The AST reference, in order to get the relevant language construct elements
  2. The Type AST reference, in order to pull out the relevant type information
  3. The type rules described in this document

    Keep this three references available to you while you implement the type checker

Every check_XXXexpr function should set expr->type

Types are represented using the AST structure for the syntax of types

If you need to create a type, e.g., for constants, use the AST create method corresponding to that type.

Author: Paul Gazzillo

Created: 2020-11-04 Wed 20:19