UP | HOME

Project 1: Creating an Abstract Syntax Tree (AST) for SimpleC

Table of Contents

Overview

In this project, we will add semantics actions to a bison parser specification file that generate an abstract syntax tree (AST) for an input SimpleC program. Provided are the lexer and parser specifications, as well as a library for creating AST nodes.

Getting started

  • Accept the GitHub classroom assignment to create your git repository for the project. We will grade your project via the contents you push to the GitHub classroom repository.
  • Create a local repository and add the GitHub classroom repository as a remote repository, per the instructions on GitHub or as described in lecture.
  • Copy the starter files below into your local repository's folder
    • Alternatively, use the wget command to download each file directory inside your virtual machine.
    • Put the Makefile and all source files in the root of the source code repository; there is no need to create a subfolder within the repository folder.
  • Switch to your virtual machine and navigate to the local repository folder.
  • Add and commit the starter files to your repository with git.
  • Run make in your source folder to build the project, which should create the simplec program.
    • If the build fails, double-check that you have all the starter files available in the directory.
  • Run cat example.simplec | ./simplec
    • Running ./simplec without providing input will cause the program to stop and wait for input to stdin.
    • If ./simplec is not found, be sure to prefix the program with ./ and be sure the make step above succeeded.

Starter files

Run these commands to download them from the command-line:

wget https://cop3402fall20.github.io/files/simplec/Makefile
wget https://cop3402fall20.github.io/files/simplec/lexer.l
wget https://cop3402fall20.github.io/files/simplec/parser.y
wget https://cop3402fall20.github.io/files/simplec/ast.h
wget https://cop3402fall20.github.io/files/simplec/ast.c
wget https://cop3402fall20.github.io/files/simplec/ast_printer.h
wget https://cop3402fall20.github.io/files/simplec/ast_printer.c
wget https://cop3402fall20.github.io/files/simplec/util.h
wget https://cop3402fall20.github.io/files/simplec/util.c
wget https://cop3402fall20.github.io/files/simplec/main.c
wget https://cop3402fall20.github.io/files/simplec/ast_example.c
wget https://cop3402fall20.github.io/files/simplec/given.simplec
wget https://cop3402fall20.github.io/files/simplec/example.simplec

Submitting your project

To facilitate easier grading and regrading, please use git tags to identify which version of the project you'd like us to use for grading.

Implementing AST generation

The main task of this project is to implement the semantic actions in parser.y to construct AST nodes for each grammar construct. See lecture 05 for more details on this task.

To evaluate your progress, pass simplec programs to your program, e.g., cat example.simplec | ./simplec and inspect the resulting output for correctness. Please see a complete example below.

The provided AST library

  • The AST creation API is defined in ast.h and implemented for you in ast.c.
    • Use this API to construct the AST data structures
    • You will not need to write code that allocates memory for tree nodes yourself
  • Each AST node is loosely named after the grammar nonterminal it represents.
    • For example, declarations use the T_decl struct
  • Each create function is suffixed with the name of the node.
    • For example, create_decl creates a T_decl AST node representing a declaration construct
  • Each create function takes the child nodes of the AST node
    • These correspond to the symbols in the grammar production's right-hand side
    • For example, T_decl create_decl(T_type type, string ident);
  • Language constructs with multiple rules use a union type in the struct.
    • For example, statements use the T_stmt struct, which contains a union of types corresponding to assignment statements, if statements, etc.
    • The kind field is used to distinguish which kind of statement is held by the struct instance.

See the complete definition in ast.h or below.

Recording the resulting AST

In the program nonterminal only, save the resulting AST to the program_ast global variable. This will make it available to the main function for printing. This is what the completed program semantic action should look like. It is the only place where program_ast should be assigned, because program is the root of the AST.

program
  : declaration_list function_list main_function
  {
    $$ = create_prog($1, $2, $3);
    program_ast = $$;
  }
  ;

enums for types and operators

Lists

describe empty list vs non-empty list differences

Exit codes

According to POSIX standards, all programs report an exit code on termination. Programs use this exit code to report the status of the program on exit to the caller of the program. By convention, an exit code of 0 means a successful execution, while non-zero exit codes signal some error condition, defined on a per-program basis. This is the reason for the return 0; at the end of main in C programs. A C program can also use the exit library call to terminate the program and report the given exit code.

For our compiler, we will use the exit code to report errors during compilation. For project 1, we define the two following error codes. For this project, the error reporting is already implemented for you, but you will need to use the correct error code in later phases of the compiler.

  • exit(1) means there is a failed invariant check, e.g., an unexpected NULL value in the AST API
  • exit(2) means there is a parser or lexer error (already implemented in parser.y)

The bash shell stores the exit code of the last executed program in a special variable $?, which you can print using the echo command:

  • Check the exit code by running echo $? right after running ./simplec.

Our automated grading scripts will use these exit codes to test your compiler, so be sure your compiler is producing the correct ones.

Test cases

The test cases repo has example program to try with your parser.

Gotchas

  • Null pointers
    • create functions check for NULL and provide an error
  • Incorrect semantic value numbers
    • Double-check the usage of $1 values
  • Make sure you are doing $$ = create_... and not just calling create_ otherwise, the semantic value won't be accessible by the parent
  • Incomplete semantic action definition
    • Child nodes won't appear in AST yet
  • Punctuation are also symbols in the production, so don't forget to skip them when counting the symbols for their semantic values, e.g.,
  • Some actions just pass a child's value as the semantic value for the parent, i.e., $$ = $1.

    main_function
     : MAIN '{' declaration_list statement_list RETURN expression ';' '}'
     {
       $$ = create_main($3, $4, $6);
     }
     ;
    

AST API

T_prog create_prog(T_decllist decllist, T_funclist funclist, T_main main);
T_exprlist create_exprlist(T_expr expr, T_exprlist tail);
T_expr create_identexpr(string identexpr);
T_expr create_callexpr(string ident, T_exprlist args);
T_expr create_intexpr(int intexpr);
T_expr create_charexpr(char charexpr);
T_expr create_strexpr(string strexpr);
T_expr create_arrayexpr(T_expr expr, T_expr index);
T_expr create_unaryexpr(E_op op, T_expr expr);
T_expr create_binaryexpr(T_expr left, E_op op, T_expr right);
T_expr create_castexpr(T_type type, T_expr expr);
T_stmt create_assignstmt(T_expr left, T_expr right);
T_stmt create_ifstmt(T_expr cond, T_stmt body);
T_stmt create_ifelsestmt(T_expr cond, T_stmt ifbranch, T_stmt elsebranch);
T_stmt create_whilestmt(T_expr cond, T_stmt body);
T_stmt create_compoundstmt(T_decllist decllist, T_stmtlist stmtlist);
T_stmtlist create_stmtlist(T_stmt stmt, T_stmtlist tail);
T_type create_primitivetype(E_typename primitivetype);
T_type create_pointertype(T_type pointertype);
T_type create_arraytype(int size, T_type type);
T_type create_functiontype(T_typelist paramtypes, T_type returntype);
T_typelist create_typelist(T_type type, T_typelist tail);
T_decl create_decl(T_type type, string ident);
T_decllist create_decllist(T_decl decl, T_decllist tail);
T_paramlist create_paramlist(string ident, T_paramlist tail);
T_func create_func(string ident, T_paramlist paramlist, T_type type, T_decllist decllist, T_stmtlist stmtlist, T_expr returnexpr);
T_funclist create_funclist(T_func func, T_funclist tail);
T_main create_main(T_decllist decllist, T_stmtlist stmtlist, T_expr returnexpr);

Complete example

example.simplec

int x;
function(int) -> pointer<char> malloc;

f(a, b) : function(pointer<int>, char) -> int {
  return *a + (int) b;
}

main {
  int x;
  pointer<int> y;
  pointer<int> arr;
  char c;
  arr = (pointer<int>) malloc(50);
  y = &x;
  *y = *y + f(y, '1');
  c = 't';
  return 0;
}

cat example.simplec | ./simplec

+-prog
|  +-decllist
|  |  +-decl
|  |  |  +-primitivetype
|  |  |  |  +-typename_int
|  |  |  +-x
|  |  +-decllist
|  |  |  +-decl
|  |  |  |  +-functiontype
|  |  |  |  |  +-typelist
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_int
|  |  |  |  |  |  +-typelist(empty)
|  |  |  |  |  +-pointertype
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_char
|  |  |  |  +-malloc
|  |  |  +-decllist(empty)
|  +-funclist
|  |  +-func
|  |  |  +-f
|  |  |  +-paramlist
|  |  |  |  +-a
|  |  |  |  +-paramlist
|  |  |  |  |  +-b
|  |  |  |  |  +-paramlist(empty)
|  |  |  +-functiontype
|  |  |  |  +-typelist
|  |  |  |  |  +-pointertype
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_int
|  |  |  |  |  +-typelist
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_char
|  |  |  |  |  |  +-typelist(empty)
|  |  |  |  +-primitivetype
|  |  |  |  |  +-typename_int
|  |  |  +-decllist(empty)
|  |  |  +-stmtlist(empty)
|  |  |  +-binaryexpr
|  |  |  |  +-unaryexpr
|  |  |  |  |  +-op_deref
|  |  |  |  |  +-identexpr
|  |  |  |  |  |  +-a
|  |  |  |  +-op_plus
|  |  |  |  +-castexpr
|  |  |  |  |  +-primitivetype
|  |  |  |  |  |  +-typename_int
|  |  |  |  |  +-identexpr
|  |  |  |  |  |  +-b
|  |  +-funclist(empty)
|  +-main
|  |  +-decllist
|  |  |  +-decl
|  |  |  |  +-primitivetype
|  |  |  |  |  +-typename_int
|  |  |  |  +-x
|  |  |  +-decllist
|  |  |  |  +-decl
|  |  |  |  |  +-pointertype
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_int
|  |  |  |  |  +-y
|  |  |  |  +-decllist
|  |  |  |  |  +-decl
|  |  |  |  |  |  +-pointertype
|  |  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  |  +-typename_int
|  |  |  |  |  |  +-arr
|  |  |  |  |  +-decllist
|  |  |  |  |  |  +-decl
|  |  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  |  +-typename_char
|  |  |  |  |  |  |  +-c
|  |  |  |  |  |  +-decllist(empty)
|  |  +-stmtlist
|  |  |  +-assignstmt
|  |  |  |  +-identexpr
|  |  |  |  |  +-arr
|  |  |  |  +-castexpr
|  |  |  |  |  +-pointertype
|  |  |  |  |  |  +-primitivetype
|  |  |  |  |  |  |  +-typename_int
|  |  |  |  |  +-callexpr
|  |  |  |  |  |  +-malloc
|  |  |  |  |  |  +-exprlist
|  |  |  |  |  |  |  +-intexpr
|  |  |  |  |  |  |  |  +-50
|  |  |  |  |  |  |  +-exprlist(empty)
|  |  |  +-stmtlist
|  |  |  |  +-assignstmt
|  |  |  |  |  +-identexpr
|  |  |  |  |  |  +-y
|  |  |  |  |  +-unaryexpr
|  |  |  |  |  |  +-op_ref
|  |  |  |  |  |  +-identexpr
|  |  |  |  |  |  |  +-x
|  |  |  |  +-stmtlist
|  |  |  |  |  +-assignstmt
|  |  |  |  |  |  +-unaryexpr
|  |  |  |  |  |  |  +-op_deref
|  |  |  |  |  |  |  +-identexpr
|  |  |  |  |  |  |  |  +-y
|  |  |  |  |  |  +-binaryexpr
|  |  |  |  |  |  |  +-unaryexpr
|  |  |  |  |  |  |  |  +-op_deref
|  |  |  |  |  |  |  |  +-identexpr
|  |  |  |  |  |  |  |  |  +-y
|  |  |  |  |  |  |  +-op_plus
|  |  |  |  |  |  |  +-callexpr
|  |  |  |  |  |  |  |  +-f
|  |  |  |  |  |  |  |  +-exprlist
|  |  |  |  |  |  |  |  |  +-identexpr
|  |  |  |  |  |  |  |  |  |  +-y
|  |  |  |  |  |  |  |  |  +-exprlist
|  |  |  |  |  |  |  |  |  |  +-charexpr
|  |  |  |  |  |  |  |  |  |  |  +-49
|  |  |  |  |  |  |  |  |  |  +-exprlist(empty)
|  |  |  |  |  +-stmtlist
|  |  |  |  |  |  +-assignstmt
|  |  |  |  |  |  |  +-identexpr
|  |  |  |  |  |  |  |  +-c
|  |  |  |  |  |  |  +-charexpr
|  |  |  |  |  |  |  |  +-116
|  |  |  |  |  |  +-stmtlist(empty)
|  |  +-intexpr
|  |  |  +-0

Author: Paul Gazzillo

Created: 2020-11-04 Wed 20:19