Project 1: Creating an Abstract Syntax Tree (AST) for SimpleC
Table of Contents
Overview
In this project, we will add semantics actions to a bison parser specification file that generate an abstract syntax tree (AST) for an input SimpleC program. Provided are the lexer and parser specifications, as well as a library for creating AST nodes.
Getting started
- Accept the GitHub classroom assignment to create your git repository for the project. We will grade your project via the contents you push to the GitHub classroom repository.
- Create a local repository and add the GitHub classroom repository as a remote repository, per the instructions on GitHub or as described in lecture.
- Copy the starter files below into your local repository's folder
- Alternatively, use the
wget
command to download each file directory inside your virtual machine. - Put the Makefile and all source files in the root of the source code repository; there is no need to create a subfolder within the repository folder.
- Alternatively, use the
- Switch to your virtual machine and navigate to the local repository folder.
- Add and commit the starter files to your repository with
git
. - Run
make
in your source folder to build the project, which should create thesimplec
program.- If the build fails, double-check that you have all the starter files available in the directory.
- Run
cat example.simplec | ./simplec
- Running
./simplec
without providing input will cause the program to stop and wait for input tostdin
. - If
./simplec
is not found, be sure to prefix the program with./
and be sure themake
step above succeeded.
- Running
Starter files
- Makefile
- lexer.l
- parser.y
- ast.h
- ast.c
- ast_printer.h
- ast_printer.c
- util.h
- util.c
- main.c
- ast_example.c
- given.simplec
- example.simplec
Run these commands to download them from the command-line:
wget https://cop3402fall20.github.io/files/simplec/Makefile wget https://cop3402fall20.github.io/files/simplec/lexer.l wget https://cop3402fall20.github.io/files/simplec/parser.y wget https://cop3402fall20.github.io/files/simplec/ast.h wget https://cop3402fall20.github.io/files/simplec/ast.c wget https://cop3402fall20.github.io/files/simplec/ast_printer.h wget https://cop3402fall20.github.io/files/simplec/ast_printer.c wget https://cop3402fall20.github.io/files/simplec/util.h wget https://cop3402fall20.github.io/files/simplec/util.c wget https://cop3402fall20.github.io/files/simplec/main.c wget https://cop3402fall20.github.io/files/simplec/ast_example.c wget https://cop3402fall20.github.io/files/simplec/given.simplec wget https://cop3402fall20.github.io/files/simplec/example.simplec
Submitting your project
To facilitate easier grading and regrading, please use git tags to identify which version of the project you'd like us to use for grading.
Implementing AST generation
The main task of this project is to implement the semantic actions in parser.y
to construct AST nodes for each grammar construct. See lecture 05 for more details on this task.
To evaluate your progress, pass simplec programs to your program, e.g., cat example.simplec | ./simplec
and inspect the resulting output for correctness. Please see a complete example below.
The provided AST library
- The AST creation API is defined in
ast.h
and implemented for you inast.c
.- Use this API to construct the AST data structures
- You will not need to write code that allocates memory for tree nodes yourself
- Each AST node is loosely named after the grammar nonterminal it represents.
- For example, declarations use the
T_decl
struct
- For example, declarations use the
- Each create function is suffixed with the name of the node.
- For example,
create_decl
creates aT_decl
AST node representing a declaration construct
- For example,
- Each create function takes the child nodes of the AST node
- These correspond to the symbols in the grammar production's right-hand side
- For example,
T_decl create_decl(T_type type, string ident);
- Language constructs with multiple rules use a union type in the
struct
.- For example, statements use the
T_stmt
struct, which contains a union of types corresponding to assignment statements, if statements, etc. - The
kind
field is used to distinguish which kind of statement is held by thestruct
instance.
- For example, statements use the
Recording the resulting AST
In the program nonterminal only, save the resulting AST to the
program_ast
global variable. This will make it available to the
main function for printing. This is what the completed program
semantic action should look like. It is the only place where
program_ast
should be assigned, because program is the root of the
AST.
program : declaration_list function_list main_function { $$ = create_prog($1, $2, $3); program_ast = $$; } ;
Exit codes
According to POSIX standards, all programs report an exit code on termination. Programs use this exit code to report the status of the program on exit to the caller of the program. By convention, an exit code of 0 means a successful execution, while non-zero exit codes signal some error condition, defined on a per-program basis. This is the reason for the return 0;
at the end of main
in C programs. A C program can also use the exit
library call to terminate the program and report the given exit code.
For our compiler, we will use the exit code to report errors during compilation. For project 1, we define the two following error codes. For this project, the error reporting is already implemented for you, but you will need to use the correct error code in later phases of the compiler.
- exit(1) means there is a failed invariant check, e.g., an unexpected NULL value in the AST API
- exit(2) means there is a parser or lexer error (already implemented in parser.y)
The bash
shell stores the exit code of the last executed program in a special variable $?
, which you can print using the echo
command:
- Check the exit code by running
echo $?
right after running./simplec
.
Our automated grading scripts will use these exit codes to test your compiler, so be sure your compiler is producing the correct ones.
Test cases
The test cases repo has example program to try with your parser.
Gotchas
- Null pointers
- create functions check for NULL and provide an error
- Incorrect semantic value numbers
- Double-check the usage of
$1
values
- Double-check the usage of
- Make sure you are doing
$$ = create_...
and not just callingcreate_
otherwise, the semantic value won't be accessible by the parent - Incomplete semantic action definition
- Child nodes won't appear in AST yet
- Punctuation are also symbols in the production, so don't forget to skip them when counting the symbols for their semantic values, e.g.,
Some actions just pass a child's value as the semantic value for the parent, i.e.,
$$ = $1
.main_function : MAIN '{' declaration_list statement_list RETURN expression ';' '}' { $$ = create_main($3, $4, $6); } ;
AST API
T_prog create_prog(T_decllist decllist, T_funclist funclist, T_main main); T_exprlist create_exprlist(T_expr expr, T_exprlist tail); T_expr create_identexpr(string identexpr); T_expr create_callexpr(string ident, T_exprlist args); T_expr create_intexpr(int intexpr); T_expr create_charexpr(char charexpr); T_expr create_strexpr(string strexpr); T_expr create_arrayexpr(T_expr expr, T_expr index); T_expr create_unaryexpr(E_op op, T_expr expr); T_expr create_binaryexpr(T_expr left, E_op op, T_expr right); T_expr create_castexpr(T_type type, T_expr expr); T_stmt create_assignstmt(T_expr left, T_expr right); T_stmt create_ifstmt(T_expr cond, T_stmt body); T_stmt create_ifelsestmt(T_expr cond, T_stmt ifbranch, T_stmt elsebranch); T_stmt create_whilestmt(T_expr cond, T_stmt body); T_stmt create_compoundstmt(T_decllist decllist, T_stmtlist stmtlist); T_stmtlist create_stmtlist(T_stmt stmt, T_stmtlist tail); T_type create_primitivetype(E_typename primitivetype); T_type create_pointertype(T_type pointertype); T_type create_arraytype(int size, T_type type); T_type create_functiontype(T_typelist paramtypes, T_type returntype); T_typelist create_typelist(T_type type, T_typelist tail); T_decl create_decl(T_type type, string ident); T_decllist create_decllist(T_decl decl, T_decllist tail); T_paramlist create_paramlist(string ident, T_paramlist tail); T_func create_func(string ident, T_paramlist paramlist, T_type type, T_decllist decllist, T_stmtlist stmtlist, T_expr returnexpr); T_funclist create_funclist(T_func func, T_funclist tail); T_main create_main(T_decllist decllist, T_stmtlist stmtlist, T_expr returnexpr);
Complete example
example.simplec
int x; function(int) -> pointer<char> malloc; f(a, b) : function(pointer<int>, char) -> int { return *a + (int) b; } main { int x; pointer<int> y; pointer<int> arr; char c; arr = (pointer<int>) malloc(50); y = &x; *y = *y + f(y, '1'); c = 't'; return 0; }
cat example.simplec | ./simplec
+-prog | +-decllist | | +-decl | | | +-primitivetype | | | | +-typename_int | | | +-x | | +-decllist | | | +-decl | | | | +-functiontype | | | | | +-typelist | | | | | | +-primitivetype | | | | | | | +-typename_int | | | | | | +-typelist(empty) | | | | | +-pointertype | | | | | | +-primitivetype | | | | | | | +-typename_char | | | | +-malloc | | | +-decllist(empty) | +-funclist | | +-func | | | +-f | | | +-paramlist | | | | +-a | | | | +-paramlist | | | | | +-b | | | | | +-paramlist(empty) | | | +-functiontype | | | | +-typelist | | | | | +-pointertype | | | | | | +-primitivetype | | | | | | | +-typename_int | | | | | +-typelist | | | | | | +-primitivetype | | | | | | | +-typename_char | | | | | | +-typelist(empty) | | | | +-primitivetype | | | | | +-typename_int | | | +-decllist(empty) | | | +-stmtlist(empty) | | | +-binaryexpr | | | | +-unaryexpr | | | | | +-op_deref | | | | | +-identexpr | | | | | | +-a | | | | +-op_plus | | | | +-castexpr | | | | | +-primitivetype | | | | | | +-typename_int | | | | | +-identexpr | | | | | | +-b | | +-funclist(empty) | +-main | | +-decllist | | | +-decl | | | | +-primitivetype | | | | | +-typename_int | | | | +-x | | | +-decllist | | | | +-decl | | | | | +-pointertype | | | | | | +-primitivetype | | | | | | | +-typename_int | | | | | +-y | | | | +-decllist | | | | | +-decl | | | | | | +-pointertype | | | | | | | +-primitivetype | | | | | | | | +-typename_int | | | | | | +-arr | | | | | +-decllist | | | | | | +-decl | | | | | | | +-primitivetype | | | | | | | | +-typename_char | | | | | | | +-c | | | | | | +-decllist(empty) | | +-stmtlist | | | +-assignstmt | | | | +-identexpr | | | | | +-arr | | | | +-castexpr | | | | | +-pointertype | | | | | | +-primitivetype | | | | | | | +-typename_int | | | | | +-callexpr | | | | | | +-malloc | | | | | | +-exprlist | | | | | | | +-intexpr | | | | | | | | +-50 | | | | | | | +-exprlist(empty) | | | +-stmtlist | | | | +-assignstmt | | | | | +-identexpr | | | | | | +-y | | | | | +-unaryexpr | | | | | | +-op_ref | | | | | | +-identexpr | | | | | | | +-x | | | | +-stmtlist | | | | | +-assignstmt | | | | | | +-unaryexpr | | | | | | | +-op_deref | | | | | | | +-identexpr | | | | | | | | +-y | | | | | | +-binaryexpr | | | | | | | +-unaryexpr | | | | | | | | +-op_deref | | | | | | | | +-identexpr | | | | | | | | | +-y | | | | | | | +-op_plus | | | | | | | +-callexpr | | | | | | | | +-f | | | | | | | | +-exprlist | | | | | | | | | +-identexpr | | | | | | | | | | +-y | | | | | | | | | +-exprlist | | | | | | | | | | +-charexpr | | | | | | | | | | | +-49 | | | | | | | | | | +-exprlist(empty) | | | | | +-stmtlist | | | | | | +-assignstmt | | | | | | | +-identexpr | | | | | | | | +-c | | | | | | | +-charexpr | | | | | | | | +-116 | | | | | | +-stmtlist(empty) | | +-intexpr | | | +-0