Project 3
Table of Contents
Overview
In this project, you will implement function definitions and calls, variables, and arithmetic expression. Pointers and control-flow constructs are left for the next project.
This project only needs to implement a subset of the simplec language:
- all functions take always take one int argument and always return an int.
- only need to support integers and pointers to integers (no arrays or characters or more than one pointer indirection)
- bitwidths for int and pointer are both 64-bit
- there are no local variables expected in compound statements
It is left as a challenge for bonus points to implement more features of the language.
Setup
- Download the template files and a couple of initial test cases.
Add the new C files to your Makefile's
SRC
variableSRC := \ lexer.c \ parser.tab.c \ typecheck.c \ codegen.c \ table.c \ ast.c \ ast_printer.c \ util.c \ main.c
Modify
main.c
to includecodegen.h
call the type checker on the output of the parser before printing the AST.#include "codegen.h"
In
main()
:yyout = stderr; yyparse(); check_prog(program_ast); codegen(program_ast);
(optional) Use the precompiled parser if you haven't finished project 1.
- Modify your Makefile to use the precompiled parser instead:
- Remove
parser.tab.c
fromSRC
Add
parser.tab.o
to the rule forsimplec
simplec: $(OBJ) $(CC) $(CFLAGS) -o $@ $^ parser.tab.o
Recall that you need to use the
tab
character for indenting rules in Makefiles
- Remove
- Download parser.tab.o.project2
Copy it to
parser.tab.o
and update its timestamp, but keep the parser.tab.o.project2 file around just in case:cp parser.tab.o.project2 parser.tab.o
WARNING:
parser.tab.o
has been compiled for the Vagrant VM, so it will likely not work on your host OS.- Modify your Makefile to use the precompiled parser instead:
Using your compiler
# this builds your compiler make # this runs your compiler to compiler a simplec program cat simple.simplec | ./simplec > simple.s # this assembles and links the simplec program into an executable gcc simple.s # produces a.out. use -o NAME to give your executable a better name ./a.out echo $? # prints the return value of the program
Testing
Be sure to unit test as you go. The grading repository will have the tests used for grading.
Use gdb to step through your simplec output program. First, install it with
sudo apt install gdb
Clone and install this useful gdb assistant called peda. Make sure you have already compiled your simplec program as shown in "Using your compiler" above. Then step through the program like so:
gdb a.out set disassembly-flavor att # once inside of gdb b main # set a breakpoint at main run # start the program. it will wait at main si # step through each assembly instruction # continue stepping through to track the behavior
If you've downloaded and installed peda, you will see the assembly code, registers, and stack displayed after each step.
Code generation rules
Function definitions codegen_func
- create a new scope for the function
emit the pseudo ops for the function definition, e.g.,
.text .globl FUNCNAME # replace with actual function name here and below .type FUNCNAME, @function\n
emit a label for the function, which is name followed by colon:
FUNCNAME:
- insert the parameter into the offset table. for this simplified project, you can always assume there is just one parameter in the parameter list
func->paramlist
- add local declarations to the scope, by calling codegen_decllist (which you also need to implement)
- emit the function prologue by calling emit_prologue with the right size. note that the the current_offset_scope tracks the size of the variable stack for you.
- move the one parameter onto the stack. look it up to find the offset, then move the parameter to that offset in the stack. what register is used as the first parameter in the intel abi?
- generate code for the body of the function. recall that it's a statement list.
- generate code for the return expression and generate code to retrieve the value that the expression generator pushes onto the stack
- emit the epilogue
- destroy the function's scope
Declaration list codegen_decllist
- loop over each element in the decllist list.
- insert each element into the
offset_scope
- for this simplified project, we assume that all values in the stack have an offset of 8 bytes.
Assignment statement codegen_assignstmt
- generate code for the right-hand side of the assignment and save the resulting value into a register
- find the address of the left-hand side of the assignment.
- the left-hand size is also an expression. for this project, we can assume that it is always an identexpr. (pointers will come in the next project).
- lookup the identifier to find its offset (which must exist as long as the type checker is correct and codegen_func is written correctly).
- mov the register that holds the right-hand side of the expression into the stack address
- (for now, ignore other kinds of expression of the left-hand side)
Compound statement codegen_compoundstmt
- for this simplified project, you can assume that there are no local variable declaration in the compound statement.
- generate the code for the body of the compound statement and
- if you'd like a bonus challenge, support local variables in compound statements, e.g., by using the function prologue and epilogue.
Control-flow statements are left for project 4 (codegen_ifstmt
, codegen_ifelsestmt
, codegen_whilestmt
)
Expressions
- overview
for simplicity, we will use the stack to hold intermediate values in an expression
for instance, if we have the expression (1 + x) * (7 + y), a post-order traversal of the tree would compoute (1 + x) and (7 + y) before the multiplication. the result of these additions need to be stored until both operands are ready for multiplication.
- to pass the result of an expression to its parent, push the result onto the stack at the end of the code for the expression.
- when the parent expression (or statement) needs to value, it will pop the result from the stack.
- Binary expressions
- generate code for the left and right operands of the current expression
- pop the result of each operand (the right one will be first, since it was pushed last)
- compute the result of the current expression
- push the result onto the stack
- addition, subtraction, and multiplication
use the ADD, SUB, and IMUL macros provided. recall that the operands are switched in the att syntax. if you have popped the left and right operands into
%rax
and%rbx
respectively, then the add operation will beADD("%rbx", "%rax")
notice that the left operand,
%rax
is last. this operand is also the result of the expression, i.e., it computes ~rax = rax + rbx=. - division and remainder
for division and remainder, the first operand must be in
%rax
. if the second operand is in%rbx
, the assembly code isCDQ() IDIV("%rbx")
then the quotient will be in
%rax
and the remainder will be in%rdx
- saving the result
don't forget to push the result onto the after generating the binary expression's code.
- Identifiers
codegen_identexpr
- lookup the identifier's offset, move it to register, and push that value onto the stack
- Constants
codegen_intexpr
- move the immediate value into a register and push it onto the stack
- Function calls
for this simplified project we can assume there is also one and only one integer argument to every function definition in the arg list
expr->callexpr.args
- generate code for the parameter's expression. recall that this will be saved onto the stack
- pass the parameter to the callee via the
%rdi
register according to the intel abi - emit the call instruction to the function's identifier
- save the return value by pushing it onto the stack. the return value is passed via
%rax
according to the intel abi.
- Unary expressions
unary minus is left as a bonus. the rest of the unary operators are for project 4.