UP | HOME

Project 3

Table of Contents

Overview

In this project, you will implement function definitions and calls, variables, and arithmetic expression. Pointers and control-flow constructs are left for the next project.

This project only needs to implement a subset of the simplec language:

  • all functions take always take one int argument and always return an int.
  • only need to support integers and pointers to integers (no arrays or characters or more than one pointer indirection)
  • bitwidths for int and pointer are both 64-bit
  • there are no local variables expected in compound statements

It is left as a challenge for bonus points to implement more features of the language.

Setup

  1. Download the template files and a couple of initial test cases.
  2. Add the new C files to your Makefile's SRC variable

    SRC := \
      lexer.c \
      parser.tab.c \
      typecheck.c \
      codegen.c \
      table.c \
      ast.c \
      ast_printer.c \
      util.c \
      main.c
    
  3. Modify main.c to include codegen.h call the type checker on the output of the parser before printing the AST.

    #include "codegen.h"
    

    In main():

    yyout = stderr;
    yyparse();
    check_prog(program_ast);
    codegen(program_ast);
    
  4. (optional) Use the precompiled parser if you haven't finished project 1.

    • Modify your Makefile to use the precompiled parser instead:
      1. Remove parser.tab.c from SRC
      2. Add parser.tab.o to the rule for simplec

        simplec: $(OBJ)
          $(CC) $(CFLAGS) -o $@ $^ parser.tab.o
        

        Recall that you need to use the tab character for indenting rules in Makefiles

    • Download parser.tab.o.project2
    • Copy it to parser.tab.o and update its timestamp, but keep the parser.tab.o.project2 file around just in case:

      cp parser.tab.o.project2 parser.tab.o
      

    WARNING: parser.tab.o has been compiled for the Vagrant VM, so it will likely not work on your host OS.

Using your compiler

# this builds your compiler
make
# this runs your compiler to compiler a simplec program
cat simple.simplec | ./simplec > simple.s
# this assembles and links the simplec program into an executable
gcc simple.s  # produces a.out.  use -o NAME to give your executable a better name
./a.out
echo $? # prints the return value of the program

Testing

Be sure to unit test as you go. The grading repository will have the tests used for grading.

Use gdb to step through your simplec output program. First, install it with

sudo apt install gdb

Clone and install this useful gdb assistant called peda. Make sure you have already compiled your simplec program as shown in "Using your compiler" above. Then step through the program like so:

gdb a.out
set disassembly-flavor att
# once inside of gdb
b main # set a breakpoint at main
run # start the program.  it will wait at main
si # step through each assembly instruction
# continue stepping through to track the behavior

If you've downloaded and installed peda, you will see the assembly code, registers, and stack displayed after each step.

Code generation rules

Function definitions codegen_func

  • create a new scope for the function
  • emit the pseudo ops for the function definition, e.g.,

    .text
    .globl FUNCNAME  # replace with actual function name here and below
    .type FUNCNAME, @function\n
    
  • emit a label for the function, which is name followed by colon:

    FUNCNAME:
    
  • insert the parameter into the offset table. for this simplified project, you can always assume there is just one parameter in the parameter list func->paramlist
  • add local declarations to the scope, by calling codegen_decllist (which you also need to implement)
  • emit the function prologue by calling emit_prologue with the right size. note that the the current_offset_scope tracks the size of the variable stack for you.
  • move the one parameter onto the stack. look it up to find the offset, then move the parameter to that offset in the stack. what register is used as the first parameter in the intel abi?
  • generate code for the body of the function. recall that it's a statement list.
  • generate code for the return expression and generate code to retrieve the value that the expression generator pushes onto the stack
  • emit the epilogue
  • destroy the function's scope

Declaration list codegen_decllist

  • loop over each element in the decllist list.
  • insert each element into the offset_scope
  • for this simplified project, we assume that all values in the stack have an offset of 8 bytes.

Assignment statement codegen_assignstmt

  • generate code for the right-hand side of the assignment and save the resulting value into a register
  • find the address of the left-hand side of the assignment.
    • the left-hand size is also an expression. for this project, we can assume that it is always an identexpr. (pointers will come in the next project).
    • lookup the identifier to find its offset (which must exist as long as the type checker is correct and codegen_func is written correctly).
    • mov the register that holds the right-hand side of the expression into the stack address
  • (for now, ignore other kinds of expression of the left-hand side)

Compound statement codegen_compoundstmt

  • for this simplified project, you can assume that there are no local variable declaration in the compound statement.
  • generate the code for the body of the compound statement and
  • if you'd like a bonus challenge, support local variables in compound statements, e.g., by using the function prologue and epilogue.

Control-flow statements are left for project 4 (codegen_ifstmt, codegen_ifelsestmt, codegen_whilestmt)

Expressions

  • overview
    • for simplicity, we will use the stack to hold intermediate values in an expression

      for instance, if we have the expression (1 + x) * (7 + y), a post-order traversal of the tree would compoute (1 + x) and (7 + y) before the multiplication. the result of these additions need to be stored until both operands are ready for multiplication.

    • to pass the result of an expression to its parent, push the result onto the stack at the end of the code for the expression.
    • when the parent expression (or statement) needs to value, it will pop the result from the stack.
  • Binary expressions
    • generate code for the left and right operands of the current expression
    • pop the result of each operand (the right one will be first, since it was pushed last)
    • compute the result of the current expression
    • push the result onto the stack
    • addition, subtraction, and multiplication

      use the ADD, SUB, and IMUL macros provided. recall that the operands are switched in the att syntax. if you have popped the left and right operands into %rax and %rbx respectively, then the add operation will be

      ADD("%rbx", "%rax")
      

      notice that the left operand, %rax is last. this operand is also the result of the expression, i.e., it computes ~rax = rax + rbx=.

    • division and remainder

      for division and remainder, the first operand must be in %rax. if the second operand is in %rbx, the assembly code is

      CDQ()
      IDIV("%rbx")
      

      then the quotient will be in %rax and the remainder will be in %rdx

    • saving the result

      don't forget to push the result onto the after generating the binary expression's code.

  • Identifiers codegen_identexpr
    • lookup the identifier's offset, move it to register, and push that value onto the stack
  • Constants codegen_intexpr
    • move the immediate value into a register and push it onto the stack
  • Function calls

    for this simplified project we can assume there is also one and only one integer argument to every function definition in the arg list expr->callexpr.args

    • generate code for the parameter's expression. recall that this will be saved onto the stack
    • pass the parameter to the callee via the %rdi register according to the intel abi
    • emit the call instruction to the function's identifier
    • save the return value by pushing it onto the stack. the return value is passed via %rax according to the intel abi.
  • Unary expressions

    unary minus is left as a bonus. the rest of the unary operators are for project 4.

Annotated examples

The following show annotated examples of what the emitted assembly looks like given the rules above:

References

Author: Paul Gazzillo

Created: 2020-11-13 Fri 18:05