UP | HOME

Compiler-Compilers, SimpleC
Lecture 4

Table of Contents

Compiler architecture review

Toy compiler

  • Lexer
  • Parser
  • Code generator
  • (Diagram)

lexer.c, ast.c, parser.c, codegen.c -> compiler.exe -> compiler.exe

main thing to understand: our compiler writes out assembly code, it doesn't run the program

Compiler-compilers: automating compiler generation

  • Specify tokens and grammar
  • Compiler-compiler automatically generates lexer and parser
  • (Diagram)

(lexer.l -> lexer.c), ast.c, (parser.y -> parser.c), codegen.c -> compiler.exe -> compiler.exe

  • saw details of various parts of the compiler
  • saw how we can make the specification more specific, systematic, rigorous
  • turns out that we can use this specification to generate those parts of the compiler
  • lexer.l -> lexer.c, parser.y -> parser.c; lexer.c, parser.c, codegen.c -> compiler.exe

Lexing

  • Toy compiler's lexer.c
    • looped over each character
    • matched characters to tokens
    • created token data structure

The code handle file I/O, reads one-character-at-a-time, groups characters into tokens (number, operator, identifier, etc)

Automatically generating lexers

have you ever seen regular expressions or wildcards?

The flex tool

  • Specify pattern for each token
  • Provide code snippet for action to take on pattern

Toy compiler's lexer.l

Parsing

  • Toy compiler's parser.c
    • called lex() to get tokens one-at-a-time
    • matched syntax tree recursively
    • created AST nodes for each language construct

Automatically generating parsers

labs introduced grammars

The bison tool

  • Specify grammar of each construct
  • Provide code snippet for action to take on each construct

Toy compiler's parser.y

bison specification in detail

Revisiting Makefiles

Generate lexer and parser first

  • Build dependency graph
  • (Diagram)

Toy compiler with flex and bison

Toy compiler with a generated lexer and parser

No need to turn this on, feel free to try it out on your own.

SimpleC

  • Example program
  • Informal description
  • Formalizing the description

Example

int x;

f(a, b) : function(pointer<int>, char) -> char {
  return *a + (int) b;
}

main {
  int x;
  pointer<int> y;
  char c;
  y = &x;
  c = 't';
  return 'c';
}

Language constructs

  • Expressions
  • Statements
  • Declarations
  • Function definitions

Going to assume you know C for this. The project will help learn more about those constructs when we implement them.

Expressions

  • A subset of C
  • Arithmetic operators: + - * /
  • Pointer reference and deference: & * []
    • Recall that arrays are just pointers in C
  • Relational operators: < > <= >= =!
  • Boolean operators: && ||

Statements

  • A subset of C
  • if
  • while
  • assignment

Assignment is actually an expression in C, e.g., a = b = c;

Declarations

  • Subset of C's types
  • Different syntax for types
  • Two kinds of types
    • Primitive types: int, char
    • Compound types: pointer, array, function

IMHO SimpleC makes type declarations more explicit, less esoteric

Declaration examples

int x;
pointer<int> y;
array<5, int> z;
pointer<pointer<char>> pointer_to_string;

Function definitions

  • New syntax
  • Makes function type a little more explicit
  • Function types

    (param_type1, param_type2) -> return_type
    
  • Must have return statement at the end
    • No void functions

Function definition examples

square(a) : function(int) -> int {
  return a * a;
}

Function definition examples

multiply(a, b) : function(int, int) -> int {
  return a * b;
}

Function definition examples

increment_array(array, size, amount) : function(pointer<int>, int, int) -> pointer<int> {
  int i;
  i = 0;
  while (i < size) {
    array[i] = array[i] + amount;
  }
  return array;
}

main function

  • SimpleC has a mandatory main function
    • Differs from C, no separate compilation/linking
  • Must be at end of source file
  • No type or parameter specification
    • These will be built-in

main is part of the C runtime library. Syntactically, main is just another function definition, while the runtime ensures that it is the first thing called after initialization. See lecture 02 on how source becomes an executable.

Example revisited

int x;

f(a, b) : function(pointer<int>, char) -> char {
  return *a + (int) b;
}

main {
  int x;
  pointer<int> y;
  char c;
  y = &x;
  c = 't';
  return 'c';
}

Readings for next week

Nice talk on language design

Author: Paul Gazzillo

Created: 2020-10-01 Thu 14:43