Compiler-Compilers, SimpleC
Lecture 4
Table of Contents
Compiler architecture review
Toy compiler
- Lexer
- Parser
- Code generator
- (Diagram)
lexer.c, ast.c, parser.c, codegen.c -> compiler.exe -> compiler.exe
main thing to understand: our compiler writes out assembly code, it doesn't run the program
Compiler-compilers: automating compiler generation
- Specify tokens and grammar
- Compiler-compiler automatically generates lexer and parser
- (Diagram)
(lexer.l -> lexer.c), ast.c, (parser.y -> parser.c), codegen.c -> compiler.exe -> compiler.exe
- saw details of various parts of the compiler
- saw how we can make the specification more specific, systematic, rigorous
- turns out that we can use this specification to generate those parts of the compiler
- lexer.l -> lexer.c, parser.y -> parser.c; lexer.c, parser.c, codegen.c -> compiler.exe
Lexing
- Toy compiler's lexer.c
- looped over each character
- matched characters to tokens
- created token data structure
The code handle file I/O, reads one-character-at-a-time, groups characters into tokens (number, operator, identifier, etc)
Automatically generating lexers
- Instead of writing by hand
- Specify pattern for each token
- Generate C code from specification automatically
- http://dinosaur.compilertools.net/#flex
have you ever seen regular expressions or wildcards?
The flex
tool
- Specify pattern for each token
- Provide code snippet for action to take on pattern
Toy compiler's lexer.l
flex specification in detail
Parsing
- Toy compiler's parser.c
- called lex() to get tokens one-at-a-time
- matched syntax tree recursively
- created AST nodes for each language construct
Automatically generating parsers
- Instead of writing by hand
- Specify language grammar
- Generate C code automatically
- http://dinosaur.compilertools.net/#bison
labs introduced grammars
The bison
tool
- Specify grammar of each construct
- Provide code snippet for action to take on each construct
Toy compiler's parser.y
bison specification in detail
Interface between flex and bison
Revisiting Makefiles
- https://www.gnu.org/software/make/manual/make.html#Introduction
- How Makefiles work
- Find dependency graph
- Execute each target in order
- Separate compilation
- Compile each software component separately
- Linker matches function calls with definitions
objdump -t file.o
Generate lexer and parser first
- Build dependency graph
- (Diagram)
Toy compiler with flex and bison
Toy compiler with a generated lexer and parser
No need to turn this on, feel free to try it out on your own.
SimpleC
- Example program
- Informal description
- Formalizing the description
Example
int x; f(a, b) : function(pointer<int>, char) -> char { return *a + (int) b; } main { int x; pointer<int> y; char c; y = &x; c = 't'; return 'c'; }
Language constructs
- Expressions
- Statements
- Declarations
- Function definitions
Going to assume you know C for this. The project will help learn more about those constructs when we implement them.
Expressions
- A subset of C
- Arithmetic operators: + - * /
- Pointer reference and deference: & * []
- Recall that arrays are just pointers in C
- Relational operators: < > <= >=
=!
- Boolean operators: && ||
Statements
- A subset of C
- if
- while
- assignment
Assignment is actually an expression in C, e.g., a = b = c;
Declarations
- Subset of C's types
- Different syntax for types
- Two kinds of types
- Primitive types: int, char
- Compound types: pointer, array, function
IMHO SimpleC makes type declarations more explicit, less esoteric
Declaration examples
int x; pointer<int> y; array<5, int> z; pointer<pointer<char>> pointer_to_string;
Function definitions
- New syntax
- Makes function type a little more explicit
Function types
(param_type1, param_type2) -> return_type
- Must have return statement at the end
- No void functions
Function definition examples
square(a) : function(int) -> int { return a * a; }
Function definition examples
multiply(a, b) : function(int, int) -> int { return a * b; }
Function definition examples
increment_array(array, size, amount) : function(pointer<int>, int, int) -> pointer<int> { int i; i = 0; while (i < size) { array[i] = array[i] + amount; } return array; }
main function
- SimpleC has a mandatory main function
- Differs from C, no separate compilation/linking
- Must be at end of source file
- No type or parameter specification
- These will be built-in
main is part of the C runtime library. Syntactically, main is just another function definition, while the runtime ensures that it is the first thing called after initialization. See lecture 02 on how source becomes an executable.
Example revisited
int x; f(a, b) : function(pointer<int>, char) -> char { return *a + (int) b; } main { int x; pointer<int> y; char c; y = &x; c = 't'; return 'c'; }