Regular expressions and finite state automata
Lecture 6
Table of Contents
Regular expressions
String pattern matching
- phone numbers
- email addresses
- programming language tokens
How would you write this program?
- Search for an email address in a text file?
Example
Sequence
- Area code is three digits
Alternation
- A digit is either a 0, 1, 2, .., 9
Parentheses are just used to make order of operations explicit, just like in arithmetic
Repetition
- Any number of characters before the @ sign
Optional elements
- E.g., country code
Wild cards
- Allow any character (or some subset of characters)
Regular expression language
- concatenation, e.g.,
ab
- alternation, e.g.,
a|b
- Kleene closure, e.g.,
a*
The order of operations, from highest to lowest, is Kleene closure, concatenation, and alternation.
One way to remember order of operations, is that alternation is like addition or logical or, concatenation is like multiplication or logical and, and Kleene closure is like exponentiation.
Regular expressions in practice
- Lots of syntactic sugar available
- python library (perl syntax)
Finite state automata
- AKA
- Finite state machines
- Finite automaton
- State machine
Pattern matching equivalent to many automation tasks
- String pattern matching problem
- Capture each possible string prefix in a state
First abstract machine model
- Formal language: potentially infinite set of strings
- Each string drawn from a finite alphabet
- Each string element itself is finite
Here's our first abstract machine model
You'll see more in discrete 2
Other state machine applications
- Traffic lights
- Turnstiles
- Vending machines
Any machine with some predefined set of states and events that transition between states
Implementing finite state automata
- Graph
- If and while
- Table-based (diagram)
Automatically generating automata from regular expressions
This is how the flex tools works under the hood.
Wrap-up
- Regular expressions capture string pattern matching
- Regular expressions correspond to finite state automata
- Automata can be implemented in software
- Regular expression can be automatically converted to automata
Readings
- Dragon book, chapter 3