Write a Compiler

Upcoming Course Dates:

  • July 13-17, 2020 (Sold Out)
  • September 14-18, 2020 (Sold Out)

Registration:

Due to COVID-19, courses are only being offered in an online format. Click here for more information.

• In-person ($2500) -- Unavailable
• Online ($1250)

Instructor: David Beazley

Includes:

  • Breakfast and lunch (in-person only)
  • Course materials

Location: 5412 N Clark Street #218, Chicago, IL 60640

Other Courses | FAQ


Overview

Shatter your brain by writing a compiler for a new programming language!

Compilers is often considered a capstone course for computer science majors. There is a reason for that--a compiler touches almost every topic of computer science ranging from theory to computer architecture. Writing a compiler is also an exercise in managing software complexity. Compilers have a lot of moving parts, involve unusual tools, are difficult to test, and are challenging to debug. You'll learn a lot by writing a compiler. Plus, you'll be able to brag about it later.

Target Audience

This class is for more experienced programmers who'd like to expand their skills by taking on a good challenge. If you're a self-taught programmer and you're curious about what writing a compiler is all about, then this is the course for you. You'll learn a lot more about programming languages and have a much better insight into the various coding tools you use every day.


"If you've seen any of David Beazley's fun and mind blowing talks, you can begin to imagine his week-long, intensive courses, held in his own Chicago office -- or "lair", as he calls it. It's even better than you think: starting with breakfast every day, David is a gracious host, a daredevil coder, and an inspiring teacher. I took his Compiler Course and I hope to attend more of his unique, hands-on workshops."

Luciano Ramalho, author of Fluent Python

Instruction Format

This course is almost entirely project focused. A typical day might consist of an hour of discussion followed by 7-8 hours of coding. During coding, there is typically considerable group discussion about coding techniques, design tradeoffs, testing, and other related topics.

Prerequisites

You might not think that you're ready to write a compiler, but if you've been coding for awhile and know the basics of data structures, it's something that you could probably tackle. No prior background in compilers is required although awareness of common programming language concepts (e.g., type systems, functions, classes, scoping rules, etc.) is strongly advised. Some knowledge of regular expressions, computer architecture (machine instructions, memory, etc.), and prior use of a compiled language is also recommended.

Syllabus

The course is structured around the goal of creating a small programming language called Wabbit. You'll write a compiler that can take Wabbit code and compile it to a native executable program via LLVM. Recent versions of the course have also directly targeted WebAssembly. The code produced by your compiler will have performance comparable to programs written in C. Along the way, you'll also implement an interpreter and a transpiler.

The project involves the following core problems:

  1. Data model. You need to figure out some way to represent a computer program, not as text, but as a proper data structure. Sometimes this is called an Abstract Syntax Tree (AST), but it's not necessarily directly tied to parsing.

  2. Parsing. You need to parse programs by converting them from text to the data model. This involves tokenizing text and understanding grammars. We'll look a bit at how parsing algorithms work and some of the tools used for parsing.

  3. Interpretation. You'll write an interpreter that can directly execute programs from the data model. It will be rather slow, but to make it it work, you'll have to understand how programs evaluate.

  4. Type checking. You'll write an static program analyzer that checks the source code for type errors and other semantic problems. We'll also discuss some things like Algebraic Type Systems.
  5. Control-flow analysis. You'll write a module for checking program control flow and finding more errors. This involves conditionals, looping, function calls, and other matters.

  6. Code generation. You'll have your compiler generate code for LLVM and/or WebAssembly. Once you have this, you'll have programs that execute at native speed comparable to C programs.

  7. Native code generation. Tools such as LLVM still hide a lot of low-level details. We'll conclude by talking about some lower-level issues such as register allocation, activation frames, function calls, linkers, and other matters.

It's important to note that a major goal is to build a stronger intuition for how all of the parts of a compiler actually work. Although there are a lot of existing frameworks and tools that can be used to assist in compiler construction, they are minimally used in this course. Instead, you will be creating a compiler from scratch from first principles.

Practical Takeaways

Although most programmers are unlikely to write a compiler in their day-to-day work, this course touches on a wide variety of practical topics that are applicable elsewhere. These include:

Are You Nuts?

Writing a compiler in only 5 days? Is it even possible? To be sure, compilers is often regarded as one of the most difficult CS courses that one can take. If you take it at a University, you'll probably get a professor who will drag you through the infamous Dragon Book, spend a lot of time doing mathematical proofs (e.g., deriving the LALR(1) parsing algorithm), and make the focus of the course on preparing graduate students for future research in programming languages. I have taught that class. This is NOT that class.

Instead, this is a compilers course aimed at practioners. As such, the main focus is on coding and software development. Yes, you will learn about some important core compiler concepts such as regular expressions, context free grammars, type systems, and programming language semantics. However, instead of doing mathematical proofs involving parsing theory, we'll focus on how you would actually go about implementing a parser and doing things such as writing unit tests for it.

To be sure, you will write a lot of code in this course. The course runs for more than 40 hours over five days. The final completed project consists of approximately 2500-3000 lines of Python and is every bit as involved as the project you typically find in a college-level compilers course for computer science majors. However, the project is structured in a way to help you succeed.

About the Instructor

This class is led by David Beazley. Although most known for his work in the Python community, Dave was formerly a tenure-track assistant professor in the Department of Computer Science at the University of Chicago where he taught a Compilers course along with a variety of other topics in systems and programming languages. Dave is also the creator of the PLY and SLY parsing tools for Python. He recently gave a PyCon talk about these tools. In a past decade, he also created Swig, a C/C++ compiler for building scripting language extension modules.