Write a Compiler

Upcoming Course Dates:

Course Hours:

Cost:

  • $1500 (Corporate)
  • $750 (Individual)

Registration (no payment required now):

Instructor: David Beazley

Other Courses | FAQ


Overview

Admiral Grace Hopper implemented the first compiler in 1952. Now it's your turn. In this course, you'll write a compiler for a new programming language from scratch. At first glance, this might not seem like the most practical thing to be doing, but trust me, your eyes will light up whenever you tell people that you wrote a compiler. Plus, given that implementing a compiler is an exercise is managing complexity, it often proves to be useful in ways that aren't always obvious.

A true story: I once had a former student stop by my office to tell me how he helped save a major coding disaster involving health insurance. I asked "how'd you do that?" He said, "I used something I learned while writing a compiler."

Target Audience

This class is for programmers who aspire to improve their skills with software design, data handling, type systems, and related topics. Not many programmers get the opportunity to write a compiler unless they happen to take such a course as a CS undergraduate or they enroll in graduate school. Thus, this course could be a good way to challenge yourself.


"If you've seen any of David Beazley's fun and mind blowing talks, you can begin to imagine his week-long, intensive courses, held in his own Chicago office -- or "lair", as he calls it. It's even better than you think: starting with breakfast every day, David is a gracious host, a daredevil coder, and an inspiring teacher. I took his Compiler Course and I hope to attend more of his unique, hands-on workshops."

Luciano Ramalho, author of Fluent Python

Instruction Format

This course is entirely project focused and presented in a conversational live-coding style without PowerPoint style slides. The goal of the course is not just to learn how to write a compiler, but also how to approach the problem of writing a compiler from first principles. Part of the course involves group discussion about problem decomposition, coding techniques, design tradeoffs, testing, and other related topics. The rest of the time is spent working on individual coding.

Coding examples are primarily provided in Python. However, the project involves no third-party libraries, esoteric Python-specific features, or Python-dependent tooling. As such, you're free to implement the project in any programming language that you wish.

Note: Athough this course does not mandate the use of any particular programming environment, AI-assisted coding tools are NOT used in the presentation.

Prerequisites

You might not think that you're ready to write a compiler, but if you've been coding for awhile and know the basics of data structures, it's something that you could probably tackle. No prior background in compilers is required although awareness of common programming language concepts (e.g., types, functions, classes, scoping rules, etc.) is strongly advised. Some knowledge of text manipulation, computer architecture (machine instructions, memory, etc.), and prior use of a compiled language is also recommended. Robert Nystrom's Crafting Interpreters book is also an excellent source of background information.

Project Overview

The course is structured around the problem of implementing a small programming language called Wabbit. Wabbit is a small, statically typed, imperative language. You'll write a compiler that can take Wabbit code and compile it to a native executable program via LLVM. In doing this, the goal is to create something that is reasonably well-organized, makes good use of abstractions, and can be tested.

There are different problems that need to be addressed:

  1. Data model. Computer programs need to be represented as a proper data structure, not as raw text. This structure is typically called an Abstract Syntax Tree (AST). Defining it requires a deeper understanding of how programming language features are put together.
  2. Program Transformation. Compilers work by transforming programs into equivalent programs. These transformations can take many different forms. For example, high-level programming language features can often be transformed to simpler features that already exist. Programs can also be transformed to make behavior explicit or for the purposes of optimization.
  3. Parsing. You need to parse programs by converting them from text to the data model. This involves tokenizing text and understanding grammars. You'll write a recursive descent parser from scratch. Although it's a critical part of make a programming language, knowing how to write a parser is a useful skill that can be applied to all manner of data handling problems.
  4. Virtual machines. Many programming languages work by compiling code to instructions that target a virtual machine. For example, Python, Java, and C# all work in this way. We will explore this by defining our own stack-based virtual machine and having the compiler produce code for it.
  5. Lower-level code generation. We'll see how to convert virtual machine code to a lower-level target. Normally, this could be something like x86 or ARM assembler. However, rather than getting bogged down in those details, we'll target LLVM. This is still higher-level than actual hardware, but it's low-level enough to understand the nature of the problem.
  6. Type checking (the great refactoring). Initially, our compiler will only support a single datatype (integers). However, we'll conclude by expanding the compiler to support different types. Naturally, this will be introduced as a refactoring problem since the only thing harder than writing a compiler is refactoring a compiler. Since type systems are often pitched as a tool for "fearless refactoring", the problem itself will become a backdrop for talking more about types.

At the end of the project, the compiler is broken into about 15 distinct stages, comprising approximately 2000-3000 total lines of code. Much of the underlying complexity is not in the volume of code, but in how the different stages of the compiler are put together.

Are You Nuts?

Writing a compiler in only 5 days? Is it even possible? To be sure, compilers is often regarded as one of the most difficult CS courses that one can take. If you take it at a University, you might get a professor who will drag you through the infamous Dragon Book, spend a lot of time doing mathematical proofs (e.g., deriving the LALR(1) parsing algorithm), and make the focus of the course on preparing graduate students for future research in programming languages. That can be a great class. I have taught that class. This is NOT that class.

Instead, this is a compilers course aimed at working programmers. As such, the main focus is on programming. Yes, you will learn about some important core concepts that underpin compilers. However, instead of doing mathematical proofs involving parsing theory, we'll focus on how you would actually go about implementing a parser and doing things such as writing unit tests for it.

About the Instructor

This class is led by David Beazley. Although most known for his work in the Python community, Dave was formerly a tenure-track assistant professor in the Department of Computer Science at the University of Chicago where he taught a Compilers course along with a variety of other topics in systems and programming languages. In 2023 and 2024, he taught the Programming Language Design and Implementation course with Shriram Krishnamurthi at Brown University. Dave is also the creator of the PLY and SLY parsing tools for Python. He once gave a PyCon talk about these tools. In a previous decade, he also created Swig, a C/C++ compiler for building scripting language extension modules.


Copyright (C) 2005-2026, David Beazley