Compiler Architecture Explained
The Glu compiler is a multi-stage compiler that compiles Glu source code into an executable file. The compiler is written in C++ and uses the LLVM compiler infrastructure to generate machine code. The compiler consists of several stages, each of which performs a specific task in the compilation process.
What sets the Glu compiler apart is that it is designed to be reversible: it can generate Glu source code from the LLVM IR representation of a program. This allows the compiler to optimize the LLVM IR of a program and then generate Glu source code from the optimized IR, which can be useful for debugging and for understanding the behavior of the compiler. Most importantly, it allows the compiler to generate human-readable Glu source code from the optimized IR produced by any LLVM-based compiler, such as Clang, rustc, or swiftc.
Compiler Stages
Each stage of the Glu compiler performs a specific task in the compilation process, and the compiler also provides reverse stages that go backwards through the compilation process. The stages are shown in the following diagram:
The code goes through the following representations during the compilation process:
- Glu Source Code: The original source code written in the Glu programming language (.glu files).
- AST: The abstract syntax tree representation of the program.
- GIL: The Glu Intermediate Language, an intermediate representation of the program that is used for high-level optimization and analysis (.gil files).
- LLVM IR: The intermediate representation of the program in the LLVM compiler infrastructure (.ll files for the textual representation, .bc files for the binary bitcode representation).
- MIR: The LLVM Machine IR, the final representation of the program that is used to generate machine code.
- Object File: The compiled machine code in an object file format (.o on Unix-like systems, .obj on Windows).
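As a rough illustration of how a driver might recognize these representations from its input files, the sketch below maps the file extensions listed above to representations. The Repr enum and reprFromPath function are hypothetical names, not part of gluc; AST and MIR are never returned because no file extension is listed for them.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical enumeration of the representations listed above.
enum class Repr { GluSource, AST, GIL, LLVMIR, MIR, ObjectFile };

// Guesses the representation of an input file from its extension.
// AST and MIR have no listed extension, so they are never returned.
std::optional<Repr> reprFromPath(const std::string &path) {
    auto endsWith = [&path](const std::string &suffix) {
        return path.size() >= suffix.size() &&
               path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    if (endsWith(".glu")) return Repr::GluSource;
    if (endsWith(".gil")) return Repr::GIL;
    if (endsWith(".ll") || endsWith(".bc")) return Repr::LLVMIR; // textual or bitcode
    if (endsWith(".o") || endsWith(".obj")) return Repr::ObjectFile;
    return std::nullopt; // unknown input, e.g. a .c file
}
```

With this mapping, main.ll and main.bc would both be treated as LLVM IR, matching the two LLVM IR formats above.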
The Glu compilation process consists of the following stages:
- ASTGen (Glu Source Code -> AST): The ASTGen stage parses the Glu source code and generates an abstract syntax tree (AST) representation of the program.
- GILGen (AST -> GIL): The GILGen stage lowers the AST representation of the program to the Glu Intermediate Language (GIL) representation.
- GILOptimizer (GIL -> GIL): The GIL representation of the program is optimized using high-level optimizations.
- IRGen (GIL -> LLVM IR): The IRGen stage generates the LLVM IR representation of the program from the GIL representation.
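These stages chain together because each one consumes the representation produced by the previous one. A minimal sketch of that invariant, assuming a hypothetical Stage struct (the names are illustrative and do not correspond to the actual Glu libraries):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the compilation stages listed above: each stage
// names the representation it consumes and the one it produces.
enum class Repr { GluSource, AST, GIL, LLVMIR };

struct Stage {
    std::string name;
    Repr input;
    Repr output;
};

// The forward pipeline, in the order described above.
std::vector<Stage> compilationPipeline() {
    return {
        {"ASTGen", Repr::GluSource, Repr::AST},
        {"GILGen", Repr::AST, Repr::GIL},
        {"GILOptimizer", Repr::GIL, Repr::GIL}, // transforms GIL in place
        {"IRGen", Repr::GIL, Repr::LLVMIR},
    };
}

// A pipeline is well-formed if every stage consumes what the previous one produced.
bool isWellFormed(const std::vector<Stage> &stages) {
    for (std::size_t i = 1; i < stages.size(); ++i)
        if (stages[i - 1].output != stages[i].input)
            return false;
    return true;
}
```

Removing a stage, for example GILGen, breaks the chain: ASTGen would produce an AST that GILOptimizer cannot consume.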
The LLVM infrastructure then optimizes the LLVM IR representation of the program and lowers it, through MIR, to machine code in an object file, leveraging the powerful optimization passes and code generation capabilities of LLVM.
The Glu decompilation process is similar to the compilation process but in reverse:
- IRDec (LLVM IR -> AST): The IRDec stage generates an AST representation of the program from the LLVM IR representation.
- ASTPrinter (AST -> Glu Source Code): The ASTPrinter stage pretty-prints Glu source code from the AST representation of the program.
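Because forward and reverse stages both connect representations, a driver could in principle choose which stages to run by searching a graph of representations. The sketch below does this with a breadth-first search. The stage table and the planStages function are hypothetical; GILOptimizer is left out because a GIL -> GIL stage does not move between representations.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch: forward and reverse stages as edges between representations.
enum class Repr { GluSource, AST, GIL, LLVMIR };

struct Stage {
    std::string name;
    Repr from, to;
};

const std::vector<Stage> kStages = {
    {"ASTGen", Repr::GluSource, Repr::AST},
    {"GILGen", Repr::AST, Repr::GIL},
    {"IRGen", Repr::GIL, Repr::LLVMIR},
    {"IRDec", Repr::LLVMIR, Repr::AST},         // reverse stage
    {"ASTPrinter", Repr::AST, Repr::GluSource}, // reverse stage
};

// Breadth-first search for the shortest chain of stages from `from` to `to`.
// Returns the stage names in execution order; empty if from == to or no chain exists.
std::vector<std::string> planStages(Repr from, Repr to) {
    std::map<Repr, const Stage *> via; // stage used to reach each representation
    std::queue<Repr> work;
    via[from] = nullptr;
    work.push(from);
    while (!work.empty()) {
        Repr cur = work.front();
        work.pop();
        if (cur == to) break;
        for (const Stage &s : kStages) {
            if (s.from == cur && !via.count(s.to)) {
                via[s.to] = &s;
                work.push(s.to);
            }
        }
    }
    std::vector<std::string> plan;
    if (!via.count(to)) return plan;
    // Walk back from the target, prepending each stage name.
    for (Repr r = to; via[r] != nullptr; r = via[r]->from)
        plan.insert(plan.begin(), via[r]->name);
    return plan;
}
```

Planning from LLVM IR to Glu source code yields IRDec followed by ASTPrinter, the decompilation path described above.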
The Glu compiler is designed to be modular and extensible, allowing developers to easily add optimization passes to the compilation process. Each stage is implemented as a separate C++ library, which allows developers to experiment with new optimizations and transformations without recompiling the entire compiler.
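To illustrate the kind of extension point described above, here is a minimal pass-manager sketch for GIL optimizations. GILFunction, Pass, PassManager, and the stripDebug example pass are all hypothetical; a real GIL function would be a structured representation rather than a list of instruction strings.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GIL function: its instructions as text.
struct GILFunction {
    std::vector<std::string> instructions;
};

// A pass has a name and returns true if it changed the function.
struct Pass {
    std::string name;
    std::function<bool(GILFunction &)> run;
};

class PassManager {
public:
    void add(Pass pass) { passes_.push_back(std::move(pass)); }

    // Runs every registered pass in order and returns the names of the
    // passes that reported a change.
    std::vector<std::string> run(GILFunction &fn) const {
        std::vector<std::string> changed;
        for (const Pass &p : passes_)
            if (p.run(fn))
                changed.push_back(p.name);
        return changed;
    }

private:
    std::vector<Pass> passes_;
};

// Example pass (hypothetical): remove instructions that start with "debug ".
bool stripDebug(GILFunction &fn) {
    auto &insts = fn.instructions;
    auto oldSize = insts.size();
    insts.erase(std::remove_if(insts.begin(), insts.end(),
                               [](const std::string &i) { return i.rfind("debug ", 0) == 0; }),
                insts.end());
    return insts.size() != oldSize;
}
```

Registering a new optimization then only means adding another Pass to the manager, which is the kind of isolated change the modular design is meant to allow.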
Debug Information (Planned)
The Glu compiler keeps track of debug information throughout the compilation process to provide accurate source-level debugging information in the generated machine code. The compiler generates debug information in the LLVM IR representation of the program, which is used by the LLVM infrastructure to generate debug information in the final machine code. Debug information includes source file names, line numbers, and variable names, allowing developers to debug their programs using debuggers such as LLDB.
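Accurate debug information depends on every stage carrying source locations forward. A minimal sketch of that idea, assuming hypothetical SourceLoc, LetDecl, and GILDebugInst types; the printed form mimics the loc annotations in the GIL output shown in the Example Compilation section.

```cpp
#include <cassert>
#include <string>

// Hypothetical source location, printed like the `loc "main.glu":2:9`
// annotations in the textual GIL output.
struct SourceLoc {
    std::string file;
    unsigned line = 0;
    unsigned column = 0;

    std::string str() const {
        return "loc \"" + file + "\":" + std::to_string(line) + ":" + std::to_string(column);
    }
};

// Hypothetical AST and GIL nodes: each carries the location it came from.
struct LetDecl { std::string name; SourceLoc loc; };
struct GILDebugInst { std::string varName; SourceLoc loc; };

// Lowering copies the location unchanged, so later stages (and finally the
// LLVM debug info) can still point at the original line and column.
GILDebugInst lowerLet(const LetDecl &decl) {
    return GILDebugInst{decl.name, decl.loc};
}
```

If any stage dropped or rewrote these locations, the debugger would no longer be able to map machine code back to the right source line.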
We believe that debugging is important, both with and without optimizations. Therefore, the Glu compiler generates debug information by default, even when optimizations are enabled, so that developers can debug optimized code as accurately as possible.
It is also possible to generate debug information for the Glu Intermediate Language (GIL) representation of the program, using -ggil. In that case, temporary GIL files are generated, and the debugger steps through those files instead of the source files. This can be useful when debugging high-level optimizations performed on the GIL representation of the program.
It is also possible to generate debug information for decompiled Glu source code, using -gdec, which can be a better experience when debugging optimized code. The compiler will go through all compilation stages, then decompile the optimized LLVM IR back to Glu source code, and generate debug information referencing the decompiled source code. This allows stepping through the decompiled source code in the debugger, providing a more natural debugging experience, though the code may not be exactly the same as the original source code.
Finally, it is possible to disable debug information generation using -g0, which can reduce the size of the generated object files and make the compilation process faster. However, this is not recommended for development, as it makes debugging more difficult.
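The debug-information options described above could be handled by a driver roughly as follows. The DebugMode enum and parseDebugMode function are hypothetical, and the rule that the last debug flag wins is an assumption borrowed from common compiler-driver behavior, not documented gluc behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Debug-information modes described above.
enum class DebugMode {
    Source,     // default: debug info references the original .glu files
    GIL,        // -ggil: step through temporary GIL files
    Decompiled, // -gdec: step through decompiled Glu source
    None,       // -g0: no debug info
};

// Hypothetical flag handling; assumes the last debug flag on the command line wins.
DebugMode parseDebugMode(const std::vector<std::string> &args) {
    DebugMode mode = DebugMode::Source; // debug info is on by default
    for (const std::string &arg : args) {
        if (arg == "-g0") mode = DebugMode::None;
        else if (arg == "-ggil") mode = DebugMode::GIL;
        else if (arg == "-gdec") mode = DebugMode::Decompiled;
    }
    return mode;
}
```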
Debug information such as variables and their types is very important in the decompilation process, as it allows the compiler to generate better Glu source code from the optimized LLVM IR representation of the program. Without debug information, the decompiled code would be less readable and harder to understand. Therefore, when importing LLVM bitcode generated by other compilers, it is recommended to enable generation of debug information in those compilers, to improve the quality of the decompiled code.
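The difference debug information makes shows up in something as simple as naming a value. In this hypothetical decompiler helper, a name recovered from debug information is used when available, and a synthetic name is used otherwise; the v0, v1, ... naming scheme is an assumption for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: prefer the variable name recorded in debug info,
// otherwise fall back to a synthetic name derived from the value index.
std::string variableName(unsigned valueIndex, const std::optional<std::string> &debugName) {
    if (debugName && !debugName->empty())
        return *debugName;                   // e.g. "message" from debug info
    return "v" + std::to_string(valueIndex); // e.g. "v3" without debug info
}
```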
Example Compilation
To compile a Glu program, you can use the gluc compiler, which is the front-end of the Glu compiler. The gluc compiler takes Glu source code as input and generates the requested output, such as LLVM bitcode or an executable file.
We will use the following Glu program as an example:
func main() {
let message: String = "Hello, World!";
std::print(message);
}
To view the first stage of the compilation process, you can generate the AST representation of the program using the -print-ast flag:
gluc -print-ast main.glu
This will output the AST representation of the program in a human-readable format. You can inspect the AST to understand how the Glu compiler represents the program internally.
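For intuition only, the sketch below builds a simplified tree for the example program and prints it with indentation. The Node type, node kinds, and output format are hypothetical and will differ from the real -print-ast output.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified AST node (std::vector of an incomplete
// type is allowed here since C++17).
struct Node {
    std::string kind;   // e.g. "FunctionDecl", "LetDecl", "CallExpr"
    std::string detail; // e.g. a name, type, or literal
    std::vector<Node> children;
};

// Prints one node per line, indenting children by two spaces per level.
void dump(const Node &n, std::ostream &os, int depth = 0) {
    os << std::string(depth * 2, ' ') << n.kind;
    if (!n.detail.empty()) os << ' ' << n.detail;
    os << '\n';
    for (const Node &c : n.children) dump(c, os, depth + 1);
}

// The example program: a main function with one let and one call.
Node exampleAST() {
    return {"FunctionDecl", "main", {
        {"LetDecl", "message: String", {{"StringLiteral", "\"Hello, World!\"", {}}}},
        {"CallExpr", "std::print", {{"RefExpr", "message", {}}}},
    }};
}
```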
You can then look at the GIL representation of the program using the -print-gil flag. Use the -o flag to specify the output file, or leave it out to print to standard output:
gluc -print-gil main.glu
This will output the GIL representation of the program in a textual format. You can inspect the GIL to understand how the Glu compiler lowers the AST representation of the program to the GIL representation. By default, colors are used to highlight different parts of the GIL representation when printed to a terminal that supports colors. You can disable colors using the --color=0 flag.
It should look like this:
gil @main : $() -> Int32 {
entry:
%0 = alloca $String, loc "main.glu":2:9
debug %0 : $*String, let "message", loc "main.glu":2:9
%1 = string_literal $String, "Hello, World!", loc "main.glu":2:27
store [init] %1 : $String, %0 : $*String, loc "main.glu":2:9
%2 = load [take] %0 : $*String, loc "main.glu":3:16
call @print, %2 : $String, loc "main.glu":3:15
%3 = integer_literal $Int32, 0
return %3 : $Int32, loc "main.glu":1:13
}
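Every GIL instruction above except the synthesized integer_literal ends with a loc annotation giving the file, line, and column it came from. A small, hypothetical parser for that annotation, which could be handy in scripts that inspect -print-gil output (it is not part of gluc):

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

struct Loc {
    std::string file;
    int line = 0;
    int column = 0;
};

// Extracts the trailing `loc "file":line:col` annotation from one line of
// textual GIL. Returns std::nullopt if the line has no location or it is malformed.
std::optional<Loc> parseLoc(const std::string &inst) {
    const std::string marker = "loc \"";
    std::size_t start = inst.rfind(marker);
    if (start == std::string::npos) return std::nullopt;
    start += marker.size();
    std::size_t quote = inst.find('"', start); // closing quote of the file name
    if (quote == std::string::npos) return std::nullopt;
    Loc loc;
    loc.file = inst.substr(start, quote - start);
    // Expect ":line:col" immediately after the closing quote.
    if (std::sscanf(inst.c_str() + quote + 1, ":%d:%d", &loc.line, &loc.column) != 2)
        return std::nullopt;
    return loc;
}
```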
You can then generate the LLVM IR representation of the program using the -print-llvm-ir flag:
gluc -print-llvm-ir main.glu -o main.ll
less main.ll
This will output the LLVM IR representation of the program in a textual format. It can be quite verbose, but you can inspect the LLVM IR to understand how the Glu compiler generates LLVM IR from the GIL representation of the program.
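As a small illustration of the work IRGen does, the last two GIL instructions of the example (integer_literal followed by return) could be lowered to a single LLVM ret instruction with a constant operand. The sketch below is hypothetical: only the Int32 to i32 mapping follows from the example, the other integer widths are assumptions, and the real IRGen would build IR through LLVM's APIs rather than strings.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical type table: GIL integer types to LLVM IR integer types.
std::optional<std::string> llvmTypeFor(const std::string &gilType) {
    static const std::map<std::string, std::string> table = {
        {"Int8", "i8"}, {"Int16", "i16"}, {"Int32", "i32"}, {"Int64", "i64"},
    };
    auto it = table.find(gilType);
    if (it == table.end()) return std::nullopt; // e.g. String is not handled here
    return it->second;
}

// Lowers `%n = integer_literal $T, value` + `return %n : $T` into one `ret`,
// folding the literal directly into the operand.
std::optional<std::string> lowerReturnOfLiteral(const std::string &gilType, long long value) {
    auto ty = llvmTypeFor(gilType);
    if (!ty) return std::nullopt;
    return "ret " + *ty + " " + std::to_string(value);
}
```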
Note that more intermediate compilation stages are available, such as printing the AST before semantic analysis (-print-astgen), printing the GIL before optimizations (-print-gilgen), and printing GIL between optimization passes. More flags are available; see gluc --help for a full list.
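One way to think about these flags is as stop points along the pipeline, ordered by how far compilation has progressed. The mapping below is a hypothetical sketch inferred from the flag descriptions in this section, not gluc's actual option handling:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stop points, declared in pipeline order so they compare naturally.
enum class StopPoint {
    AfterParsing,   // -print-astgen: AST before semantic analysis
    AfterSema,      // -print-ast
    AfterGILGen,    // -print-gilgen: GIL before optimizations
    AfterOptimizer, // -print-gil
    AfterIRGen,     // -print-llvm-ir
};

std::optional<StopPoint> stopPointFor(const std::string &flag) {
    if (flag == "-print-astgen") return StopPoint::AfterParsing;
    if (flag == "-print-ast") return StopPoint::AfterSema;
    if (flag == "-print-gilgen") return StopPoint::AfterGILGen;
    if (flag == "-print-gil") return StopPoint::AfterOptimizer;
    if (flag == "-print-llvm-ir") return StopPoint::AfterIRGen;
    return std::nullopt; // not a print flag
}
```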
Decompilation
You can then decompile the LLVM IR back to an AST with function declarations only, using the -print-ast flag on the LLVM IR file:
gluc -print-ast main.ll
Because the input is LLVM IR, the compiler will go through the IRDec stage to generate an interface for the program.
To view the interface as code, use the -print-interface flag:
gluc -print-interface main.ll
This will go through the IRDec and ASTPrinter stages to generate a human-readable Glu interface file from the LLVM IR representation of the program. You can inspect it to understand how the Glu compiler generates Glu interfaces from an LLVM IR representation of a program.
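A first step of building an interface is recovering each function's name and return type. The sketch below does this for a single textual define line. FunctionSig and parseDefine are hypothetical; the parser ignores parameters and aggregate return types, and it only handles textual IR (.ll), not bitcode.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

struct FunctionSig {
    std::string name;
    std::string returnType; // LLVM type, e.g. "i32"
};

// Hypothetical parser for a line such as `define dso_local i32 @main() #0 {`.
std::optional<FunctionSig> parseDefine(const std::string &line) {
    if (line.rfind("define ", 0) != 0) return std::nullopt; // only function definitions
    std::size_t at = line.find('@');
    std::size_t paren = line.find('(', at);
    if (at == std::string::npos || paren == std::string::npos) return std::nullopt;
    // The return type is the last token before '@'; linkage and other
    // keywords such as dso_local come earlier on the line.
    std::istringstream before(line.substr(0, at));
    std::string token, last;
    while (before >> token) last = token;
    return FunctionSig{line.substr(at + 1, paren - at - 1), last};
}
```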
Decompiling LLVM IR from Other Compilers
Remember that you can also decompile LLVM IR generated by other compilers back to Glu interfaces using the Glu compiler. This can be useful for integrating Glu code with code generated by other compilers.
For example, consider a similar C program:
#include <stdio.h>
int main() {
char const *message = "Hello, World!";
puts(message);
return 0;
}
This program can be compiled with Clang to LLVM bitcode, using -g to include debug information:
clang -g -c -emit-llvm main.c -o main.bc
It can then be decompiled into a Glu interface:
gluc -print-interface main.bc
This should generate a Glu interface similar to the original Glu program, with just a single main function in this example.
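The -g flag matters here because Clang then records local variable names in the debug metadata, which a decompiler can use instead of synthetic names. As a hypothetical helper (not part of gluc), the sketch below collects those names from textual IR, such as the output of clang -g -S -emit-llvm main.c -o main.ll.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect local variable names from debug metadata
// lines such as
//   !15 = !DILocalVariable(name: "message", scope: !10, ...)
std::vector<std::string> localVariableNames(const std::string &irText) {
    std::vector<std::string> names;
    std::istringstream in(irText);
    std::string line;
    const std::string key = "name: \"";
    while (std::getline(in, line)) {
        if (line.find("!DILocalVariable(") == std::string::npos) continue; // other metadata
        std::size_t start = line.find(key);
        if (start == std::string::npos) continue;
        start += key.size();
        std::size_t end = line.find('"', start);
        if (end != std::string::npos) names.push_back(line.substr(start, end - start));
    }
    return names;
}
```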


