Post

The GIL Intermediate Language

The Glu Intermediate Language (GIL) is a low-level, single-static assignment (SSA) representation of Glu code. It is used as an intermediate representation during the compilation process and is designed to be easy to generate, optimize, and translate to and from LLVM IR.

Overview

Here is an example of a simple Glu program and its corresponding GIL representation:

1
2
3
4
5
6
func main() {
    let x: Int = 10;
    let y: Int = 20;
    let z: Int = x + y;
    std::print("The sum of x and y is " + z);
}
1
2
3
4
5
6
7
8
9
10
11
12
gil @main() : $() -> Void {
    %1 = integer_literal $Int, 10
    debug %1 : $Int, let "x"
    %2 = integer_literal $Int, 20
    debug %2 : $Int, let "y"
    %3 = call @+ : $(Int, Int) -> Int, %1 : $Int, %2 : $Int
    debug %3 : $Int, let "z"
    %4 = string_literal $String, "The sum of x and y is "
    %5 = call @+ : $(String, Int) -> String, %4 : $String, %3 : $Int
    call @std::print : $(String) -> Void, %5 : $String
    return
}

The GIL representation is a series of instructions that operate on typed values. Each instruction has a unique identifier (%1, %2, etc.) and a type annotation ($Int, $String, etc.). The debug instruction is used to associate a variable name with a value for the debugger.

Instruction result identifiers start with a % character, function identifiers start with a @ character, and types start with a $ character.

The top level syntax of GIL is the same as Glu: the same syntax is used for imports and type definitions, although they can have more annotations in GIL. The main difference is in the function definitions, which are written in SSA form.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// This part is the same in Glu and GIL:
import std;

typealias FD = Int;

struct File {
    fd: FD,
}

// In Glu:
func open_file(name: String) -> File {
    ...
}
// In GIL:
gil @open_file : $(String) -> File {
entry(%0: String):
    debug %0 : $String, arg "name", loc "file.glu":11:16
    ...
}

Basic Blocks

GIL functions are divided into basic blocks, which are sequences of instructions that execute in order. Each basic block ends with a terminator instruction that determines the control flow to the next block.

Here is an example of a function with multiple basic blocks:

1
2
3
4
5
6
7
8
9
10
gil @select : $(Bool, Int, Int) -> Int {
entry(%0: Bool, %1: Int, %2: Int):
    cond_br %0 : Bool, then, else
then:
    br merge(%1 : Int)
else:
    br merge(%2 : Int)
merge(%3: Int):
    return %3 : $Int
}

In this example, the entry block is the starting point of the function. It contains a conditional branch (cond_br) that jumps to either the then or else block based on the value of the boolean argument. Both the then and else blocks end with an unconditional branch (br) to the merge block, which contains the return instruction. Conditional branches are used to implement control flow constructs like if and while statements. Both branches cannot be the same. The unconditional branch is used to jump to a specific block, and supplies the arguments for the block.

In GIL, basic blocks are named with a label followed by a colon (:). The label is used to refer to the block in branch instructions. Basic blocks can have multiple arguments, which are the values passed to the block from the previous block. The first block in a function is the entry point and has as many arguments as the function parameters.

The same function in Glu would look like this:

1
2
3
func select(cond: Bool, a: Int, b: Int) -> Int {
    return cond ? a: b;
}

Instructions

GIL instructions are divided into several categories based on their purpose:

  • Terminator Instructions: These instructions determine the control flow of the program. Examples include return, br, and cond_br. They are always the last instruction in a basic block.
  • Constant Instructions: These instructions create constant values. Examples include integer_literal and string_literal.
  • Debug Instruction: This instruction has no effect on the program’s execution but is used to associate a variable name with a value when using a debugger, or for decompiling the GIL back to Glu.
  • Call Instruction: This instruction calls a function or an operator.
  • Conversion Instructions: These instructions convert values between different types. Examples include cast_int_to_ptr and bitcast.
  • Memory Instructions: These instructions allocate, load, or store memory. Examples include alloca, load, and store.
  • Aggregate Instructions: These instructions work with aggregate types like structs. Examples include struct_extract, struct_create and struct_destructure.

Instruction Syntax

The general syntax of a GIL instruction is as follows:

1
%result = instruction_name arg0, arg1, arg2, ..., loc "file.glu":line:column

Where:

  • %result is the SSA identifier for the result of the instruction.
  • instruction_name is the name of the instruction.
  • arg0, arg1, arg2, … are the arguments to the instruction.
  • loc "file.glu":42:3 is the location in the source code the instruction was generated from (file, line, and column).

Some instruction don’t have a result (such as return or debug):

1
instruction_name arg0, arg1, arg2, ..., loc "file.glu":42:3

Some can have multiple results (such as struct_destructure):

1
%result1, %result2 = instruction_name arg0, arg1, arg2, ..., loc "file.glu":42:3

Arguments can be SSA values (starting with %), constant values (integers, floats, strings), global function identifiers (starting with @), types (starting with $), or basic block identifiers.

Terminator Instructions

Terminator instructions are used to control the flow of the program. They are always the last instruction in a basic block. They don’t produce a value, but they may take arguments.

return

The return instruction is used to return a value from a function. It takes a single argument, which is the value to be returned, of the return type of the function.

1
return %3 : $Int

br

The br instruction is an unconditional branch that jumps to a specific basic block. It takes the target block and the arguments for the block.

1
2
3
br merge(%1 : Int)
br loop(%0 : Int, %1 : Int)
br end

cond_br

The cond_br instruction is a conditional branch that jumps to one of two basic blocks based on a boolean condition. It takes the condition, the target blocks for the true and false cases.

1
cond_br %0 : Bool, then, else

unreachable

The unreachable instruction is used to indicate that a particular path of execution should never be reached. It is typically used to signal an error condition that should never occur.

1
unreachable

Constant Instructions

Constant instructions create constant values that can be used in the program. They return a value and have no side effects.

integer_literal

The integer_literal instruction creates a constant integer value of a specified integer type.

1
2
3
%1 = integer_literal $Int, 10
%2 = integer_literal $UInt, 20
%3 = integer_literal $Int8, -7

The first argument is the type of the integer value result (Int, UInt64, Int8, …), and the second argument is the integer value.

float_literal

The float_literal instruction creates a constant floating-point value of a specified floating-point type.

1
2
%1 = float_literal $Float, 3.14
%2 = float_literal $Double, 2.71828

The first argument is the type of the floating-point result (Float, Double, Float80, …), and the second argument is the floating-point value.

string_literal

The string_literal instruction creates a constant string value.

1
%1 = string_literal $String, "Hello, world!"

The first argument is the type of the string result (String, *Char, …), and the second argument is the content.

function_ptr

The function_ptr instruction creates a constant function pointer value from a global function or operator.

1
2
%1 = function_ptr @main : $() -> Void
return %1 : $*() -> Void

The argument is the global function, and the result type is a pointer to the function type.

enum_variant

The enum_variant instruction creates a constant enum variant value.

1
2
enum Result { SUCCESS, FAILURE }
%1 = enum_variant @Result::SUCCESS

The argument is the enum variant locator. The result type is the enum type.

Debug Instruction

The debug instruction is used to associate a variable name with a value for debugging purposes. It has no effect on the program’s execution.

The simplest form of the debug instruction is:

1
debug %1 : $Int, let "x", loc "file.glu":42:3

This associates the value %1 with a constant binding (let) named “x”. The loc part specifies the location in the source code where the variable was defined (file, line, and column). The loc part exists for all instructions, and is also important for stepping through the code in a debugger.

If the variable is mutable, you can use the var binding:

1
debug %1 : $Int, var "x", loc "file.glu":42:3

If the variable is a parameter, you can use the arg binding:

1
debug %1 : $Int, arg "x", loc "file.glu":42:3

If the value referenced by the debug instruction is not used by another non-debug instruction, the value and anything it references must not have side effects. Removing the debug instruction and any dead code it references should not change the behavior of the program.

Example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Unoptimized code:
%1 = integer_literal $Int, 10
debug %1 : $Int, let "x"
%2 = integer_literal $Int, 20
debug %2 : $Int, let "y"
%3 = call @+ : $(Int, Int) -> Int, %1 : $Int, %2 : $Int
debug %3 : $Int, let "z"
return %3 : $Int

// Valid optimized code #1:
%1 = integer_literal $Int, 10
debug %1 : $Int, let "x"
%2 = integer_literal $Int, 20
debug %2 : $Int, let "y"
%3 = integer_literal $Int, 30
debug %3 : $Int, let "z"
return %3 : $Int

// Valid optimized code #2:
%1 = integer_literal $Int, 10
debug %1 : $Int, let "x"
%2 = integer_literal $Int, 20
debug %2 : $Int, let "y"
%3 = call @+ : $(Int, Int) -> Int, %1 : $Int, %2 : $Int
debug %3 : $Int, let "z"
%4 = integer_literal $Int, 30
return %4 : $Int

The call to the + function is not optimized out in the second example because the value is used in the debug instruction. The + function is known to have no side effects, so it can either be folded into a constant 30, or the call can be kept for debugging only. LLVM will fold the call to the + function into a constant anyway at a later stage.

Call Instruction

The call instruction is used to call a function or an operator. It takes the function or operator name, and the arguments to the function or operator.

1
2
3
call @std::print : $(String) -> Void, %0 : $String
%3 = call @+ : $(Int, Int) -> Int, %1 : $Int, %2 : $Int
call %4 : $*(String) -> Void, %0 : $String

The first argument can be:

  • A global function name, starting with @, and having a function type.
  • An operator name, starting with @, and having a function type.
  • A local function pointer, starting with %, and having a function pointer type.

The following arguments must match the function type’s argument types. The result type of the instruction is the return type of the function. For void functions, no result is returned.

Conversion Instructions

Conversion instructions are used to convert values between different types. All conversion instructions have two arguments: the destination type, and the value to convert. The result type is the destination type. They have no side effects.

cast_int_to_ptr

The cast_int_to_ptr instruction converts an integer value to a pointer value. This may break pointer aliasing rules, so it should be used with caution.

1
2
%0 = integer_literal $Int, 0xff73000
%1 = cast_int_to_ptr $*Char, %0 : $Int

cast_ptr_to_int

The cast_ptr_to_int instruction converts a pointer value to an integer value of the same bit width.

1
%1 = cast_ptr_to_int $Int64, %0 : $*Char

bitcast

The bitcast instruction converts a value to a different type without changing the underlying bit pattern. This is useful for converting between different pointer types, between integers and floating-point values, between integers of different signedness, or between incompatible struct types.

1
2
3
%1 = bitcast $Int32, %0 : $Float
%1 = bitcast $*Int32, %0 : $*Float
%1 = bitcast $UInt, %0 : $Int

int_trunc

The int_trunc instruction truncates an integer value to a smaller integer type, discarding the high-order bits.

1
%1 = int_trunc $Int8, %0 : $Int

int_zext

The int_zext instruction zero-extends an integer value to a larger integer type, filling the high-order bits with zeros. Should only be used for unsigned integers.

1
%1 = int_zext $UInt32, %0 : $UInt8

int_sext

The int_sext instruction sign-extends an integer value to a larger integer type, filling the high-order bits with the sign bit. Should only be used for signed integers.

1
%1 = int_sext $Int32, %0 : $Int8

float_trunc

The float_trunc instruction truncates a floating-point value to a smaller floating-point type.

1
%1 = float_trunc $Float, %0 : $Double

float_ext

The float_ext instruction extends a floating-point value to a larger floating-point type.

1
%1 = float_ext $Double, %0 : $Float

Memory Instructions

Memory instructions are used to allocate, load, and store memory. They have side effects.

alloca

The alloca instruction allocates memory on the stack for a value of a specified type. It returns a pointer to the allocated memory.

1
%1 = alloca $Int

The result type is a pointer to the specified type.

load

The load instruction dereferences a pointer and loads the value from memory. It takes a pointer to the memory location to load from.

1
%1 = load %0 : $*Int

The argument must be a pointer type, and the result type is the deferenced type.

store

The store instruction stores a value to a memory location. It takes the value to store and a pointer to the memory location.

1
store %1 : $Int, %0 : $*Int

The first argument is the value to store, and the second argument is a pointer type to the memory location.

Aggregate Instructions

Aggregate instructions work with aggregate types like structs. They have no side effects and return a value.

struct_extract

The struct_extract instruction extracts a field from a struct value. It takes the struct value and the field.

1
2
struct Point { x: Int, y: Int }
%1 = struct_extract %0 : $Point, @Point::x

The first argument is the struct value, and the second argument is the field locator. The result type is the type of the field.

struct_create

The struct_create instruction creates a struct value from its fields. It takes the field values.

1
2
struct Point { x: Int, y: Int }
%1 = struct_create $Point, %0 : $Int, %1 : $Int

The first argument is the struct type, and the following arguments are the field values. The following arguments must match the field types in order. The result type is the first argument.

struct_destructure

The struct_destructure instruction destructures a struct value into its fields. It takes the struct value and the field types.

1
%1, %2 = struct_destructure %0 : $Point

The argument is the struct value. The result types are the field types.

Pointer Instructions

Pointer instructions are used to manipulate pointers. They have no side effects and return a value.

struct_field_ptr

The struct_field_ptr instruction computes the address of a field within a struct. It takes a pointer to the struct and the field value.

1
%1 = struct_field_ptr %0 : $*Point, @Point::x

The first argument is a pointer to the struct, and the second argument is the field locator. The result type is a pointer to the field type.

ptr_offset

The ptr_offset instruction computes the address of an element at a specified offset from a pointer. It takes a pointer and an unsigned integer offset or constant.

1
2
%1 = ptr_offset %0 : $*Int, %2 : $UInt
%1 = ptr_offset %0 : $*Int, 4

The first argument is a pointer, and the second argument is an integer offset. The result type is the same as the first argument.

Conclusion

The Glu Intermediate Language (GIL) is a low-level, single-static assignment (SSA) representation of Glu code. It is designed to be easy to generate, optimize, and translate to and from LLVM IR. GIL functions are divided into basic blocks, which are sequences of instructions that execute in order. GIL is in the middle of the compilation process, between the high-level Glu code and the low-level LLVM IR. It is an essential part of the Glu compiler infrastructure and is used to perform optimizations and generate efficient machine code.

This post is licensed under CC BY 4.0 by the author.