GIR Instruction Set
The following is the instruction specification for the Generic Intermediate Representation (GIR):
| GIR | Attributes | Description |
|---|---|---|
| program | name body | |
| namespace_decl | name body | `namespace name {body}` |
| comment_stmt | data | |
| package_stmt | name | Represents a package declaration statement, formatted as package name |
| import_stmt | attrs name alias | Represents an import statement, formatted as `import module_path` or `import module_path as alias`. Note: module_path is a directory or file path. |
| from_import_stmt | attrs source name alias | Notes on attrs:<br>- unit: name must be a filename, not a folder<br>- init: the target file must be initialized during import |
| export_stmt | attrs name alias | Represents an export statement: `export <name> as <alias>`. attrs indicates whether it is an `export default` (JS-only). |
| from_export_stmt | attrs module_path name alias | |
| require_stmt | target name | Represents a require statement: `target = require(name)` (PHP-only). |
| class_decl | attrs name supers static_init init fields methods nested | Represents a class declaration.<br>- attrs: properties such as public, static, private<br>- name: class name<br>- supers: list of parent classes<br>- fields: member variables (each a variable_decl)<br>- methods: member functions (each a method_decl)<br>- nested: list of other nested declarations<br>- init/static_init: initialization blocks; init initializes normal fields and static_init initializes static fields<br>Example: `public class Name extends A implements B { int i = 1; }` is represented as: `{"class_decl": {"attrs": ["public"], "name": "Name", "supers": ["A", "B"], "fields": [{"variable_decl": {"data_type": "int", "name": "i"}}], "init": [this.i = 1]}}` |
| record_decl | attrs name supers type_parameters static_init init fields methods nested | type_parameters is a list of all type parameters; the other attributes are the same as in class_decl. |
| interface_decl | attrs name supers type_parameters static_init init fields methods nested | Same as record_decl |
| enum_decl | attrs name supers static_init init fields methods nested | Same as class_decl |
| annotation_type_decl | attrs name static_init init fields methods nested | Same as class_decl |
| annotation_type_elements_decl | attrs data_type name value | Same as class_decl |
| struct_decl | attrs name fields | Same as class_decl |
| parameter_decl | attrs data_type name default_value | Represents a parameter declaration.<br>- data_type: the data type of the parameter<br>- name: the name of the parameter<br>- default_value: the default value<br>Example: for `int f(int a, int b = 4)`, parameters is `[{"parameter_decl": {"data_type": "int", "name": "a"}}, {...}]` |
| variable_decl | attrs data_type name | Represents local variable and field declarations. Example: `signed int i = 10` is split into: `[{"variable_decl": {"attrs": "signed", "data_type": "int", "name": "i"}}, {"assign_stmt": {"target": "i", "operand": 10}}]` |
| method_decl | attrs data_type name parameters body | Represents a function declaration.<br>- attrs: properties such as public, static, private<br>- data_type: the data type of the return value<br>- name: the name of the method<br>- parameters: list of parameters, each a parameter_decl<br>- body: list of the statements inside the method<br>Example: `public int f(int a) {}` has attrs: "public", data_type: "int", name: "f".<br>Anonymous functions (e.g., Python `lambda x: x+1`) are converted to named temporary methods: `def tmp_method(x): return x+1`. |
| assign_stmt | data_type target operand operator operand2 | Assignment statement: `target = operand [<operator> operand2]`. If operand2 is missing, the operation is unary (e.g., `a = -b`). |
| call_stmt | target name positional_args packed_positional_args named_args packed_named_args data_type prototype | Function call, formatted as `target = name(args)`.<br>- target: return value of the method, always a temporary variable<br>- name: name of the called method<br>- positional_args: list of positional arguments<br>- packed_positional_args: unpacked positional arguments; mutually exclusive with positional_args<br>- named_args: list of keyword arguments<br>- packed_named_args: unpacked keyword arguments; mutually exclusive with named_args<br>- data_type: data type of the return value<br>- prototype: prototype of the called function; used for LLVM and Dalvik<br>Example for `e = f(a, b, c + d)`:<br>1. `%v2 = c + d`<br>2. `%v3 = f(a, b, %v2)` // positional_args: [a, b, %v2]<br>3. `e = %v3`<br>Example for `f(a, b, c, d=3)`: call_stmt with name: f, positional_args: [a, b, c], named_args: {d: 3} |
| object_call_stmt | target receiver name positional_args packed_positional_args named_args packed_named_args data_type prototype | Method call on an object, formatted as `target = receiver.name(args)`.<br>- target: return value of the method, always a temporary variable<br>- receiver: the object containing the called method<br>- name: name of the called method<br>- positional_args: list of positional arguments<br>- packed_positional_args: unpacked positional arguments; mutually exclusive with positional_args<br>- named_args: list of keyword arguments<br>- packed_named_args: unpacked keyword arguments; mutually exclusive with named_args<br>- data_type: data type of the return value<br>- prototype: prototype of the called function; used for LLVM and Dalvik<br>Example for `e = o.f(a, b, c + d)`:<br>1. `%v2 = c + d`<br>2. `%v3 = o.f(a, b, %v2)` |
| echo_stmt | name | PHP echo statement. |
| exit_stmt | name | PHP exit statement. |
| return_stmt | name | Returns a variable: return name |
| if_stmt | condition then_body else_body | Example: `if (a + b > c) {}` →<br>`%v1 = a + b`<br>`%v2 = %v1 > c`<br>`if (%v2) {...}` |
| dowhile_stmt | condition body | Similar to if_stmt |
| while_stmt | condition body else_body | Similar to if_stmt |
| for_stmt | init_body condition condition_prebody update_body body | Traditional for loop, formatted as `for (init_body; condition_prebody; condition; update_body) {}`.<br>- init_body: list of statements forming the initialization block<br>- condition_prebody: list of statements that compute the loop condition<br>- condition: a variable holding the result of the condition<br>- update_body: list of statements executed at the end of every iteration<br>Example for `for (int a = 1, b = 3; a + b < 10; a++, b++) {}`:<br>`for_stmt: [init_body: [variable_decl int a; a = 1; variable_decl int b; b = 3], condition_prebody: [%v1 = a + b; %v2 = %v1 < 10], condition: %v2, update_body: [a = a + 1; b = b + 1], body: []]` |
| forin_stmt | attrs data_type name receiver body | Iteration statement (e.g., `for x in list`), similar to for_stmt. Formatted as `for attrs data_type name in receiver {}`.<br>- attrs: attributes of the iteration variable<br>- data_type: data type of the iteration variable<br>- name: the iteration variable<br>- receiver: the target variable being iterated over<br>- body: list of statements<br>Example: `for x in list` gives a forin_stmt with receiver: list, name: x |
| for_value_stmt | attrs data_type name receiver body | Designed for JS `for...of` and PHP `foreach`. |
| switch_stmt | condition body | `switch(condition) {body}` |
| case_stmt | condition body | case block inside switch. |
| default_stmt | body | default block inside switch. |
| break_stmt | name | break name |
| continue_stmt | name | continue name |
| goto_stmt | name | goto name |
| yield_stmt | name | yield name |
| throw_stmt | name | throw name |
| try_stmt | body catch_body else_body final_body | `try {body} catch {catch_body} else {else_body} finally {final_body}` |
| catch_stmt | exception body | catch block |
| label_stmt | name | Label declaration |
| asm_stmt | target data_type attrs data extra args | Inline assembly: `target = attrs data`, where data is the asm content |
| assert_stmt | condition | assert condition |
| del_stmt | receiver name | Python `del target` |
| unset_stmt | receiver name | PHP `unset` |
| pass_stmt | | Empty statement (Python `pass`) |
| global_stmt | name | Python `global name` |
| nonlocal_stmt | name | Python `nonlocal name` |
| type_cast_stmt | target data_type source error cast_action | Type casting: `target = (data_type) source`. If the cast fails, an error is reported. |
| type_alias_decl | data_type name type_parameters | Typedef: `typedef int a` → name: a, data_type: int |
| with_stmt | attrs with_init body | Context manager (e.g., Python `async with ... as file`).<br>- attrs: always async<br>- with_init: the initialization of the context manager<br>- body: statements inside the with_stmt<br>Example: `async with aiofiles.open(filepath, 'r') as file: content = await file.read()` produces the GIR: `{'with_stmt': {'attrs': ['async'], 'with_init': [{'field_read': {'target': '%v0', 'receiver_object': 'aiofiles', 'field': 'open'}}, {'call_stmt': {'target': '%v1', 'name': '%v0', 'args': ['filepath', "'r'"]}}, {'assign_stmt': {'target': 'file', 'operand': '%v1'}}], 'body': [{'field_read': {'target': '%v0', 'receiver_object': 'file', 'field': 'read'}}, {'call_stmt': {'target': '%v1', 'name': '%v0', 'args': []}}, {'await': {'target': '%v1'}}, {'variable_decl': {'data_type': None, 'name': 'content'}}, {'assign_stmt': {'target': 'content', 'operand': None}}]}}` |
| unsafe_block | body | Rust unsafe block |
| block | body | Generic code block |
| block_start | stmt_id parent_stmt_id | Internal marker for block start |
| block_end | stmt_id parent_stmt_id | Internal marker for block end |
| new_array | target attrs data_type | Array instantiation: `target = attrs data_type[]` |
| new_object | target attrs data_type args | Class instantiation: `target = attrs new data_type(args)` |
| new_record | target attrs data_type | Dictionary instantiation |
| new_set | target attrs data_type | Set instantiation |
| new_struct | target attrs data_type | Struct instantiation |
| phi_stmt | target phi_values phi_labels | LLVM-style phi node: `target = [phi_value, phi_label]` |
| mem_read | target address | Read from memory: `target = *address` |
| mem_write | address source | Write to memory: `*address = source` |
| array_write | array index source | Array write: `array[index] = source` |
| array_read | target array index | Array read: `target = array[index]` (e.g., `a0 = result[0]`) |
| array_insert | array source index | Insert source into array at index |
| array_append | array source | Append to array: `<array>.append(<source>)` |
| array_extend | array source | Extend array: `<array>.extend(<source>)` |
| record_write | receiver_object key value | Map write: `receiver_object[key] = value` |
| record_extend | record source | Map extend: `<record>.update(<source>)` |
| field_write | receiver_object field source | Field write: `receiver_object.field = source` |
| field_read | target receiver_object field | Field read: `target = receiver_object.field` |
| slice_write | array source start end step | Python slice write: `array[start:end:step] = source`<br>- start: the index at which the slice begins<br>- end: the index at which the slice stops<br>- step: the increment between successive indices |
| slice_read | target array start end step | Python slice read: `target = array[start:end:step]`. Example: `a = list[x:y:3]` becomes `{'slice_read': {'target': '%v1', 'array': 'list', 'start': 'x', 'end': 'y', 'step': '3'}}` followed by `{'assign_stmt': {'target': 'a', 'operand': '%v1'}}` |
| addr_of | target source | Address-of: `target = &source` |
| await_stmt | target | await statement |
| field_addr | target data_type name | Field offset calculation (e.g., `offsetof(struct address, name)`). Example: given `struct address { char name[50]; char street[50]; int phone; };`, the expression `offsetof(struct address, name)` is converted to a field_addr with target, data_type: address, name: name |
| switch_type_stmt | condition body | Type-based switch statement |
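
The call_stmt and assign_stmt rows describe how nested expressions are flattened into temporaries such as `%v2`. The following is a minimal sketch of that lowering, built on Python's standard `ast` module. It is not the actual GIR front end: the `Lowerer` class and `lower` function are hypothetical names, only a small subset of Python is handled, and temporaries are numbered from `%v0` here, whereas the table's examples start at other numbers.

```python
import ast


class Lowerer:
    """Hypothetical helper that flattens a tiny Python subset into GIR-style dicts."""

    # Binary operators supported by this sketch (assumption: arithmetic only).
    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def __init__(self):
        self.stmts = []    # emitted GIR-style statements, in order
        self.counter = 0   # next temporary variable index

    def new_temp(self):
        name = f"%v{self.counter}"
        self.counter += 1
        return name

    def lower_expr(self, node):
        """Lower an expression and return the name (or literal) holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            left = self.lower_expr(node.left)
            right = self.lower_expr(node.right)
            target = self.new_temp()
            self.stmts.append({"assign_stmt": {
                "target": target,
                "operand": left,
                "operator": self.OPS[type(node.op)],
                "operand2": right,
            }})
            return target
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            positional = [self.lower_expr(arg) for arg in node.args]
            named = {}
            for keyword in node.keywords:
                if keyword.arg is None:  # **kwargs is out of scope for this sketch
                    raise NotImplementedError("packed_named_args")
                named[keyword.arg] = self.lower_expr(keyword.value)
            target = self.new_temp()  # the call result always lands in a temporary
            call = {"target": target, "name": node.func.id, "positional_args": positional}
            if named:
                call["named_args"] = named
            self.stmts.append({"call_stmt": call})
            return target
        raise NotImplementedError(type(node).__name__)

    def lower_assign(self, node):
        value = self.lower_expr(node.value)
        self.stmts.append({"assign_stmt": {"target": node.targets[0].id, "operand": value}})


def lower(source):
    """Lower a sequence of simple `name = expression` statements."""
    lowerer = Lowerer()
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            raise NotImplementedError(type(stmt).__name__)
        lowerer.lower_assign(stmt)
    return lowerer.stmts


for gir_stmt in lower("e = f(a, b, c + d)"):
    print(gir_stmt)
```

The loop at the end prints three statements for `e = f(a, b, c + d)`: an assign_stmt computing `c + d` into a temporary, a call_stmt whose positional_args reference that temporary, and a final assign_stmt copying the call result into `e`. This mirrors the three-step example in the call_stmt row.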