Generic Intermediate Representation (GIR) Instruction Set

The following table describes the instructions of GIR (Generic Intermediate Representation), the general-purpose intermediate language:

GIR	Attributes	Description
program	name body
namespace_decl	name body	namespace name {body}
comment_stmt	data
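package_stmt	name	Represents a package declaration, in the form package name
import_stmt	attrs name alias	Represents an import statement
from_import_stmt	attrs source name alias	Notes on attrs: - unit: name must be a file name, not a directory name - init: the target file must be initialized at import time
export_stmt	attrs name alias	Represents an export statement, in the form export as. attrs marks whether the statement is an export default (JS only)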
from_export_stmt	attrs module_path name alias
require_stmt	target name	Represents a require statement, in the form target = require(name). Appears only in PHP
class_decl	attrs name supers static_init init fields methods nested	Represents a class declaration. attrs holds the modifiers, e.g. public, static, private. name is the class name. supers is the list of superclass names. type_parameters is the list of all type parameters. fields is the list of member variable declarations; each member variable is a variable_decl (see variable_decl). methods is the list of member functions; each member function is a method_decl. nested is the list of other nested declarations. init and static_init hold initialization code: init for ordinary fields, static_init for static fields. For example, public class name extends a implements b { int i = 1; } can be written as {"class_decl": {"attrs": ["public"], "name": name, "supers": ["a", "b"], "fields": [{"variable_decl": {"data_type": int, "name":i}}], "init": [ this.i = 1] }}
record_decl	attrs name supers type_parameters static_init init fields methods nested	Same as above
interface_decl	attrs name supers type_parameters static_init init fields methods nested	Same as above
enum_decl	attrs name supers static_init init fields methods nested	Same as above
annotation_type_decl	attrs name static_init init fields methods nested	Same as above
annotation_type_elements_decl	attrs data_type name value	Same as above
struct_decl	attrs name fields	Same as above
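parameter_decl	attrs data_type name default_value	Holds a parameter declaration. attrs holds the modifiers, data_type is the parameter's data type, name is the parameter name, and default_value is the default value. For example, for int f(int a, int b = 4); the parameter list of f is [{"parameter_decl": { "data_type": "int", "name": "a"}},{....}]
variable_decl	attrs data_type name	Holds a local variable declaration or a field declaration inside a class. attrs holds the modifiers, data_type is the variable's data type, and name is the variable name. For example, signed int i = 10; can be written as two instructions, a variable declaration followed by an assignment: [{"variable_decl": {"attrs": "signed", "data_type": "int", "name": "i"}},{"assign_stmt": {"target": "i", "operand": 10}}]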
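method_decl	attrs data_type name parameters body	Represents a function declaration. attrs holds the function's modifiers, e.g. public, static. data_type is the return type. name is the function name. parameters is the parameter list. init is a list holding the parameter initialization code. body is the list of instructions inside the function. Taking Java as an example, public int f(int a ) {} gives attrs: public data_type: int name: f, and parameters is a list in which every item is a parameter_decl: []. Note also that anonymous functions must be converted into named functions; for example, Python's lambda x: x+1 becomes def tmp_method(x): return x + 1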
assign_stmt	data_type target operand operator operand2	Assignment statement: target = operand [operator operand2]. If operand2 is absent, the operation is unary, e.g. a = b or a = -b
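call_stmt	target name positional_args packed_positional_args named_args packed_named_args data_type prototype	Calls a function, in the form target = name(args). target receives the return value and is usually a temporary variable. name is the name of the called function. positional_args is the list of positional arguments. data_type is the return type. prototype is the prototype of the called function, used by intermediate languages such as LLVM and Dalvik. For example, e = o.f(a, b, c + d) is first rewritten as %v1 = o.f %v2 = c + d %v3 = %v1(a, b, %v2) // positional_args:[a, b, %v2] e = %v3. Notes on arguments: positional_args is the list of positional arguments. packed_positional_args holds packed positional arguments: when any argument in the list uses the unpacking operator, all positional arguments are collected into a single variable, which is stored in packed_positional_args; positional_args and packed_positional_args are mutually exclusive, so only one of them is used. named_args holds the keyword arguments. packed_named_args holds packed keyword arguments: when any argument uses the dictionary unpacking operator, all keyword arguments are collected into a single variable, which is stored in packed_named_args; named_args and packed_named_args are likewise mutually exclusive. For example, f(a,b,c, d=3) only needs positional_args and named_args: call_stmt, name:f, positional_args:[a,b,c], named_args:{d:3}. As another example, f(a, b, c, *l, d, a = b, c = d) uses unpacking, so packed_positional_args and named_args are supplied, giving the following intermediate code: %v0 = [a, b, c] %v1 = %v0.update(l) %v2 = %v1.append(d) call_stmt, name:f, packed_positional_args:%v2, named_args:{a:b, c:d}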
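object_call_stmt	target reciever name positional_args packed_positional_args named_args packed_named_args data_type prototype	Calls a method on an object, in the form target = receiver.name(args). target receives the return value. receiver is the object that owns the called function. name is the name of the called function. positional_args is the list of positional arguments. data_type is the return type. prototype is the prototype of the called function, used by intermediate languages such as LLVM and Dalvik. For example, e = o.f(a, b, c + d) is first rewritten as %v2 = c + d %v3 = o.f(a, b, %v2)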
echo_stmt	name	The echo instruction in PHP
exit_stmt	name	The exit instruction in PHP
return_stmt	name	Returns a variable, in the form return name
if_stmt	condition then_body else_body	Represents if/else. condition is a variable; then_body and else_body are the corresponding instruction lists. For example, if (a + b > c) {} becomes %v1 = a + b %v2 = %v1 > c if (%v2) ...
dowhile_stmt	condition body	Similar to if
while_stmt	condition body else_body	Similar to if
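for_stmt	init_body condition condition_prebody update_body body	Traditional for loop, in the form for (init_body; condition_prebody; condition; update_body) {}. init_body is the initialization block, an instruction list. condition_prebody holds the instructions that must run before condition can be tested, an instruction list. condition is a variable. update_body is the list of instructions executed on every loop iteration. For example, for (int a = 1, b = 3; a + b < 10; a ++, b++) {} has the following logical structure in the intermediate language: for_stmt: [ init_body: [ variable_decl int a a = 1 variable_decl int b b = 3 ] condition_prebody: [ %v1 = a + b %v2 = %v1 < 10 ] condition: %v2 update_body: [ a = a + 1 b = b + 1 ] body : [] ]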
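forin_stmt	attrs data_type name receiver body	Similar to the traditional for loop. attrs holds the modifiers of the iteration variable, data_type is its data type, name is the iteration variable, receiver is the object being iterated over, and body is an instruction list. The form is for attrs data_type name in receiver {}. For example, for x in list can be written as forin receiver:list name:x. As another example, for a, b in list: body can be written as forin receiver:list name:%v0: array_read a = %v0[0] array_read b = %v0[1] body
for_value_stmt	attrs data_type name receiver body	Similar to for in; designed for for of in JS and foreach in PHP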
switch_stmt	condition body	Represents a switch instruction, in the form switch(condition), where condition is the variable being tested
case_stmt	condition body	Represents a case instruction, in the form case condition, where condition is the variable being tested
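default_stmt	body	Represents a default instruction, in the form default
break_stmt	name	Represents a break instruction, in the form break name
continue_stmt	name	Represents a continue instruction, in the form continue name
goto_stmt	name	Represents a goto instruction, in the form goto name
yield_stmt	name	Represents a yield instruction, in the form yield name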
throw_stmt	name	Represents a throw-exception instruction, in the form throw name
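try_stmt	body catch_body else_body final_body	Represents a try instruction, in the form try body catch_body else_body final_body, where body is the instruction list inside try; catch_body is the instruction list under the catch keyword and may contain several catch_stmt; else_body is the instruction list under the else keyword; final_body is the instruction list under the finally keyword
catch_stmt	exception body	Represents a catch instruction; body is its instruction list
label_stmt	name	Represents a label instruction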
asm_stmt	target data_type attrs data extra args	target = attrs data(asm content), extra(input/out), args
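assert_stmt	condition	Represents an assert instruction, in the form assert condition
del_stmt	receiver name	Represents a delete instruction, e.g. del target in Python
unset_stmt	receiver name	Represents an unset instruction (PHP)
pass_stmt		Represents an empty (no-op) instruction
global_stmt	name	Represents a global reference instruction, e.g. global target in Python
nonlocal_stmt	name	Represents a variable reference instruction, e.g. nonlocal target in Python
type_cast_stmt	target data_type source error cast_action	Represents a type cast, in the form target = (data_type) source. If the cast fails, error is produced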
type_alias_decl	data_type name type_parameters	Typedef: `typedef int a` → `name: a`, `data_type: int`
with_stmt	attrs with_init body	Represents a with instruction followed by a context manager. attrs is typically async, with_init holds the initialization of the context manager, and body is the inner instruction list. For the with statement async with aiofiles.open(filepath, 'r') as file: content = await file.read() the intermediate representation is: {'with_stmt': {'attrs': ['async'], 'with_init': [{'field_read': {'target': '%v0', 'receiver_object': 'aiofiles', 'field': 'open'}}, {'call_stmt': {'target': '%v1', 'name': '%v0', 'args': ['filepath', "'r'"]}}, {'assign_stmt': {'target': 'file', 'operand': '%v1'}}], 'body': [{'field_read': {'target': '%v0', 'receiver_object': 'file', 'field': 'read'}}, {'call_stmt': {'target': '%v1', 'name': '%v0', 'args': []}}, {'await': {'target': '%v1'}}, {'variable_decl': {'data_type': None, 'name': 'content'}}, {'assign_stmt': {'target': 'content', 'operand': None}}]}}
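unsafe_block	body	Represents an unsafe block in Rust; body is an instruction list
block	body	Represents an ordinary block; body is an instruction list
block_start	stmt_id parent_stmt_id	Internal instruction marking the start of a body; it does not need to be defined explicitly
block_end	stmt_id parent_stmt_id	Internal instruction marking the end of a body; it does not need to be defined explicitly
new_array	target attrs data_type	Instantiates a new array, in the form target = attrs data_type[]
new_object	target attrs data_type args	Instantiates a class: target = attrs new data_type(args)
new_record	target attrs data_type	Instantiates a new dictionary
new_set	target attrs data_type	Instantiates a new set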
new_struct	target attrs data_type
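phi_stmt	target phi_values phi_labels	target = [phi_value, phi_label]. Note: borrowed from LLVM; the value is selected according to the path taken
mem_read	target address	Reads the contents of memory at address, in the form target = *address
mem_write	address source	Writes to memory at address, in the form *address = source
array_write	array index source	Writes an array element, in the form array[index] = source
array_read	target array index	Reads the specified array element, e.g. a0 = result[0]
array_insert	array source index	Inserts an element at the specified position of an array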
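array_append	array source	Appends to an array, in the form array.append(source)
array_extend	array source	Extends an array, in the form array.extend(source)
record_write	receiver_object key value	Writes a map entry: receiver_object[key] = value
record_extend	record source	Extends a map, in the form record.update(source)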
field_write	receiver_object field source	Writes a member variable of receiver_object, in the form receiver_object.field = source
field_read	target receiver_object field	Reads a member variable of receiver_object, in the form target = receiver_object.field
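slice_wirte	array source start end step	Corresponds to slice assignment in Python, in the form array[start: end: step] = source. start: index where the slice begins. end: index where the slice ends. step: stride between selected elements.
slice_read	target array start end step	Corresponds to slice reading in Python, in the form target = array[start: end: step]. start: index where the slice begins. end: index where the slice ends. step: stride between selected elements. Taking Python as an example, a = list[x:y:3] has the intermediate representation: {'slice_read': {'target': '%v1', 'array': 'list', 'start': 'x', 'end': 'y', 'step': '3'}} {'assign_stmt': {'target': 'a', 'operand': '%v1'}}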
addr_of	target source	Takes an address, in the form target = &source
await_stmt	target
field_addr	target data_type name	Queries the address offset of a field within a struct_decl. For example: struct address { char name[50]; char street[50]; int phone; }; offsetof(struct address, name); is converted to target = data_type: address, name: name. Note: corresponds to offsetof()
switch_type_stmt	condition body	Represents a switch instruction, in the form switch(condition), where condition is the variable being tested
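
The call_stmt row describes how a nested expression such as e = o.f(a, b, c + d) is flattened into temporaries before the call is emitted. The following is a minimal Python sketch of that flattening, not the project's actual front end. The class GirLowering and its methods are hypothetical names introduced only for illustration; the sketch assumes temporaries are numbered from %v0, that the callee lookup o.f is emitted as a field_read, and that constants are rendered with repr. It handles only names, constants, attribute access, four binary operators, and calls with plain positional arguments.

```python
import ast
from itertools import count

# Spellings for the few binary operators this sketch supports (an assumption).
_BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


class GirLowering:
    """Hypothetical helper: flattens one Python assignment into GIR-style dicts."""

    def __init__(self):
        self.stmts = []      # emitted instructions, in execution order
        self._ids = count()  # counter for temporaries %v0, %v1, ...

    def _tmp(self):
        return f"%v{next(self._ids)}"

    def lower_expr(self, node):
        """Emit instructions for `node` and return the variable holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.Attribute):
            receiver = self.lower_expr(node.value)
            target = self._tmp()
            self.stmts.append({"field_read": {
                "target": target, "receiver_object": receiver, "field": node.attr}})
            return target
        if isinstance(node, ast.BinOp):
            left = self.lower_expr(node.left)
            right = self.lower_expr(node.right)
            target = self._tmp()
            self.stmts.append({"assign_stmt": {
                "target": target, "operand": left,
                "operator": _BIN_OPS[type(node.op)], "operand2": right}})
            return target
        if isinstance(node, ast.Call):
            if node.keywords:
                # named_args / packed_named_args are out of scope for this sketch.
                raise NotImplementedError("keyword arguments")
            callee = self.lower_expr(node.func)
            args = [self.lower_expr(arg) for arg in node.args]
            target = self._tmp()
            self.stmts.append({"call_stmt": {
                "target": target, "name": callee, "positional_args": args}})
            return target
        raise NotImplementedError(type(node).__name__)

    def lower_assign(self, source):
        """Lower a single `name = expression` statement given as source text."""
        node = ast.parse(source).body[0]
        if (not isinstance(node, ast.Assign) or len(node.targets) != 1
                or not isinstance(node.targets[0], ast.Name)):
            raise NotImplementedError("only `name = expression` is supported")
        value = self.lower_expr(node.value)
        self.stmts.append({"assign_stmt": {
            "target": node.targets[0].id, "operand": value}})
        return self.stmts


gir = GirLowering().lower_assign("e = o.f(a, b, c + d)")
for stmt in gir:
    print(stmt)
```

The emitted list follows the same order as the rewrite in the call_stmt row: the callee lookup, then the argument expression, then the call, then the final assignment to e. Starred and keyword arguments are rejected rather than mapped to packed_positional_args or named_args, to keep the sketch short.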