P1: Basic Structural Analysis
Basic structural analysis is the starting point of Lian's semantic analysis pipeline. Its goal is not to derive detailed runtime semantics, but to traverse the program globally and systematically inventory the program structure, laying the groundwork for later in-depth analysis.
This stage is implemented in src/lian/basics and is driven centrally by BasicAnalysis.run(). It operates on GIR statements, and all analysis results are indexed by the globally unique statement ID (stmt_id). The stage mainly answers three basic questions: "What is in the program?", "Where is each element located?", and "How are the elements organized relative to one another?"
1 Entry Point Identification
Entry point identification determines where program analysis starts and is handled by the EntryPointGenerator module.
Based on the rules defined in the default_settings/entry.yaml configuration file (such as method names, modifier attributes, or file locations), this module scans all function or method declaration statements and marks those that satisfy the conditions as entry points. The results are stored as a set of instruction IDs and are used to guide subsequent call graph construction and reachability analysis.
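The rule matching described above can be sketched roughly as follows. This is only an illustration: the MethodDecl and EntryRule records, their fields, and find_entry_points are hypothetical names, and the actual schema of entry.yaml and the EntryPointGenerator interface are not shown in this document.

```python
from dataclasses import dataclass, field


@dataclass
class MethodDecl:
    """Hypothetical stand-in for a GIR method declaration statement."""
    stmt_id: int
    name: str
    attrs: set = field(default_factory=set)
    unit_path: str = ""


@dataclass
class EntryRule:
    """Hypothetical rule shape; the real entry.yaml schema may differ."""
    method_names: set = field(default_factory=set)
    required_attrs: set = field(default_factory=set)
    path_suffix: str = ""

    def matches(self, decl: MethodDecl) -> bool:
        # An empty method_names set means "any name".
        if self.method_names and decl.name not in self.method_names:
            return False
        # Every required attribute must be present on the declaration.
        if not self.required_attrs <= decl.attrs:
            return False
        # An empty suffix matches every path.
        return decl.unit_path.endswith(self.path_suffix)


def find_entry_points(decls, rules):
    """Return the set of stmt_ids whose declaration satisfies any rule."""
    return {d.stmt_id for d in decls if any(r.matches(d) for r in rules)}


decls = [
    MethodDecl(10, "main", {"public", "static"}, "app/Main.java"),
    MethodDecl(20, "helper", {"private"}, "app/Util.java"),
    MethodDecl(30, "main", {"private"}, "app/Other.java"),
]
rules = [EntryRule(method_names={"main"}, required_attrs={"public"})]
entry_points = find_entry_points(decls, rules)
```

Only the declaration with stmt_id 10 satisfies both the name and the attribute condition, so the resulting set contains that single ID. No control-flow or data-flow information is consulted at any point.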
Note that entry point analysis itself does not introduce control-flow or data-flow reasoning; it only provides a reliable set of starting points for later analysis stages.
2 Scope Hierarchy Analysis
Scope hierarchy analysis builds the scope skeleton of the program and clarifies the static ownership relationship of each symbol. This analysis is mainly completed by the ScopeHierarchy module.
During analysis, the system traverses the GIR declaration statements within each Unit (file). When it encounters a class, function, or block boundary, it creates the corresponding Scope object and determines the final scope ownership by recursively locating the parent scope (for example, via determineScope). For symbols whose ownership is ambiguous in the syntactic structure, such as class members and function parameters, this stage corrects their scope ownership so that subsequent symbol resolution behaves consistently.
The results are stored centrally in ScopeSpace, which maintains all scope nodes and their parent-child relationships. At the same time, a summary UnitSymbolDeclSummary is generated for each Unit, recording the mapping from identifier names to their declaration scopes within the file. These two data structures directly support subsequent variable resolution and cross-scope reference analysis.
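As a rough sketch of how parent-child scope links and a per-Unit declaration summary could support name lookup, consider the following. The classes and functions here (Scope, ScopeSpaceSketch, resolve_decl_scope) are simplified, hypothetical stand-ins and do not reproduce the actual ScopeSpace or UnitSymbolDeclSummary structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    stmt_id: int                 # ID of the statement that opens the scope
    kind: str                    # "unit", "class", "method", or "block"
    parent_id: Optional[int]     # None for the root (unit) scope


class ScopeSpaceSketch:
    """Hypothetical container for scope nodes, keyed by stmt_id."""

    def __init__(self):
        self.scopes = {}

    def add(self, stmt_id, kind, parent_id=None):
        self.scopes[stmt_id] = Scope(stmt_id, kind, parent_id)

    def ancestors(self, stmt_id):
        """Yield the scope itself, then each enclosing scope up to the root."""
        current = self.scopes.get(stmt_id)
        while current is not None:
            yield current
            current = self.scopes.get(current.parent_id)


def resolve_decl_scope(name, from_scope_id, space, decl_summary):
    """Find the nearest enclosing scope that declares `name`.

    decl_summary maps an identifier name to the set of scope IDs that
    declare it (an assumed shape for a per-Unit summary).
    """
    declaring = decl_summary.get(name, set())
    for scope in space.ancestors(from_scope_id):
        if scope.stmt_id in declaring:
            return scope.stmt_id
    return None


space = ScopeSpaceSketch()
space.add(1, "unit")
space.add(2, "class", parent_id=1)
space.add(3, "method", parent_id=2)
space.add(4, "block", parent_id=3)

decl_summary = {"x": {1, 3}, "y": {2}}
x_scope = resolve_decl_scope("x", 4, space, decl_summary)
y_scope = resolve_decl_scope("y", 4, space, decl_summary)
```

In this example, a reference to x inside the block resolves to the method scope (3) rather than the unit scope (1), because the lookup stops at the nearest enclosing declaration.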
3 Module and Import Relationship Analysis
Module and import relationship analysis reconstructs the symbol dependency structure across files and modules. The corresponding implementation module is ImportHierarchy.
The analysis first identifies, for each Unit, the set of exported symbols visible to other Units. It then parses the import statements in GIR and resolves bindings from imported items to exported items. In the implementation, this process not only builds dependencies between Units but also refines them down to the symbol level. For wildcard imports (such as import *) or indirect imports, the system recursively expands the exports of the target module so that every referenced symbol can be traced back to its original definition.
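The recursive expansion of wildcard and indirect imports can be illustrated with a small sketch. The data shapes used here (defs, imports) and the function resolve_origin are assumptions made for this example and are not taken from the ImportHierarchy implementation.

```python
def resolve_origin(unit, name, defs, imports, seen=None):
    """Trace `name` as visible in `unit` back to (defining_unit, symbol_id).

    defs:    unit -> {name: symbol_id} for symbols defined in that unit
    imports: unit -> list of (source_unit, imported_name, local_name),
             where imported_name == "*" marks a wildcard import
    Both shapes are hypothetical simplifications.
    """
    seen = set() if seen is None else seen
    if (unit, name) in seen:          # guard against circular imports
        return None
    seen.add((unit, name))

    if name in defs.get(unit, {}):    # defined locally: this is the origin
        return unit, defs[unit][name]

    for source, imported, local in imports.get(unit, []):
        if imported == "*":
            # Wildcard: the name keeps its spelling in the source unit.
            found = resolve_origin(source, name, defs, imports, seen)
        elif local == name:
            # Named (possibly aliased) import: follow the exported name.
            found = resolve_origin(source, imported, defs, imports, seen)
        else:
            continue
        if found is not None:
            return found
    return None


defs = {"core": {"parse": 101}, "util": {"log": 201}}
imports = {
    "api": [("core", "*", "*")],                          # re-exports core
    "main": [("api", "parse", "p"), ("util", "log", "log")],
}
origin_p = resolve_origin("main", "p", defs, imports)
origin_log = resolve_origin("main", "log", defs, imports)
```

Here the alias p in main is traced through the wildcard re-export in api back to the definition in core. The seen set keeps the recursion from looping when two units import from each other.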
The resulting module and import hierarchy is organized as an import graph (ImportGraph) that precisely describes symbol-level reference relationships. This structure forms a multi-layer dependency view, from the directory structure down to files and then to public symbols.
4 Type Hierarchy Analysis
Type hierarchy analysis mainly targets object-oriented languages and type aliases in statically typed languages. It reconstructs the class inheritance structure (Class Hierarchy), the static visibility of class members, and type aliases. This analysis is performed by the TypeHierarchy module.
During analysis, the system traverses all class declaration statements, parses inheritance information, and builds the class inheritance and type-alias graph (TypeGraph). On this basis, the system further computes the set of methods available to each class at the static semantic level, merging methods defined by the class itself with methods inherited from parent classes, and generates the MethodsInClass table for later method call resolution.
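A minimal sketch of the merge step, assuming an acyclic inheritance graph, might look like this. The function name and data shapes are hypothetical, and the precedence policy for multiple parents (earlier-listed parents win) is a simplification chosen for the example, not a description of how MethodsInClass is actually computed.

```python
def methods_in_class(cls, parents, own_methods, cache=None):
    """Merge a class's own methods with those inherited from its parents.

    parents:     class name -> list of direct parent class names
    own_methods: class name -> list of method names declared in the class
    Returns a dict: method name -> name of the class that provides it.
    Assumes the inheritance graph has no cycles.
    """
    cache = {} if cache is None else cache
    if cls in cache:
        return cache[cls]

    merged = {}
    # Visit parents right-to-left so earlier-listed parents take precedence.
    for parent in reversed(parents.get(cls, [])):
        merged.update(methods_in_class(parent, parents, own_methods, cache))
    # Methods declared in the class itself override inherited ones.
    for name in own_methods.get(cls, []):
        merged[name] = cls

    cache[cls] = merged
    return merged


parents = {"Animal": [], "Dog": ["Animal"]}
own_methods = {"Animal": ["eat", "speak"], "Dog": ["speak", "fetch"]}
dog_methods = methods_in_class("Dog", parents, own_methods)
```

The resulting table records which class statically provides each method name. It says nothing about which implementation runs for a given receiver at run time.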
This stage does not handle dynamic dispatch or runtime polymorphism. It only provides a stable class-member view for later call resolution and pointer analysis.
5 Control Flow Analysis
Control flow analysis builds a separate control-flow graph (CFG) for each function. The corresponding implementation module is ControlFlowAnalysis.
The analysis proceeds over the function body. Statement IDs serve as graph nodes, and control-flow edges describe possible execution orders. During construction, the system adds edges according to the structured statement types in GIR (such as conditionals, loops, and exception handling). For statements that interrupt sequential execution, such as return, break, and throw, the control-flow edges are completed through deferred connections once the enclosing structure is closed.
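The idea of deferred connections can be shown on a toy statement tree. In this sketch the tuple encoding of statements, the synthetic EXIT node, and build_cfg are all assumptions made for illustration; exception handling is omitted, and the real ControlFlowAnalysis module works on GIR statements rather than on this encoding.

```python
EXIT = -1  # hypothetical ID for a synthetic exit node


def build_cfg(body):
    """Build CFG edges for a tiny structured language.

    Statements are tuples (a hypothetical encoding):
      ("stmt", id), ("if", id, then_body, else_body),
      ("while", id, body), ("break", id), ("return", id)
    Returns a set of (from_id, to_id) edges.
    """
    edges = set()
    pending_returns = []          # returns are linked to EXIT at the very end

    def link(preds, node):
        for p in preds:
            edges.add((p, node))

    def walk(stmts, preds, breaks):
        """Connect `stmts` after `preds`; return the nodes that fall through."""
        for stmt in stmts:
            kind, sid = stmt[0], stmt[1]
            link(preds, sid)
            if kind == "if":
                then_out = walk(stmt[2], [sid], breaks)
                else_out = walk(stmt[3], [sid], breaks)
                preds = then_out + else_out
            elif kind == "while":
                loop_breaks = []
                body_out = walk(stmt[2], [sid], loop_breaks)
                link(body_out, sid)             # back edge to the loop header
                # Deferred: breaks are connected when the loop is closed.
                preds = [sid] + loop_breaks
            elif kind == "break":
                breaks.append(sid)
                preds = []                      # nothing falls through
            elif kind == "return":
                pending_returns.append(sid)
                preds = []
            else:
                preds = [sid]
        return preds

    fallthrough = walk(body, [], [])
    link(fallthrough + pending_returns, EXIT)
    return edges


body = [
    ("stmt", 1),
    ("while", 2, [
        ("if", 3, [("break", 4)], []),
        ("stmt", 5),
    ]),
    ("return", 6),
]
cfg_edges = build_cfg(body)
```

The break at statement 4 has no successor while the loop body is being walked. Its edge to statement 6 is only added after the loop at statement 2 is closed and the next statement becomes known.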
The results are stored as MethodCfg, providing an execution-path basis for subsequent intraprocedural data-flow analysis.
6 Definition-Use (Def-Use) Relationship Analysis
Based on the existing scope structure and control-flow graphs, def-use analysis identifies variable definition points and use points in the program. The corresponding implementation module is StmtDefUseAnalysis. Like control flow analysis, def-use analysis is performed per function.
This stage traverses the GIR instructions in the function body along the control flow, identifies variable definitions and uses based on instruction types, and records the results in the per-statement status StmtStatus. A unified storage space (SymbolStateSpace) is created to hold the variable and constant objects generated during analysis.
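Classifying operands by instruction type can be sketched as follows. The operation names (variable_decl, assign_stmt, call_stmt, return_stmt), the dictionary encoding of instructions, and the StmtStatusSketch record are hypothetical; they only illustrate the idea and do not mirror the real StmtStatus or GIR instruction set.

```python
from dataclasses import dataclass, field


@dataclass
class StmtStatusSketch:
    """Hypothetical, simplified stand-in for a per-statement status record."""
    stmt_id: int
    defined: set = field(default_factory=set)
    used: set = field(default_factory=set)


def is_variable(operand):
    # Assumption for this sketch: variables are identifier strings and
    # constants are non-string Python values such as numbers.
    return isinstance(operand, str) and operand.isidentifier()


def analyze_stmt(stmt):
    """Classify the operands of one instruction as definitions or uses."""
    status = StmtStatusSketch(stmt["stmt_id"])
    op = stmt["operation"]
    # Instructions that write a target define it.
    if op in ("variable_decl", "assign_stmt", "call_stmt") and "target" in stmt:
        status.defined.add(stmt["target"])
    # Instructions that read operands use every variable among them.
    if op in ("assign_stmt", "call_stmt", "return_stmt"):
        status.used.update(o for o in stmt.get("operands", []) if is_variable(o))
    return status


stmts = [
    {"stmt_id": 1, "operation": "variable_decl", "target": "x"},
    {"stmt_id": 2, "operation": "assign_stmt", "target": "x", "operands": ["a", 1]},
    {"stmt_id": 3, "operation": "call_stmt", "target": "y", "operands": ["f", "x"]},
    {"stmt_id": 4, "operation": "return_stmt", "operands": ["y"]},
]
statuses = {s["stmt_id"]: analyze_stmt(s) for s in stmts}
```

Each record is keyed by stmt_id, in keeping with the indexing convention used throughout this stage. The constant 1 in statement 2 is not recorded as a use.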
This stage also carries out the key task of variable ID assignment: assigning or mapping each variable in the program to a unique symbol ID. If the referenced variable is defined in the same file, its symbol ID is used directly. If it is defined in another file or module, the system queries the import graph (see Section 3: Module and Import Relationship Analysis) to resolve the owning module and retrieves the corresponding global symbol ID from the target module's symbol table.
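The two lookup paths can be summarized in a short sketch. The table shapes and the function resolve_symbol_id are hypothetical: the import graph is reduced here to a flat mapping, which is far simpler than the structure described in Section 3.

```python
def resolve_symbol_id(name, unit, symbol_tables, import_bindings):
    """Return the symbol ID for `name` as referenced from `unit`, or None.

    symbol_tables:   unit -> {name: symbol_id}            (assumed shape)
    import_bindings: (unit, local_name) -> (owning_unit, exported_name)
                     (a flattened stand-in for the import graph)
    """
    # Same-file case: use the local symbol ID directly.
    local = symbol_tables.get(unit, {})
    if name in local:
        return local[name]

    # Cross-file case: find the owning unit, then read its symbol table.
    target = import_bindings.get((unit, name))
    if target is None:
        return None
    owning_unit, exported_name = target
    return symbol_tables.get(owning_unit, {}).get(exported_name)


symbol_tables = {"main": {"count": 11}, "util": {"log": 42}}
import_bindings = {("main", "logger"): ("util", "log")}

count_id = resolve_symbol_id("count", "main", symbol_tables, import_bindings)
logger_id = resolve_symbol_id("logger", "main", symbol_tables, import_bindings)
```

With this scheme, the alias logger in main and the name log in util map to the same symbol ID, so later stages can treat them as one symbol.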
After obtaining def-use relationships for all instructions, a summary MethodDefUseSummary can be generated for the function, including the local variable set, externally referenced symbols, parameter list, and return value expression.
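One way such a summary could be aggregated from per-statement results is sketched below. The function summarize_method, the dictionary layout, and the flow-insensitive rule for external symbols (used but never defined in the function) are assumptions for illustration, not the actual MethodDefUseSummary logic.

```python
def summarize_method(params, statuses, return_operands):
    """Aggregate per-statement def/use sets into a function-level summary.

    statuses is a list of dicts with "defined" and "used" sets, one per
    statement (a hypothetical shape). The aggregation is flow-insensitive.
    """
    defined, used = set(), set()
    for status in statuses:
        defined |= status["defined"]
        used |= status["used"]

    param_set = set(params)
    return {
        "parameters": list(params),
        # Locals: defined in the body and not a parameter.
        "local_symbols": defined - param_set,
        # Externals: used somewhere but neither defined here nor a parameter.
        "external_symbols": used - defined - param_set,
        "return_values": list(return_operands),
    }


statuses = [
    {"defined": {"x"}, "used": {"a", "g"}},
    {"defined": set(), "used": {"x"}},
]
summary = summarize_method(["a"], statuses, ["x"])
```

In this example g is reported as an externally referenced symbol because no statement in the function defines it and it is not a parameter.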
7 Summary
Overall, the basic structural analysis stage completes a systematic inventory and organization of the program's static structure. Using the unified GIR representation, without relying on type inference and without performing fixpoint computation, the system establishes the program's entry set, the hierarchical relationships of scopes and symbols, cross-module symbol dependency structure, inheritance relationships between classes, the intraprocedural control-flow skeleton, and variable definition and use locations. All results from this stage are saved in stable data structures and serve as the factual basis for subsequent semantic analysis.