Lian Program Analysis Framework

Lian is a next-generation, high-precision program analysis framework designed for multi-language environments. It aims to deliver unified and powerful program analysis capabilities across diverse programming languages, including pointer analysis, dataflow analysis, and taint analysis.

1 Background

Program analysis is a fundamental technique for understanding program behavior, validating software correctness, and supporting system security. Over the past decades, research on traditional industrial languages such as C/C++ and Java has resulted in relatively mature program analysis methodologies and toolchains. In the C/C++ domain, tools such as SVF and Phasar support precise pointer analysis through detailed modeling of alias relations and heap objects. In the Java domain, frameworks such as Soot and WALA exploit relatively stable type systems and reference semantics to construct reusable points-to analysis infrastructures. These techniques have been widely applied in tasks such as code auditing and vulnerability analysis.

In recent years, however, the programming language ecosystem has shifted. Python has become a primary language in artificial intelligence and data analysis. JavaScript and TypeScript are widely used in frontend and full-stack development, with TypeScript ranking first in the GitHub language popularity index in 2025. Go plays an important role in cloud services and systems programming. Program analysis techniques for these languages are less mature than those for traditional languages such as C/C++ and Java, and they often suffer from limitations in precision and scalability.

This gap in analysis capability introduces practical risks to software security and reliability. Complex runtime behaviors are difficult to analyze with strong guarantees, and potential vulnerabilities are often missed during development. As these languages are increasingly deployed in critical domains such as financial systems, cloud infrastructure, and AI platforms, insufficient program analysis support becomes a limiting factor for systematic security analysis of modern software systems.

2 Challenges

For these languages, the main difficulties arise from uncertainty in memory models and runtime behavior:

  • Type information is often unavailable or unstable at compile time
  • Object properties can be dynamically added, removed, or modified
  • Dynamic object shapes make field sets difficult to determine statically
  • Computed property names and dynamic code loading complicate static resolution of property access and control flow
  • Higher-order functions, closures, and dynamic dispatch significantly enlarge potential call target sets
  • Dynamic property resolution mechanisms (e.g., prototype chain lookup in JavaScript, or class-hierarchy-based and implicit attribute resolution in Python) cause field resolution to depend on runtime state, obscuring object boundaries and field semantics
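
The short Python snippet below is an illustrative sketch of several of these features in one place. It is not taken from Lian, and every name in it (Config, make_handler, handlers) is hypothetical. It shows a field that is never written with a literal name, a call target chosen by runtime data, and an attribute resolved implicitly instead of being stored on the object.

```python
class Config:
    """Object whose shape is only known at runtime."""

    def __getattr__(self, name):
        # Implicit attribute resolution: invoked only when normal lookup fails.
        return f"default:{name}"


def make_handler(prefix):
    # Closure: the returned function captures `prefix`.
    def handler(value):
        return f"{prefix}-{value}"
    return handler


cfg = Config()
key = "mo" + "de"            # computed property name
setattr(cfg, key, "fast")    # field added dynamically; `cfg.mode = ...` never appears

handlers = {"a": make_handler("A"), "b": make_handler("B")}
choice = "ab"[len(key) % 2]  # the call target depends on runtime data
result = handlers[choice](getattr(cfg, key))

fallback = cfg.timeout       # resolved through __getattr__, not a stored field
```

A purely syntactic analysis sees no assignment to cfg.mode, no direct call to either closure, and no definition of timeout, yet all three affect the values that flow through the program.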

These characteristics undermine key assumptions used by traditional program analysis, including stable object layouts, statically known field sets, and type-based constraints for pruning and convergence. As a result, context-sensitive and flow-sensitive pointer analysis becomes substantially more complex, and maintaining effectiveness and scalability for large programs is difficult.

Existing type analysis techniques for dynamic languages, such as TAJS, SAFE, and JSAI, typically rely on abstract interpretation and have achieved progress for specific languages like JavaScript. However, they face structural limitations. Precision must be traded off against heap abstraction and object merging strategies, while branch handling often leads to state splitting and state explosion. Even with widening and similar mechanisms, large-scale analyses still face fundamental precision–efficiency trade-offs.

Another practical issue is that many existing program analysis frameworks are tightly coupled to individual languages. Supporting a new language often requires reimplementing major components from frontend to core analysis, including AST-to-IR translation, control-flow analysis, data-flow analysis, and pointer analysis. This results in high engineering cost and poor reuse across languages.

3 Design Rationale

Addressing these challenges requires rethinking program analysis methodology for modern language ecosystems:

  • A framework is needed that provides extensible and reusable analysis capabilities with low marginal cost when supporting new languages
  • Even under highly dynamic behavior, the framework should support controlled heap abstraction and points-to modeling, reducing reliance on explicit type information

The core objective is to construct a unified, high-precision program analysis framework that can accommodate language diversity.

Commonality across languages provides the basis for unification.

Programming languages share fundamental semantic structures:

  • Most practical languages are Turing-complete
  • Since the 1950s, languages have evolved along imperative, functional, and object-oriented lines, with new languages inheriting and adapting prior designs
  • Empirically, languages in the TIOBE Top 50, as well as newer languages such as V and Odin, exhibit substantial syntactic and semantic similarity, including variable manipulation, control flow, function invocation, and object-oriented constructs

Based on this commonality, unification can be achieved at two levels:

  • Syntax level: language-specific ASTs are translated into a common intermediate representation
  • Semantic level: all analyses are performed on the unified IR, including scope analysis, type-related analysis, module imports, control flow, data flow, pointer analysis, and taint analysis

Once a language frontend can translate ASTs into the unified IR, analysis capabilities can be provided with limited additional language-specific semantic extensions.
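
As a rough illustration of the syntax-level step, the sketch below lowers a Python assignment into flat three-address statements using the standard ast module. The statement format (dictionaries with op, target, and operands) and the %t temporaries are assumptions made for this example; they are not the actual GIR format.

```python
import ast
import itertools


def lower_to_ir(source):
    """Lower simple assignments into flat three-address statements."""
    counter = itertools.count()
    stmts = []

    def lower_expr(node):
        # Return the name (or literal text) that holds the value of `node`.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = lower_expr(node.left)
            right = lower_expr(node.right)
            tmp = f"%t{next(counter)}"
            op = type(node.op).__name__.lower()   # e.g. Add -> "add"
            stmts.append({"op": op, "target": tmp, "operands": [left, right]})
            return tmp
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = [lower_expr(arg) for arg in node.args]
            tmp = f"%t{next(counter)}"
            stmts.append({"op": "call", "target": tmp,
                          "operands": [node.func.id, *args]})
            return tmp
        raise NotImplementedError(type(node).__name__)

    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            value = lower_expr(stmt.value)
            stmts.append({"op": "assign", "target": stmt.targets[0].id,
                          "operands": [value]})
        else:
            raise NotImplementedError(type(stmt).__name__)
    return stmts


ir = lower_to_ir("x = f(a) + b * 2")
for stmt in ir:
    print(stmt)
```

A frontend for another language would walk a different AST but could emit the same statement shapes, which is what lets later analyses stay language-agnostic.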

At the same time, language differences remain essential and must be explicitly handled.

Key sources of variation include:

  • Type systems: many dynamic languages lack static types, while some static languages permit highly permissive types
  • Variable declaration rules: some languages omit explicit declarations
  • Inheritance models: prototype-based inheritance versus class-based inheritance
  • Function calling conventions: explicit self parameters versus implicit this semantics
  • Property access semantics: indexed access may represent arrays, dictionaries, or object properties
  • for..in constructs: semantics differ significantly across languages

To accommodate these differences, Lian adopts a plugin-based extension mechanism, allowing language-specific behaviors to be integrated without modifying the core analysis logic. Unified analysis provides the structural foundation, while extensibility ensures compatibility.
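
One way such a plugin mechanism could look is sketched below, using the calling-convention difference from the list above. The EventManager class, its register and notify methods, and the event name method_call are hypothetical and chosen for this example; they should not be read as Lian's actual interfaces.

```python
from collections import defaultdict


class EventManager:
    """Hypothetical event bus for language-specific hooks."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, lang, handler):
        self._handlers[(event, lang)].append(handler)

    def notify(self, event, lang, data):
        # Language-specific handlers run first, then language-agnostic ones ("*").
        for key in ((event, lang), (event, "*")):
            for handler in self._handlers.get(key, []):
                result = handler(data)
                if result is not None:
                    return result   # the first handler that claims the event wins
        return data                 # nothing claimed it: keep the core default


def python_self_param(call):
    # Python passes the receiver as an explicit first argument.
    return {**call, "args": [call["receiver"], *call["args"]]}


def js_this_binding(call):
    # JavaScript binds the receiver to an implicit `this`.
    return {**call, "this": call["receiver"]}


events = EventManager()
events.register("method_call", "python", python_self_param)
events.register("method_call", "javascript", js_this_binding)

call = {"receiver": "obj", "method": "run", "args": ["x"]}
py_call = events.notify("method_call", "python", call)
js_call = events.notify("method_call", "javascript", call)
go_call = events.notify("method_call", "go", call)   # no plugin registered
```

The core analysis only calls notify at well-defined points, so adding a language means registering handlers and leaving the core untouched.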

Unified analysis does not imply weaker analysis.

Dynamic features such as missing types, higher-order functions, and dynamic property resolution limit the effectiveness of type-driven analyses. In these settings, precise, type-independent pointer analysis becomes central:

  • Object types can be inferred from referenced memory contents
  • For calls such as method() or receiver.method(), pointer analysis can determine the actual callable targets
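
The idea of resolving call targets from points-to information, with no type information involved, can be sketched as follows. For brevity this example uses a flow-insensitive, inclusion-based propagation, which is simpler than the flow-sensitive analysis described in this document. The tuple-based statement format is an assumption made for the example.

```python
from collections import defaultdict

# Tiny hypothetical IR:
#   ("func", var, name)  binds a function object to a variable
#   ("copy", dst, src)   is `dst = src`
#   ("call", var)        is `var()`
program = [
    ("func", "f", "load_json"),
    ("func", "g", "load_yaml"),
    ("copy", "h", "f"),
    ("copy", "h", "g"),   # e.g. the two arms of an if/else
    ("copy", "k", "h"),
    ("call", "k"),
    ("call", "f"),
]


def points_to(stmts):
    """Propagate points-to sets along copies until nothing changes."""
    pts = defaultdict(set)
    for kind, *args in stmts:
        if kind == "func":
            var, name = args
            pts[var].add(name)
    changed = True
    while changed:
        changed = False
        for kind, *args in stmts:
            if kind == "copy":
                dst, src = args
                if not pts[src] <= pts[dst]:
                    pts[dst] |= pts[src]
                    changed = True
    return pts


def call_targets(stmts):
    """Map each call site (index, variable) to its possible callees."""
    pts = points_to(stmts)
    return {
        (index, args[0]): sorted(pts[args[0]])
        for index, (kind, *args) in enumerate(stmts)
        if kind == "call"
    }


targets = call_targets(program)
```

Here the call through k resolves to both load_json and load_yaml, while the call through f resolves to load_json only.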

To remain effective, pointer analysis must address several issues:

  • Field sensitivity, requiring explicit modeling of field values
  • Memory object abstraction, extending beyond traditional heap objects
  • Flow sensitivity, particularly for field-level operations beyond SSA approximations
  • Termination guarantees, as points-to sets grow large without type constraints
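
Field sensitivity and flow sensitivity can be illustrated together with a small sketch. The code below tracks points-to sets per (object, field) pair and processes statements in program order, applying a strong update when the base variable refers to exactly one abstract object. The statement format is hypothetical, the example handles straight-line code only, and the strong update is sound only under the assumption that each abstract object stands for a single concrete object. With loops or branches, a fixpoint iteration over a finite object abstraction would be needed to guarantee termination.

```python
# Hypothetical statements:
#   ("new", var, obj)            var = new object named obj
#   ("store", var, field, src)   var.field = src
#   ("load", dst, var, field)    dst = var.field
program = [
    ("new", "a", "o1"),
    ("new", "x", "o2"),
    ("new", "y", "o3"),
    ("store", "a", "f", "x"),   # a.f = x
    ("load", "p", "a", "f"),    # p = a.f
    ("store", "a", "f", "y"),   # a.f = y  (overwrites the earlier store)
    ("store", "a", "g", "x"),   # a.g = x  (a different field)
    ("load", "q", "a", "f"),    # q = a.f
]


def analyze(stmts):
    var_pts = {}     # variable -> set of abstract objects
    field_pts = {}   # (object, field) -> set of abstract objects
    for stmt in stmts:
        kind = stmt[0]
        if kind == "new":
            _, var, obj = stmt
            var_pts[var] = {obj}
        elif kind == "store":
            _, var, field, src = stmt
            bases = var_pts.get(var, set())
            values = var_pts.get(src, set())
            for obj in bases:
                if len(bases) == 1:
                    # Strong update: a single known base, so old values are replaced.
                    field_pts[(obj, field)] = set(values)
                else:
                    # Weak update: several possible bases, so old values are kept.
                    field_pts.setdefault((obj, field), set()).update(values)
        elif kind == "load":
            _, dst, var, field = stmt
            result = set()
            for obj in var_pts.get(var, set()):
                result |= field_pts.get((obj, field), set())
            var_pts[dst] = result
    return var_pts, field_pts


var_pts, field_pts = analyze(program)
```

A flow-insensitive analysis would report that both p and q may point to o2 or o3. A field-insensitive one would additionally mix the values stored in f and g.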

4 Architecture

The Lian framework consists of four main components:

  • General Intermediate Representation (GIR): a unified IR designed around language commonality; ASTs are translated into GIR, with approximately 1,600 lines of frontend code per language

  • Unified Pointer Analysis Engine: memory objects are abstracted by address, value, and shape; on-the-fly analysis combined with def–use information supports flow sensitivity

  • Language-Specific Extensions: a plugin mechanism supports language-specific semantics

  • State-Flow-Graph-Based Taint Analysis: taint analysis is built on pointer and data-flow results using state flow graphs
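
The last component can be pictured as a reachability query over a graph of value flows. The sketch below assumes a simplified graph in which an edge u -> v means that the value at u may flow to v. The node names, the graph encoding, and the find_taint_paths function are illustrative assumptions, and Lian's actual state flow graph is likely to carry more information than this.

```python
from collections import deque

# Hypothetical flow graph: an edge u -> v means the value at u may flow to v.
flow_edges = {
    "request.args": ["user_id"],
    "user_id": ["query", "log_line"],
    "query": ["db.execute"],
    "config.path": ["open"],
}
sources = {"request.args"}
sinks = {"db.execute", "open"}


def find_taint_paths(edges, sources, sinks):
    """Breadth-first search from each source; one shortest path per reachable sink."""
    paths = []
    for src in sorted(sources):
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                # Rebuild the path by following parent links back to the source.
                path = []
                current = node
                while current is not None:
                    path.append(current)
                    current = parent[current]
                paths.append(path[::-1])
                continue
            for successor in edges.get(node, []):
                if successor not in parent:
                    parent[successor] = node
                    queue.append(successor)
    return paths


taint_paths = find_taint_paths(flow_edges, sources, sinks)
```

Only db.execute is reported here: open is a sink, but its input comes from config.path, which is not a taint source in this example.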

5 Applications

Lian supports:

  • Static detection of software defects
  • Security modeling and vulnerability analysis
  • Integration with AI-based workflows for model training and inference

6 Other Important Notes

6-1 Language Support

Current implementation status of Lian language frontends:

Language      Status
Python        ✅ Fully supported
JavaScript    ✅ Fully supported
TypeScript    ✅ Fully supported
Java          ✅ Fully supported
Go            ✅ Fully supported
C             ✅ Fully supported
PHP           ✅ Fully supported
ArkTS         ✅ Fully supported
LLVM IR       ✅ Fully supported
Rust MIR      Not mature
C#            Not mature
Ruby          Not mature
Smali         Not mature

6-2 Core Module Description

src/lian/
├── lang/                       # Language frontends
│   ├── xxx_parser.py               # Parser for a specific language
│   ├── common_parser.py            # Common base class for all language parsers
│   └── lang_analysis.py            # Main language analysis entry
├── basics/                     # Basic structural analyses
│   ├── control_flow.py             # Control-flow analysis
│   ├── entry_points.py             # Entry-point identification
│   ├── import_hierarchy.py         # Module import hierarchy analysis
│   ├── scope_hierarchy.py          # Scope hierarchy analysis
│   ├── stmt_def_use_analysis.py    # Definition–use analysis
│   └── type_hierarchy.py           # Type hierarchy analysis
├── core/                       # Core semantic analysis engine
│   ├── global_semantics.py         # Top-down semantic analysis
│   ├── prelim_semantics.py         # Bottom-up semantic analysis
│   ├── resolver.py                 # Resolver
│   └── stmt_states.py              # Statement state analysis
├── taint/                      # Taint analysis
│   ├── taint_analysis.py           # Taint analysis engine
│   ├── taint_structs.py            # Taint analysis data structures
│   └── rule_manager.py             # Rule manager
├── events/                     # Plugin system ensuring extensibility and handling language diversity
│   ├── event_manager.py            # Event manager
│   └── default_event_handlers/     # Default event handlers
├── externs/                    # External system modeling
│   └── extern_system.py            # External system integration
└── util/                       # Utility modules
    ├── loader.py                   # File system management
    ├── data_model.py               # Data model
    └── readable_gir.py             # Human-readable GIR output

6-3 Configuration Options

Through configuration files under the default_settings/ directory, users can customize:

  • entry.yaml: entry point rule configuration
  • source.yaml: taint source rule configuration
  • sink.yaml: taint sink rule configuration
  • propagation.yaml: taint propagation rule configuration

7 Summary

Lian is independently developed by the System Software and Reliability Group at Fudan University. It is based on a general intermediate representation and provides unified, high-precision pointer analysis. The framework emphasizes extensibility and language independence, and is intended to support program analysis and security analysis across diverse programming languages with well-defined semantics.