
Language Difference Compatibility

GIR and the unified program analysis engine are designed around semantic commonalities across multiple programming languages. However, different languages still have significant differences in syntax structure, runtime models, and standard library behavior. To handle these differences in a unified way, Lian introduces an event-driven plugin mechanism to locally extend language-specific semantics.

The guiding principle of this mechanism is to confine language-specific logic to explicit event trigger points, so that language differences do not leak directly into the core analysis pipeline.

1 Events and Event Management

An event is the basic unit that triggers language-specific handling logic. Each event is identified by the combination of a language kind and an event kind. Event-related data is wrapped in the EventData structure, which holds the language identifier, the event number, and the input and output data that the event handlers need.

The event manager (event_manager) is responsible for event registration and triggering:

  • Event registration

When registering, a plugin declares the language kinds and event kinds it supports and provides the corresponding event handler functions.

  • Event triggering

When the analysis pipeline triggers an event, the event manager calls the matching handlers one after another in registration order, stopping as soon as a handler explicitly terminates further handling.

Events are raised from instrumentation points placed in the target analysis modules and are dispatched uniformly through the event manager's notify() interface, which decouples event dispatch from the core analysis pipeline.
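The registration and dispatch rules above can be sketched in a few lines of Python. This is a minimal illustration, not Lian's implementation: the names EventManager, HandlerResult, register, and the EventData fields lang, event, in_data, and out_data are assumptions; only notify(), EventData, and the "stop when a handler terminates" behavior come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Iterable, List, Tuple


class HandlerResult(Enum):
    """Hypothetical flags a handler returns to continue or stop dispatch."""
    CONTINUE = 0
    INTERRUPT = 1


@dataclass
class EventData:
    """Data passed to handlers; field names are illustrative."""
    lang: str             # language kind
    event: str            # event kind
    in_data: Any = None   # input for the handlers
    out_data: Any = None  # output produced by the handlers


Handler = Callable[[EventData], HandlerResult]


class EventManager:
    def __init__(self) -> None:
        # (language kind, event kind) -> handlers in registration order
        self._handlers: Dict[Tuple[str, str], List[Handler]] = {}

    def register(self, langs: Iterable[str], events: Iterable[str], handler: Handler) -> None:
        event_list = list(events)  # reused for every language
        for lang in langs:
            for event in event_list:
                self._handlers.setdefault((lang, event), []).append(handler)

    def notify(self, data: EventData) -> EventData:
        # Call matching handlers in registration order until one interrupts.
        for handler in self._handlers.get((data.lang, data.event), []):
            if handler(data) is HandlerResult.INTERRUPT:
                break
        return data


# Usage: three handlers on one event; the second one stops the chain.
calls: List[str] = []

def log_call(data: EventData) -> HandlerResult:
    calls.append("log")
    return HandlerResult.CONTINUE

def resolve_call(data: EventData) -> HandlerResult:
    calls.append("resolve")
    data.out_data = "handled"
    return HandlerResult.INTERRUPT

def fallback(data: EventData) -> HandlerResult:
    calls.append("fallback")  # never reached in this example
    return HandlerResult.CONTINUE

manager = EventManager()
for h in (log_call, resolve_call, fallback):
    manager.register(["javascript"], ["P2STATE_EXTERN_CALLEE"], h)
result = manager.notify(EventData("javascript", "P2STATE_EXTERN_CALLEE"))
# calls == ["log", "resolve"], result.out_data == "handled"
```

Keying handlers by the (language kind, event kind) pair makes dispatch a single dictionary lookup, and a list per key preserves registration order for free.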

2 Event Handler Function Design

Event kind names follow a pre-hook/post-hook naming convention:

  • XXX_BEFORE: triggered before the target operation is executed.
  • XXX_READY or XXX_AFTER: triggered after the target operation is completed.

The current event mechanism mainly covers the following analysis stages:

  • GIR translation frontend: source code preprocessing, plus key processing points before and after GIR flattening.
  • Semantic analysis stage: hook points before and after the points-to analysis of each statement (corresponding to stmt_states).
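The pre/post naming convention maps naturally onto a wrapper that an instrumented module could put around one analysis step. The sketch below is hypothetical: with_hooks, the GIR_FLATTEN base name, and the notify signature are assumptions, the lambda merely stands in for a real flattening step, and a given post-hook in Lian may use the _READY suffix instead of _AFTER.

```python
from typing import Any, Callable, List, Tuple

# notify(lang, event_kind, data) returns the (possibly rewritten) data.
Notify = Callable[[str, str, Any], Any]


def with_hooks(notify: Notify, lang: str, base: str,
               operation: Callable[[Any], Any], payload: Any) -> Any:
    """Run one analysis step between <base>_BEFORE and <base>_AFTER events."""
    payload = notify(lang, f"{base}_BEFORE", payload)  # pre-hook may rewrite the input
    result = operation(payload)
    return notify(lang, f"{base}_AFTER", result)       # post-hook may rewrite the output


# Usage with a notify stub that only records which events fired.
log: List[Tuple[str, str]] = []

def recording_notify(lang: str, event: str, data: Any) -> Any:
    log.append((lang, event))
    return data

flattened = with_hooks(recording_notify, "php", "GIR_FLATTEN",
                       lambda stmts: [s for s in stmts if s], ["a", "", "b"])
# flattened == ["a", "b"]
# log == [("php", "GIR_FLATTEN_BEFORE"), ("php", "GIR_FLATTEN_AFTER")]
```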

In the source code preprocessing stage, the event mechanism handles language-specific syntax and structural differences, for example:

  • Restructuring Python from ... import ... statements.
  • Removing PHP comments and expanding namespaces.
  • Normalizing floating-point constants.
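The text does not say exactly how these rewrites are performed. As an illustration under two assumptions, that "restructuring" a Python from-import means splitting a multi-name import into one statement per name, and that "normalizing" a float constant means mapping its different spellings to one canonical form, two preprocessing helpers might look like this. Both function names are hypothetical.

```python
import ast
import re
from typing import List

# Java/C-style float suffixes (f, F, d, D); decimal literals only, not hex.
_FLOAT_SUFFIX = re.compile(r"[fFdD]$")


def split_from_import(line: str) -> List[str]:
    """Rewrite 'from m import a, b as c' as one statement per imported name."""
    tree = ast.parse(line)
    if not tree.body or not isinstance(tree.body[0], ast.ImportFrom):
        return [line]  # not a from-import: leave unchanged
    node = tree.body[0]
    # Keep relative imports relative (level counts the leading dots).
    module = "." * (node.level or 0) + (node.module or "")
    out = []
    for alias in node.names:
        stmt = f"from {module} import {alias.name}"
        if alias.asname:
            stmt += f" as {alias.asname}"
        out.append(stmt)
    return out


def normalize_float_literal(text: str) -> str:
    """Map spellings such as '1.50f', '1_000.0' or '1e3' to one canonical form."""
    cleaned = _FLOAT_SUFFIX.sub("", text.replace("_", ""))
    return repr(float(cleaned))
```

For example, "from os.path import join, exists as ex" becomes two single-name imports, and "1.50f" and "1.5" both normalize to "1.5", so later stages see one spelling per value.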

At the GIR level, event handlers perform language unification operations:

  • Unify the this/self representations by mapping both to %this, which is treated as a reserved keyword.
  • Complete variable declarations, so that an assignment in a dynamic language is equivalent at the GIR level to "variable declaration + assignment"; if a declaration for the variable already exists, nothing extra is needed.
  • Automatically insert a unified top-level function, %unit_init(), into each source file to standardize file initialization; it can also serve as the default entry function.
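The three unification steps above can be sketched over a toy GIR in which each statement is a dict. The statement shapes (operation, target, operand, name, body) and the operation names variable_decl, assign_stmt, and method_decl are assumptions for illustration; a real pass would also track nested scopes, which this sketch ignores by treating the whole file as one scope.

```python
from typing import Dict, List

Stmt = Dict[str, object]  # toy GIR statement, e.g. {"operation": "assign_stmt", ...}
SELF_NAMES = {"this", "self"}


def unify_this(stmt: Stmt) -> Stmt:
    """Rewrite any operand spelled this/self to the reserved %this."""
    return {key: ("%this" if isinstance(value, str) and value in SELF_NAMES else value)
            for key, value in stmt.items()}


def complete_declarations(stmts: List[Stmt]) -> List[Stmt]:
    """Insert a variable_decl before the first assignment to an undeclared name."""
    declared = set()
    out: List[Stmt] = []
    for stmt in stmts:
        if stmt["operation"] == "variable_decl":
            declared.add(stmt["name"])
        elif stmt["operation"] == "assign_stmt":
            target = stmt["target"]
            if target != "%this" and target not in declared:
                out.append({"operation": "variable_decl", "name": target})
                declared.add(target)
        out.append(stmt)
    return out


def add_unit_init(stmts: List[Stmt]) -> List[Stmt]:
    """Wrap the file's top-level statements in a synthesized %unit_init method."""
    return [{"operation": "method_decl", "name": "%unit_init", "body": stmts}]


def unify_file(stmts: List[Stmt]) -> List[Stmt]:
    return add_unit_init(complete_declarations([unify_this(s) for s in stmts]))


# Usage: a file body equivalent to "x = 1" followed by "x = self".
gir = [
    {"operation": "assign_stmt", "target": "x", "operand": "1"},
    {"operation": "assign_stmt", "target": "x", "operand": "self"},
]
result = unify_file(gir)
# result[0]["body"] == [
#     {"operation": "variable_decl", "name": "x"},
#     {"operation": "assign_stmt", "target": "x", "operand": "1"},
#     {"operation": "assign_stmt", "target": "x", "operand": "%this"},
# ]
```

Only the first assignment to x gets a synthesized declaration, matching the rule that an existing declaration needs no extra handling.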

In the semantic analysis stage, events are mainly used to handle operations such as field access, object initialization, and method calls. For example, for JavaScript, the P2STATE_FIELD_READ_BEFORE event is triggered before each field read, so that a handler can customize call-chain-related semantics to fit JavaScript's dynamic object model.
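The text does not detail what the JavaScript handler computes. As one plausible illustration (an assumption, not Lian's documented behavior), a handler on P2STATE_FIELD_READ_BEFORE could resolve a field read along JavaScript's prototype chain, which is where its dynamic object model differs most from class-based languages. The heap layout, the __proto__ field key, and resolve_field_read are hypothetical.

```python
from typing import Dict, List, Set

# Toy abstract heap: object id -> field name -> set of possible values
Heap = Dict[str, Dict[str, Set[str]]]
PROTO = "__proto__"


def resolve_field_read(heap: Heap, obj: str, field: str) -> Set[str]:
    """Resolve obj.field by walking the __proto__ chain, JavaScript style.

    An own property shadows the same name on prototypes; because __proto__
    may itself have several possible values, every possible chain is followed.
    """
    result: Set[str] = set()
    worklist: List[str] = [obj]
    visited: Set[str] = set()
    while worklist:
        current = worklist.pop()
        if current in visited:
            continue  # also guards against cyclic prototype links
        visited.add(current)
        fields = heap.get(current, {})
        if field in fields:
            result |= fields[field]
        else:
            worklist.extend(fields.get(PROTO, set()))
    return result


heap: Heap = {
    "proto1": {"greet": {"fn_greet"}},
    "o1": {PROTO: {"proto1"}, "name": {"str1"}},
}
# resolve_field_read(heap, "o1", "greet") == {"fn_greet"}   (inherited)
# resolve_field_read(heap, "o1", "name") == {"str1"}        (own property)
```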

3 External System Mechanism

The external system describes or approximates external functions and interfaces, providing pluggable semantic approximations for third-party library functions and system APIs during analysis. Three forms of description are supported:

  • Code-based description

Use simplified simulated code to express the behavior of external functions, providing approximate semantics during analysis. This is suitable for external interfaces with relatively simple structure and clear semantics.

  • Rule-based description

Use configuration files to define abstract propagation relationships between external function parameters and return values. The configuration includes language kind, method identifier, and propagation path description. Rules use special markers to represent sources and targets: %this for the current object, %arg[0-9]+ for argument positions, and %return for the return value. After parsing the configuration, the system generates internal rule objects and applies them during analysis.

  • Model-based description

For complex calling patterns that are difficult to express with rules or simulated code, the framework allows custom models written against its own internal functions, covering the calling conventions of specific languages or libraries.
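The rule-based form can be illustrated with a small parser and applier. This is a sketch under stated assumptions: the configuration is shown as in-memory dicts rather than a file, the keys lang, method, and flows and the "src -> dst" flow syntax are invented for illustration, and %arg indices are assumed to be 0-based. Only the %this, %argN, and %return markers come from the text.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping, Sequence, Set

# Markers from the rule language: %this, %arg<N>, %return
MARKER = re.compile(r"%this|%return|%arg[0-9]+")


@dataclass(frozen=True)
class PropagationRule:
    lang: str
    method: str
    src: str
    dst: str


def parse_rules(config: Iterable[Mapping]) -> List[PropagationRule]:
    """Turn config entries with 'src -> dst' flow strings into rule objects."""
    rules = []
    for entry in config:
        for flow in entry["flows"]:
            src, dst = (part.strip() for part in flow.split("->"))
            for marker in (src, dst):
                if not MARKER.fullmatch(marker):
                    raise ValueError(f"unknown marker: {marker!r}")
            rules.append(PropagationRule(entry["lang"], entry["method"], src, dst))
    return rules


def apply_rules(rules: Sequence[PropagationRule], lang: str, method: str,
                receiver: Set[str], args: Sequence[Set[str]]) -> Dict[str, Set[str]]:
    """Collect the abstract values flowing into each target marker at one call site."""
    def read(marker: str) -> Set[str]:
        if marker == "%this":
            return receiver
        if marker.startswith("%arg"):
            index = int(marker[len("%arg"):])  # assumed 0-based
            return args[index] if index < len(args) else set()
        return set()  # %return has no value before the call
    flows: Dict[str, Set[str]] = {}
    for rule in rules:
        if rule.lang == lang and rule.method == method:
            flows.setdefault(rule.dst, set()).update(read(rule.src))
    return flows


config = [
    {"lang": "python", "method": "list.append", "flows": ["%arg0 -> %this"]},
    {"lang": "python", "method": "dict.get", "flows": ["%this -> %return"]},
]
rules = parse_rules(config)
appended = apply_rules(rules, "python", "list.append", {"list_obj"}, [{"tainted_str"}])
# appended == {"%this": {"tainted_str"}}
fetched = apply_rules(rules, "python", "dict.get", {"dict_obj"}, [{"key"}])
# fetched == {"%return": {"dict_obj"}}
```

A rule only records the direction of a flow between call-site positions; what the flowing values mean (points-to targets, taint labels) is left to the analysis that applies the rules.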

In Lian, the external system is integrated via function-call-related events and the plugin mechanism. For example, during call analysis, the P2STATE_EXTERN_CALLEE event is triggered to pass call information to the external system for matching and handling. If applicable rules exist, the external system takes over the analysis logic at that call site.
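The takeover at an external call site can be sketched as a handler that looks up a model and reports whether it handled the call. The example model, Python's setattr, also shows why the model-based form exists: the field being written is chosen by an argument's value, which a %this/%argN/%return flow rule cannot express. CallInfo, EXTERN_MODELS, extern_model, on_extern_callee, the heap layout, and the "*" summary field are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

Heap = Dict[str, Dict[str, Set[str]]]  # object id -> field -> possible values
ANY_FIELD = "*"  # summary field for statically unknown field names


@dataclass
class CallInfo:
    lang: str
    callee: str
    args: Tuple[Optional[str], ...]
    heap: Heap = field(default_factory=dict)


Model = Callable[[CallInfo], None]
EXTERN_MODELS: Dict[Tuple[str, str], Model] = {}


def extern_model(lang: str, callee: str) -> Callable[[Model], Model]:
    """Register a model for one (language, callee) pair."""
    def decorator(fn: Model) -> Model:
        EXTERN_MODELS[(lang, callee)] = fn
        return fn
    return decorator


@extern_model("python", "setattr")
def model_setattr(call: CallInfo) -> None:
    obj, name, value = call.args
    fields = call.heap.setdefault(obj, {})
    if name is not None:
        fields[name] = {value}  # known constant name: strong update (assumes one concrete object)
    else:
        fields.setdefault(ANY_FIELD, set()).add(value)  # unknown name: weak update


def on_extern_callee(call: CallInfo) -> bool:
    """Handler for P2STATE_EXTERN_CALLEE: True if a model took over this call site."""
    model = EXTERN_MODELS.get((call.lang, call.callee))
    if model is None:
        return False  # no model: fall back to the default analysis
    model(call)
    return True


heap: Heap = {}
on_extern_callee(CallInfo("python", "setattr", ("o1", "x", "v1"), heap))
on_extern_callee(CallInfo("python", "setattr", ("o1", None, "v2"), heap))
# heap == {"o1": {"x": {"v1"}, "*": {"v2"}}}
```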

4 Summary

The event-based plugin mechanism provides the Lian framework with an explicit way to handle language differences, enabling multi-language programs to be modeled and analyzed under a unified analysis pipeline. By constraining language-specific behavior within clear event boundaries and plugin interfaces, this mechanism keeps the core analysis pipeline stable while preserving room to extend semantic features of different languages.