Taint Tracking Technique

Lian's taint tracking is built on the State Flow Graph (SFG). The SFG combines the symbols, states, and dependency relationships computed during semantic analysis into a single graph, so data propagation can be computed by traversing that graph directly.

During taint analysis, the system uses bit vectors to mark and distinguish sensitive data. Each taint source is assigned its own tag bit, and taint sets are maintained with bitwise operations during propagation. A value reached by several sources therefore carries one bit per source, which allows simultaneous propagation from multiple sources to be modeled precisely.
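The tagging scheme can be illustrated with plain Python integers used as bit vectors. This is a minimal sketch, not Lian's implementation: the TagAllocator class, its methods, and the source names are hypothetical and only show how one bit per source lets taint sets be merged and queried with bitwise operations.

```python
class TagAllocator:
    """Hypothetical allocator that hands out one bit per taint source."""

    def __init__(self):
        self._bit_of = {}  # source name -> single-bit integer mask

    def tag_for(self, source_name):
        # Reuse the existing bit, or allocate the next unused one.
        if source_name not in self._bit_of:
            self._bit_of[source_name] = 1 << len(self._bit_of)
        return self._bit_of[source_name]

    def sources_in(self, tag_set):
        # Decode a taint set back into the source names it contains.
        return [name for name, bit in self._bit_of.items() if tag_set & bit]


alloc = TagAllocator()
user_input = alloc.tag_for("user_input")  # first source gets bit 0
env_var = alloc.tag_for("env_var")        # second source gets bit 1

# A value reached by both sources: union is a bitwise OR.
merged = user_input | env_var

# Membership test is a bitwise AND.
carries_env_var = bool(merged & env_var)
print(alloc.sources_in(merged), carries_env_var)
```

Union and membership each cost a single integer operation regardless of how many sources are registered, which is the usual reason for choosing this representation.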

1 Source and Sink Configuration

Taint-related configuration is managed uniformly by RuleManager. These configuration files are located under the default_settings directory. The system supports defining taint sources (source), taint sinks (sink), and propagation rules using YAML files.

  • source.yaml: describes where sensitive data is produced, such as user input, file reads, or environment variable access.
  • sink.yaml: describes where potentially dangerous operations occur, such as database queries, command execution, or external API calls.
  • propagation.yaml: describes how taint propagates through statements, function calls, and field accesses.

The rule manager parses the configuration files and generates internal Rule objects. Rules can be scoped by language, so each language in a multi-language analysis can be modeled separately.
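As a rough illustration of turning parsed configuration into language-scoped rules, the sketch below uses a plain dictionary as a stand-in for the output of a YAML loader. The schema (the "lang" and "name" keys), the RuleSketch class, and the helper functions are assumptions made for this example; the actual file layout and Rule fields used by Lian are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSketch:
    """Hypothetical stand-in for an internal rule object."""
    kind: str  # "source", "sink", or "propagation"
    lang: str  # language the rule applies to
    name: str  # API or construct the rule matches


# Stand-in for parsed YAML content; the schema is assumed, not Lian's.
PARSED_CONFIG = {
    "source": [
        {"lang": "python", "name": "input"},
        {"lang": "java", "name": "getParameter"},
    ],
    "sink": [
        {"lang": "python", "name": "os.system"},
    ],
}


def build_rules(parsed):
    # Flatten {kind: [entries]} into a list of rule objects.
    return [
        RuleSketch(kind, entry["lang"], entry["name"])
        for kind, entries in parsed.items()
        for entry in entries
    ]


def rules_for(rules, lang, kind):
    # Select only the rules that apply to one language and one rule kind.
    return [r for r in rules if r.lang == lang and r.kind == kind]


all_rules = build_rules(PARSED_CONFIG)
python_sources = rules_for(all_rules, "python", "source")
print([r.name for r in python_sources])
```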

2 Taint Propagation Framework

Taint propagation logic is driven by the TaintAnalysis class. Lian uses the State Flow Graph as the analysis target and runs a worklist-based computation on the graph. The propagation algorithm is scheduled by PathFinder. The basic process repeatedly takes a pending SFG node off the worklist, updates the taint states of its adjacent nodes according to the node type and the propagation rules, and continues until a fixpoint is reached, that is, until no taint state changes any further.
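The worklist process can be sketched generically as follows. This is an illustration of the technique under simplifying assumptions, not the TaintAnalysis or PathFinder code: the graph is a plain adjacency dictionary, every edge simply forwards taint, and the propagate function and node names are hypothetical.

```python
from collections import deque


def propagate(edges, seeds):
    """Push taint bit sets along edges until nothing changes (fixpoint)."""
    taint = dict(seeds)       # node -> bit set of taint tags
    worklist = deque(seeds)   # nodes whose taint changed and must be revisited
    while worklist:
        node = worklist.popleft()
        for succ in edges.get(node, ()):
            old = taint.get(succ, 0)
            new = old | taint[node]  # merge incoming taint into the successor
            if new != old:
                # Only re-queue on change, so cycles cannot loop forever.
                taint[succ] = new
                worklist.append(succ)
    return taint


# Small graph with a cycle (a -> b -> a) and two sources meeting at c.
EDGES = {"a": ["b"], "b": ["c", "a"], "x": ["c"]}
result = propagate(EDGES, {"a": 0b01, "x": 0b10})
print(result)
```

Because taint sets only grow and the number of tag bits is finite, each node can change a bounded number of times, so the loop terminates even on cyclic graphs.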

Propagation distinguishes three types of nodes:

  • Symbol nodes (Symbol)
  • State nodes (State)
  • Statement nodes (Stmt)

3 Propagation Rules on Different Node Types

Symbol nodes describe variable-level propagation. When a symbol is marked as tainted, its taint information is propagated to the state nodes associated with the symbol, and further affects statement nodes that use the symbol.

State nodes describe propagation on abstract memory objects. Taint spreads from a state in two directions: backward to every symbol that points to the state, and downward along its fields or substates, which models data flow inside objects.

Statement nodes handle propagation behavior introduced by concrete statements. Based on propagation rules, the system transfers taint from inputs used by the statement to outputs defined by the statement. For statements with side effects such as method calls, it additionally handles propagation paths that affect object states.
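The three rules above can be summarized as a dispatch on node type. The sketch below is a simplified model with assumed names: the GRAPH dictionary, its keys, and the successors function are hypothetical, every statement is treated as passing taint from its inputs to its outputs, and side effects of method calls are left out.

```python
from collections import deque

# Hypothetical miniature graph; all names are illustrative.
GRAPH = {
    "points_to": {"req": ["S1"], "alias": ["S1"]},  # symbol -> states
    "fields": {"S1": ["S1.body"]},                  # state -> substates
    "used_by": {"req": ["stmt_1"]},                 # symbol -> statements using it
    "defines": {"stmt_1": ["q"]},                   # statement -> symbols it defines
}


def successors(node, g):
    """Return the nodes that receive taint from `node`, based on its type."""
    kind, name = node
    if kind == "symbol":
        # Symbol -> its states, then the statements that use the symbol.
        states = [("state", s) for s in g["points_to"].get(name, [])]
        stmts = [("stmt", s) for s in g["used_by"].get(name, [])]
        return states + stmts
    if kind == "state":
        # State -> backward to every symbol pointing at it, downward to fields.
        back = [("symbol", sym) for sym, sts in g["points_to"].items() if name in sts]
        down = [("state", f) for f in g["fields"].get(name, [])]
        return back + down
    if kind == "stmt":
        # Statement -> the symbols it defines (inputs to outputs).
        return [("symbol", d) for d in g["defines"].get(name, [])]
    raise ValueError(f"unknown node kind: {kind}")


def tainted_nodes(seed, g):
    # Collect everything reachable from the seed under the rules above.
    seen = {seed}
    worklist = deque([seed])
    while worklist:
        for succ in successors(worklist.popleft(), g):
            if succ not in seen:
                seen.add(succ)
                worklist.append(succ)
    return seen


reached = tainted_nodes(("symbol", "req"), GRAPH)
print(sorted(reached))
```

In this example, tainting req also taints alias, because both symbols point to the same state S1, and taints q, because stmt_1 uses req and defines q.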

4 Taint State Management

The system keeps taint information consistent through several layers of state management components.

  • TaintEnv: maintains the current taint sets for symbols and states.
  • TagBitVectorManager: provides bit-vector allocation, merge, and comparison operations.
  • TaintStateManager: supports access-path-based taint tracking, used to precisely model field reads/writes and nested object structures.
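To show what access-path-based tracking buys, the sketch below keys taint by a tuple of field names. It is one common way to model field sensitivity, not a description of TaintStateManager itself: the AccessPathTaint class and the rule that a read inherits taint from every prefix of its path are assumptions made for this example.

```python
class AccessPathTaint:
    """Hypothetical field-sensitive taint store keyed by access path."""

    def __init__(self):
        self._tags = {}  # access path (tuple of names) -> bit set

    def write(self, path, tags):
        # Overwrite the taint recorded for exactly this path.
        self._tags[tuple(path)] = tags

    def read(self, path):
        # A read sees taint on the path itself and on any enclosing object,
        # i.e. on every prefix of the path.
        path = tuple(path)
        result = 0
        for i in range(1, len(path) + 1):
            result |= self._tags.get(path[:i], 0)
        return result


store = AccessPathTaint()
store.write(("user", "profile", "email"), 0b01)  # only one nested field
store.write(("cfg",), 0b10)                      # the whole object

email = store.read(("user", "profile", "email"))
name = store.read(("user", "profile", "name"))
host = store.read(("cfg", "db", "host"))
print(email, name, host)
```

Tainting user.profile.email leaves user.profile.name clean, while tainting the whole cfg object makes every field read from it tainted.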

Interprocedural propagation works by synchronizing actual arguments with formal parameters, which lets taint cross function boundaries along call edges.
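A minimal sketch of argument and parameter synchronization is shown below. The two helper functions, the environments as plain dictionaries, and the choice to copy taint back to the caller on return are all assumptions for illustration; the source does not specify how Lian performs the synchronization.

```python
def bind_arguments(caller_env, actual_args, formal_params):
    """At a call edge, copy taint from each actual argument to its parameter."""
    return {
        formal: caller_env.get(actual, 0)
        for actual, formal in zip(actual_args, formal_params)
    }


def sync_back(caller_env, callee_env, actual_args, formal_params):
    """On return, merge taint the callee added to parameters into the caller."""
    for actual, formal in zip(actual_args, formal_params):
        caller_env[actual] = caller_env.get(actual, 0) | callee_env.get(formal, 0)


# Caller state: `name` is tainted by source bit 0, `limit` is clean.
caller_env = {"name": 0b01, "limit": 0}
actuals = ["name", "limit"]
formals = ["q", "n"]

callee_env = bind_arguments(caller_env, actuals, formals)
entry_state = dict(callee_env)  # snapshot of taint at callee entry

# Pretend the callee body taints its second parameter with source bit 1.
callee_env["n"] |= 0b10

sync_back(caller_env, callee_env, actuals, formals)
print(entry_state, caller_env)
```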

5 Analysis Results and Path Recording

Taint analysis results include not only whether a propagation relationship from source to sink exists, but also complete data-flow path information. The Flow class records the propagation path from a taint source to a sink, providing a basis for vulnerability localization and result explanation.
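Path recording can be sketched by remembering, for each visited node, the node that taint arrived from, and walking those links backward once a sink is reached. The FlowRecord class and find_flow function are hypothetical and use a plain breadth-first search; they are not the Flow class mentioned above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical record of one source-to-sink propagation path."""
    source: str
    sink: str
    path: tuple


def find_flow(edges, source, sink):
    parent = {source: None}  # node -> predecessor; doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk predecessor links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return FlowRecord(source, sink, tuple(reversed(path)))
        for succ in edges.get(node, ()):
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None  # sink not reachable from source


EDGES = {
    "input()": ["name"],
    "name": ["query", "log"],
    "query": ["db.execute()"],
}
flow = find_flow(EDGES, "input()", "db.execute()")
print(flow.path)
```

The recorded path lists every intermediate node, which is the information needed to explain why a given sink was reported.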

During analysis, the system can also mark visited nodes, which helps when debugging the propagation process and validating analysis behavior.

6 Summary

Lian's taint tracking uses the State Flow Graph as its foundation, converting semantic analysis results into a directly queryable data propagation structure. Through bit-vector tagging, propagation rules specific to each node type, and interprocedural synchronization, the system can systematically model and analyze sensitive data flows in complex programs.