Skip to content

State Flow Graph (SFG)

Taint analysis is a commonly used technique in vulnerability detection, and its execution depends on the results of program semantic analysis. In the semantic analysis stage, the system has already computed symbols, abstractions of memory objects, and the pointing and dependency relationships among them. Taint tracking needs to directly use these semantic results during propagation and path queries, so it is necessary to organize them into a unified structured representation. Based on this, the State Flow Graph (SFG) is introduced to express semantic relationships related to data propagation.

The State Flow Graph does not introduce new semantic inference rules. Its role is to uniformly model existing semantic analysis results. These results include instruction-level def-use relations, reachability information obtained by data-flow analysis, and symbol states produced by pointer analysis. As an intermediate representation, the State Flow Graph provides direct graph-structure support for taint propagation and path queries.

1 Composition of the State Flow Graph

A State Flow Graph consists of nodes and edges, used to describe relationships among symbols, memory states, and statements.

The State Flow Graph contains the following types of nodes:

  • Symbol nodes (Symbol): represent symbolic entities in the program, such as variables, function parameters, and return values.
  • State nodes (State): represent the memory object abstractions corresponding to symbols during semantic analysis.
  • Statement nodes (Stmt): represent concrete program statements, used to mark where data is produced and used.

Edges in the State Flow Graph characterize semantic relationships between nodes, mainly including:

  • Reference edges: indicate that a statement uses a symbol or a state.
  • Definition edges: indicate that a statement defines or updates a symbol or a state.
  • Symbol-state association edges: indicate the association between a symbol and the set of states corresponding to it during semantic analysis.
  • Symbol-flow edges: indicate data dependency relationships formed between symbols due to assignment, parameter passing, or return values.
  • State-copy edges: indicate the association between old and new states produced by the state-copy strategy during semantic analysis.

2 Example

Consider the following example program:

def main():  
    a = A()  
    b = a.g  
    foo(a)  
    sink(b.f)  

def foo(z):  
    x = z.g  
    w = source()  
    x.f = w  

main()

In this program, a and z form an alias relationship during the function call. Access to field g causes multiple symbols to point to the same state, and the assignment to x.f in foo affects the read of b.f in main.

In the semantic analysis stage, Lian has computed the points-to relationships among symbols and their corresponding memory object abstractions. On this basis, the State Flow Graph organizes these results into a graph structure to express:

  • The symbol dependency relationship between a and z.
  • The sharing relationship among the states corresponding to a.g, z.g, b, and x.
  • How the data produced by source() propagates to sink(b.f) via state updates.

The State Flow Graph constructed from the program above is as follows:

In this State Flow Graph, starting from the state node corresponding to source(), following symbol-flow edges, state-copy edges, and statement-related edges, a data propagation path that reaches the sink call site can be obtained. This graph structure allows direct enumeration of possible data propagation paths by graph traversal from source nodes, supporting taint propagation analysis and path backtracking.