
Taint Tracking

Author: Yang Guangliang
Last Updated: 2026-01-07
Copyright Notice: This document is licensed under CC BY-NC-SA 4.0

Taint analysis is an extension of data-flow analysis that tracks how specified data flows through a program. It is widely used to detect privacy leaks, security vulnerabilities, and similar issues. The core idea is to attach a label (the taint) to the data sources of interest (Source), track how the labeled data propagates, and check whether it eventually reaches a sensitive or dangerous statement, called a sink (Sink). If tainted data reaches a sink, a security issue may exist. A data flow from a Source to a Sink is called a taint flow (Taint Flow).

Therefore, taint analysis has three key elements: Source, Sink, and Propagation. This chapter introduces them one by one.

Sources (Source)

A Source is where data of interest or untrusted data comes from, i.e., where “dirty data” originates. Common sources include:

  • User data: such as usernames, passwords, phone numbers, and geolocation.
  • File data: such as locally stored password files and database files.
  • Attacker-controlled data: such as entry-point parameters and network data.

Sinks (Sink)

A Sink is the location of a dangerous operation, i.e., a place that “dirty data” must not reach. Common sinks include:

  • Database queries: such as SELECT, UPDATE, and DELETE statements.
  • File operations: such as reading and writing files.
  • Command execution: such as executing OS commands or calling system functions.
  • Network requests: such as sending HTTP requests or emails.

When tainted data flows to a sink, there may be a security risk.
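One way to make sources and sinks concrete is a small lookup catalog that an analysis consults for each call it sees. The sketch below is only an illustration: the class name `TaintSpec` and the choice of method names are assumptions made for this example, and matching on a bare method name is far coarser than what a real tool would do (real tools usually match full signatures).

```java
import java.util.Set;

// Hypothetical catalog of source and sink method names (illustrative, not exhaustive).
final class TaintSpec {
    // Calls whose return value is treated as tainted.
    static final Set<String> SOURCES =
            Set.of("getParameter", "readLine", "getLastKnownLocation");

    // Calls whose arguments must not be tainted.
    static final Set<String> SINKS =
            Set.of("executeQuery", "exec", "write", "send");

    static boolean isSource(String methodName) {
        return SOURCES.contains(methodName);
    }

    static boolean isSink(String methodName) {
        return SINKS.contains(methodName);
    }
}
```

With such a catalog, `TaintSpec.isSource("getParameter")` and `TaintSpec.isSink("executeQuery")` both return true, while an unrelated call such as `toString` is neither a source nor a sink.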

Propagation

Propagation describes how taint spreads in a program, i.e., the rules for taint propagation. Common propagation rules include:

  • Assignment propagation: the taint of the rvalue contaminates the lvalue.
  • String operations: concatenation, substring, replacement, and similar operations carry taint from their operands to the result.
  • Container operations: storing a tainted value in a container such as a List, Map, or Set can taint the container or its elements.
  • Function calls: calling system functions or third-party library functions may propagate taint, for example from arguments to the return value.

The counterpart of propagation rules is Taint Removal Rules, which describe operations that sanitize data so that it no longer carries taint and no longer contaminates other variables.
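The propagation and removal rules above can be sketched as small update functions over a set of tainted variable names. This is a minimal sketch under stated assumptions: the class `TaintRules` and its method names are hypothetical, variables are identified by name only, containers are tainted as a whole once any tainted value is stored in them, and an assignment from a clean value clears the taint of its target.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: propagation and removal rules as updates to a set of tainted variable names.
final class TaintRules {
    final Set<String> tainted = new HashSet<>();

    // Assignment "lhs = rhs": lhs takes on the taint status of rhs.
    void assign(String lhs, String rhs) {
        if (tainted.contains(rhs)) {
            tainted.add(lhs);
        } else {
            tainted.remove(lhs);
        }
    }

    // String operation "lhs = op(operands...)": lhs is tainted if any operand is.
    void stringOp(String lhs, String... operands) {
        boolean anyTainted = false;
        for (String operand : operands) {
            anyTainted |= tainted.contains(operand);
        }
        if (anyTainted) {
            tainted.add(lhs);
        } else {
            tainted.remove(lhs);
        }
    }

    // Container write "container.add(value)": a tainted value taints the whole container.
    void containerAdd(String container, String value) {
        if (tainted.contains(value)) {
            tainted.add(container);
        }
    }

    // Removal rule "lhs = sanitize(...)": the sanitized result is treated as clean.
    void sanitize(String lhs) {
        tainted.remove(lhs);
    }
}
```

Treating a container as a single tainted unit is a deliberately coarse choice; a more precise analysis would track individual elements or keys, at higher cost.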

Taint Analysis Algorithm

Taint analysis implements taint tracking on top of data-flow analysis. The algorithm steps are as follows:

Algorithm: Taint Analysis
Input: program code, data flow, taint source Source, taint sink Sink, propagation rules PropagationRules, removal rules RemovalRules
Output: discovered taint flows from Source to Sink

1. Initialization: 
   TaintSet = ∅  // store all tainted variables or heap objects

2. Traverse each statement: 
    for each statement stmt: 
        case stmt of: 
            // Case 1: Source
            x = source(): 
                // mark x as tainted
                TaintSet.add(x) 
                // [optional] the points-to set of x can also be tainted
                TaintSet.add(pts(x))  

            // Case 2: Propagation
            stmt ∈ PropagationRules: 
                // apply the matching rule, e.g., for y = x, add y to TaintSet if x ∈ TaintSet
                apply_propagation_rule(stmt) 

            // Case 3: Removal
            stmt ∈ RemovalRules:  // e.g., x = sanitize(y)
                // x, the value sanitized by stmt, is no longer tainted
                TaintSet.remove(x) 
                // [optional] the points-to set of x can also be untainted
                TaintSet.remove(pts(x)) 

            // Case 4: Sink
            sink(x): 
                if x in TaintSet: 
                    // ⚠️ vulnerability found! 
                    warning(x, stmt)  
                    // collect taint flow
                    collect_taint_flow(x, stmt) 

3. Return all discovered taint flows
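The steps above can be sketched in Java over a tiny straight-line statement list. Everything here is an assumption made for illustration: the `MiniTaintAnalysis` class, the `Stmt` representation, and the four statement kinds are hypothetical, each statement is already classified as source, propagation, removal, or sink, and the optional points-to handling and the collection of full taint paths are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the taint analysis loop over a hypothetical straight-line IR.
final class MiniTaintAnalysis {
    enum Kind { SOURCE, PROPAGATE, REMOVE, SINK }

    // One statement: "target = kind(operands...)"; for a SINK, target is the callee name.
    static final class Stmt {
        final Kind kind;
        final String target;
        final int line;
        final List<String> operands;

        Stmt(Kind kind, String target, int line, String... operands) {
            this.kind = kind;
            this.target = target;
            this.line = line;
            this.operands = List.of(operands);
        }
    }

    // Returns one report per tainted argument that reaches a sink.
    static List<String> analyze(List<Stmt> program) {
        Set<String> taintSet = new HashSet<>();   // step 1: initialization
        List<String> flows = new ArrayList<>();
        for (Stmt s : program) {                  // step 2: traverse each statement
            switch (s.kind) {
                case SOURCE:                      // case 1: mark the target as tainted
                    taintSet.add(s.target);
                    break;
                case PROPAGATE: {                 // case 2: tainted if any operand is
                    boolean anyTainted = false;
                    for (String operand : s.operands) {
                        anyTainted |= taintSet.contains(operand);
                    }
                    if (anyTainted) {
                        taintSet.add(s.target);
                    } else {
                        taintSet.remove(s.target);
                    }
                    break;
                }
                case REMOVE:                      // case 3: the target is sanitized
                    taintSet.remove(s.target);
                    break;
                case SINK:                        // case 4: report tainted arguments
                    for (String arg : s.operands) {
                        if (taintSet.contains(arg)) {
                            flows.add("line " + s.line + ": tainted '" + arg
                                    + "' reaches " + s.target);
                        }
                    }
                    break;
            }
        }
        return flows;                             // step 3: return discovered flows
    }
}
```

Because the sketch walks statements in order exactly once, it only suits code without branches or loops; handling those would require iterating to a fixed point over a control-flow graph, as in ordinary data-flow analysis.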

Taint Analysis Applications

[01] // vulnerable code
[02] public void login(UrlRequest request, Connection conn) {
[03]     // Step 1: username is a source
[04]     String username = request.getParameter("username");  // [SOURCE] 
[05]     
[06]     // Step 2: query is tainted (propagation) 
[07]     String query = "SELECT * FROM users WHERE name='" + username + "'";  // [TAINTED] 
[08]     
[09]     // Step 3: taint flows to a dangerous location (sink) 
[10]     conn.executeQuery(query);  // [SINK] 
[11] } 

The code above contains a SQL injection vulnerability: line 7 concatenates the untrusted username into the query string, and line 10 executes that string. The taint analysis details are:

Vulnerability type:       SQL injection
Source:                   [04] request.getParameter("username")
Sink:                     [10] conn.executeQuery(query)
Taint path:               username -> query -> executeQuery

Summary

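Taint analysis is widely used in software security detection. By defining sources, sinks, and propagation rules, it reports data flows from data of interest or untrusted data to dangerous operations, which makes it an important program-analysis tool in the security domain.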