Paper status: completed

ClosureX: Compiler Support for Correct Persistent Fuzzing

Published: 02/06/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

ClosureX introduces a novel fuzz testing mechanism that addresses semantic inconsistencies in persistent fuzzing. It achieves near-persistent performance with fine-grained state restoration, increasing test case execution rates by over 3.5 times while enhancing bug discovery capabilities.

Abstract

Abstract missing from provided text.


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

The title of the paper is ClosureX: Compiler Support for Correct Persistent Fuzzing. The central topic is introducing a novel compiler-based approach to enable correct and high-performance persistent fuzzing, which aims to resolve the long-standing trade-off between fuzzing speed and correctness.

1.2. Authors

  • Rishi Ranjan: Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

  • Ian Paterson: Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

  • Matthew W Hicks: Virginia Polytechnic Institute and State University, Blacksburg, VA, United States

    All authors are affiliated with Virginia Polytechnic Institute and State University (Virginia Tech), indicating a research focus from this institution on compiler design, system security, and software testing.

1.3. Journal/Conference

The paper was published at ASPLOS '25: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS is a highly reputable and influential conference in the fields of computer architecture, programming languages, and operating systems. Publication at ASPLOS signifies that the work is considered a significant contribution to the intersection of these areas, particularly in compiler support for system-level tools.

1.4. Publication Year

The paper was published on 30 March 2025.

1.5. Abstract

The abstract introduces fuzzing as a widely adopted method for bug hunting and software hardening. It highlights that increasing fuzzing throughput directly correlates with bug discovery rates. The highest-performance fuzzing strategy is persistent fuzzing, which reuses a single process for multiple test cases, eliminating expensive process creation and tear-down costs. However, persistent fuzzing suffers from semantic inconsistency due to residual state changes from prior test cases, leading to missed crashes, false crashes, and overall incorrectness.

The paper presents ClosureX, a novel fuzzing execution mechanism that achieves fine-grain state restoration. ClosureX resets only test-case-execution-specific state, offering near-persistent performance with the correctness typically associated with heavyweight state restoration (like fresh process execution). ClosureX is implemented as a set of LLVM passes integrated with AFL++. Evaluation on ten popular open-source fuzzing targets shows that ClosureX maintains semantic correctness, increases the test case execution rate by over 3.5X on average, and finds bugs 1.9X faster than AFL++, discovering 15 0-day bugs (4 CVEs).


2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the inherent trade-off between performance and correctness in current fuzzing methodologies, specifically within the context of persistent fuzzing.

Why is this problem important? Fuzzing is a critical technique for discovering software bugs and hardening applications. Research consistently shows that a higher test case execution rate directly leads to a higher bug discovery rate. Therefore, optimizing the speed at which fuzzers can execute test cases is paramount for their effectiveness. Previous work has largely focused on minimizing the overhead of program instrumentation (e.g., coverage tracking). However, with these advancements, the overhead associated with process management (creating, initializing, and tearing down processes for each test case) has become the dominant bottleneck.

Existing fuzzing execution mechanisms fall along a spectrum:

  • Fresh Process Fuzzing: Each test case runs in a completely new process. This ensures perfect isolation and semantic correctness (each test case starts from a clean, identical state) but incurs significant performance overhead due to repeated process creation, loading, and destruction.
  • Persistent Fuzzing: A single process is reused for all test cases, looping back to a starting point after each execution. This drastically reduces process management overhead and achieves the highest test case execution rate. However, it suffers from semantic inconsistency. State changes from one test case persist into subsequent ones, leading to:
    • Missed crashes: Bugs might not manifest because the program is in an unintended, "stale" state.

    • False crashes: Seemingly valid crashes might occur due to accumulated incorrect state, wasting developer time.

    • Non-reproducibility: Bugs become difficult to reproduce due to dependence on test case order.

      The paper's entry point is the observation that this performance-correctness dilemma stems from treating process management as a coarse-grain operation. The innovative idea is to enable fine-grain state restoration at the software level, allowing a single process to maintain isolation between test cases while avoiding the heavy costs of OS-level process creation.

2.2. Main Contributions / Findings

The paper's primary contributions are:

  • A Novel Fuzzing Execution Mechanism (ClosureX): ClosureX introduces a new point on the state restoration continuum, focusing on restoring only test-case-execution-specific state. This allows for the performance benefits of persistent fuzzing without sacrificing semantic correctness.

  • Compile-time Instrumentation with LLVM: ClosureX is implemented as a set of LLVM passes that transform unmodified target programs into "naturally restartable" programs. This compile-time approach bakes in state tracking and rollback capabilities directly into the application.

  • Fine-grain State Restoration Design: The paper details how ClosureX handles critical sources of program state: the program stack (implicitly or via setjmp/longjmp), heap memory (by tracking and freeing allocated memory), global memory (by identifying and restoring modifiable sections), and file descriptors (by tracking and closing open handles).

  • Integration with AFL++: ClosureX seamlessly integrates with the leading fuzzer AFL++, demonstrating its practical applicability without requiring modifications to the fuzzer itself.

  • Empirical Validation of Performance and Correctness:

    • Performance: ClosureX increases the test case execution rate by over 3.5X on average compared to AFL++'s forkserver mode, which is considered the fastest correct process management mechanism in fuzzing.
    • Bug Finding Effectiveness: It discovers bugs 1.9X faster and more consistently (25% more trials finding bugs) than AFL++.
    • Bug Discoveries: The evaluation found 15 0-day bugs across 4 programs, leading to 4 CVEs and 7 patches, demonstrating significant real-world impact.
    • Correctness: ClosureX is shown to maintain semantic correctness by ensuring dataflow equivalence (identical program state) and control-flow equivalence (identical execution path) to a fresh process execution for all test cases.
  • Open-Source Release: ClosureX is open-sourced, facilitating further research and adoption by the software testing community.

    These findings solve the problem of achieving both high performance and semantic correctness in fuzzing, enabling more efficient and reliable bug discovery, which is crucial for software hardening and security.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand ClosureX, a reader should be familiar with the following core concepts:

  • Fuzzing: A software testing technique that involves feeding invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for crashes, assertions, or other indicators of bugs. The goal is to discover vulnerabilities and improve software robustness.

    • Coverage-Guided Fuzzing: A type of fuzzing where the fuzzer uses feedback from code coverage (which parts of the program's code are executed) to prioritize and generate new test cases. Test cases that explore new code paths are considered "interesting" and are used to seed further mutations, aiming to maximize code exploration.
    • Seed Test Cases: Initial valid or "interesting" inputs provided to the fuzzer to start the fuzzing process.
    • Mutation: The process of making small, often random, changes to existing test cases to generate new ones.
    • Throughput/Test Case Execution Rate: The number of test cases a fuzzer can execute per unit of time. Higher throughput generally leads to faster bug discovery.
  • Process Management: The way an operating system (OS) handles the creation, execution, and termination of processes. This is central to fuzzing performance.

    • Process: An instance of a computer program that is being executed. Each process has its own isolated memory space, file descriptors, and other resources.
    • fork() and exec() (Linux/Unix):
      • fork(): A system call that creates a new process (child process) by duplicating the calling process (parent process). The child process is an exact copy of the parent, including its memory space, but with a separate process ID.
      • exec(): A system call that replaces the current process's image with a new program. The new program starts executing from its main function. fork() is often followed by exec() to run a different program in the child process.
    • system(): A C standard library function that executes an external command by creating a new process, running the command, and waiting for it to complete. It's a high-level wrapper around fork() and exec().
    • copy-on-write (CoW): An optimization technique used by operating systems when duplicating a process (e.g., via fork()). Instead of immediately copying all memory pages, the parent and child processes initially share the same memory pages. These pages are only copied (duplicated) when one of the processes attempts to modify them. This reduces memory usage and speeds up fork().
  • Program State: The collection of data and information that defines the current condition of a running program. Key components include:

    • Stack: A region of memory used for local variables, function call arguments, and return addresses during function calls. It operates as a Last-In, First-Out (LIFO) structure.
    • Heap: A region of memory used for dynamic memory allocation (e.g., using malloc, calloc, new). Memory on the heap must be explicitly allocated and freed by the programmer.
    • Global Memory/Variables: Data stored in a static memory region that is accessible from anywhere in the program. These variables are typically initialized when the program starts.
    • File Descriptors: Integer identifiers used by the operating system to represent open files, sockets, or other I/O resources.
  • LLVM (Low Level Virtual Machine): A collection of modular and reusable compiler and toolchain technologies.

    • LLVM IR (Intermediate Representation): A low-level, assembly-like, platform-agnostic representation of code used by LLVM. It's strongly-typed and uses Static Single-Assignment (SSA) form, where every variable is assigned to only once.
    • LLVM passes: Modular components (written in C++) that operate on LLVM IR to perform analyses, optimizations, or transformations (instrumentation). opt is the LLVM tool used to run these passes.
  • setjmp() and longjmp(): C standard library functions for non-local jumps (similar to exceptions in higher-level languages).

    • setjmp(jmp_buf env): Saves the current execution environment (stack pointer, program counter, register values) into a jmp_buf structure. It returns 0 when called directly.
    • longjmp(jmp_buf env, int val): Restores the execution environment previously saved by setjmp(), effectively jumping back to the point where setjmp() was called. setjmp() then appears to return val (if val is 0, setjmp() returns 1 instead). This mechanism can be used to unwind the stack and transfer control flow across multiple function calls without normal returns. A minimal sketch of this pattern follows.
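The following minimal, self-contained C++ sketch illustrates the pattern (it is an illustration only, not code from the paper; the names recovery_point, deep_work, and fail_like_exit are invented for this example):

#include <csetjmp>
#include <cstdio>

static std::jmp_buf recovery_point;         // saved stack pointer, program counter, registers

// Invented stand-in for target code that aborts deep inside a call chain.
static void fail_like_exit(int status) {
    std::longjmp(recovery_point, status);   // unwind back to setjmp; it "returns" status
}

static void deep_work(int input) {
    if (input < 0)
        fail_like_exit(2);                  // analogous to the target calling exit(2)
    std::printf("processed %d\n", input);
}

int main() {
    int inputs[] = {1, -1, 3};
    for (int input : inputs) {
        if (setjmp(recovery_point) == 0)    // 0 on the direct call, non-zero after longjmp
            deep_work(input);               // normal path
        else
            std::printf("recovered from an abnormal exit\n");
    }
    return 0;
}

Each loop iteration re-establishes the restore point, so an abnormal exit in one iteration does not affect the next; this is the same shape ClosureX's harness uses in Listing 1 below.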

3.2. Previous Works

The paper discusses various existing fuzzing techniques and their approaches to process management:

  • Traditional Fresh Process Execution (system(), fork/exec):

    • Concept: For each test case, a completely new process is created, the target program is loaded, the test case is executed, and then the process is destroyed.
    • Pros: Guarantees perfect semantic correctness and isolation between test cases.
    • Cons: Very high overhead due to repeated process creation, initialization, and tear-down. This is the slowest approach. The paper notes Windows fuzzers often use per-test-case process creation. Linux equivalents (fork/exec) are slightly better due to copy-on-write but still incur significant overhead. A simplified sketch of this per-test-case fork pattern appears after this list.
  • Forkserver-based Fuzzing (e.g., AFL++):

    • Concept: To reduce the overhead of repeatedly loading the target binary, a forkserver is used. The fuzzer spawns the target once; the injected forkserver loads the binary into memory and pauses it at the beginning of its main function. For each subsequent test case, the forkserver calls fork() on itself, and the child process continues execution of the target with the new test case. After the test case, the child process exits.
    • Pros: Significantly faster than fresh process creation because the target binary loading and initial setup occur only once. Benefits from copy-on-write to reduce memory duplication costs.
    • Cons: Still incurs overhead from fork() system calls (duplicating process state, page-level management), process tear-down, and kernel-level costs. It's still relatively coarse-grain as it operates at the page level.
  • Kernel-based Snapshotting (e.g., AFL++ Snapshot LKM [7], Wen Xu et al. [34]):

    • Concept: This approach leverages operating system kernel features to take and restore snapshots of a process's state. It often involves specialized system calls or kernel modules that intelligently protect and restore essential program state (e.g., memory pages, register values) between test cases.
    • Pros: Can offer modest performance improvements over forkservers by optimizing kernel-level state management.
    • Cons: Highly OS-specific (often Linux-only, as shown in Table 2). Requires significant engineering effort to maintain compatibility with evolving kernel versions. Not fine-grain enough in the paper's view, as it still operates at a process level rather than execution-specific state.
  • Persistent Fuzzing (e.g., AFL++ persistent mode, LibFuzzer [14]):

    • Concept: The fuzzer reuses a single process for all test cases by simply jumping back to the program's starting point (e.g., an instrumented loop around the target function) instead of exiting.
    • Pros: Eliminates all process management overhead, achieving the highest test case execution rate.
    • Cons: Breaks inter-test-case isolation, leading to semantic inconsistency. State changes (modified global variables, memory leaks, open file handles) from one test case persist and affect subsequent ones, causing missed crashes, false crashes, and non-reproducibility. Requires manual effort to reset state, which is often infeasible for complex applications.
  • Binary-only Fuzzing and Snapshotting (WinFuzz [29], Retrowrite [4]):

    • Concept: These techniques deal with fuzzing without access to source code, often by using binary rewriting or dynamic instrumentation to insert coverage tooling or snapshot/restore capabilities. WinFuzz, for example, addresses the lack of fork() on Windows by embedding snapshotting logic directly into the binary.
    • Cons: Binary-only solutions typically incur higher overhead due to runtime code injection and often have less precise state tracking compared to source-available compile-time approaches.
  • Fuzzing Instrumentation Improvements (Sanitizer Coverage Guards [6, 14, 35], SanitizerCoverage [30], AddressSanitizer [24]):

    • Concept: Focus on reducing the overhead of collecting code coverage information or detecting runtime errors. pcguard (from AFL++ and LibFuzzer) and SanitizerCoverage are LLVM-based instrumentation tools that insert hooks to track code execution paths. AddressSanitizer detects memory errors.
    • Relationship to ClosureX: ClosureX is orthogonal to these advancements; it focuses on process management overhead, while these focus on instrumentation overhead. They can be combined.
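To make the process-management costs discussed above concrete, here is a simplified C++ sketch of per-test-case fresh-process execution (the fork-per-test-case end of the spectrum); run_one_testcase and the hard-coded queue are placeholders, not part of any real fuzzer:

#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Placeholder for the target: a real fuzzer would exec() the target binary
// or call into the instrumented program here.
static void run_one_testcase(const std::string &input) {
    if (input == "crash")
        std::abort();                       // simulate a triggered bug
}

int main() {
    std::vector<std::string> queue = {"ok", "crash", "ok"};  // placeholder test cases

    for (const std::string &tc : queue) {
        pid_t child = fork();               // duplicate the process (copy-on-write pages)
        if (child == 0) {
            run_one_testcase(tc);           // the child runs exactly one test case
            _exit(0);                       // child exits; the OS discards all its state
        }
        int status = 0;
        waitpid(child, &status, 0);         // parent pays fork/wait/teardown cost per input
        if (WIFSIGNALED(status))
            std::printf("test case crashed with signal %d\n", WTERMSIG(status));
    }
    return 0;
}

The fork(), wait, and teardown in every iteration are exactly the per-test-case costs that the forkserver reduces and that persistent fuzzing (and ClosureX) eliminates entirely.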

3.3. Technological Evolution

The field of fuzzing has seen a continuous drive to improve both effectiveness (finding more bugs) and efficiency (finding bugs faster). Early fuzzers were often simple generational or mutational tools that lacked feedback. The introduction of coverage-guided fuzzing (e.g., AFL, LibFuzzer) marked a significant leap in effectiveness by directing the fuzzer towards unexplored code paths.

Alongside this, there has been an evolution in optimizing the execution environment:

  1. Naive Process Creation: Slowest, but most correct (system(), fork/exec without forkserver).

  2. Forkserver: A major performance improvement by avoiding repeated binary loading, leveraging copy-on-write. Still incurs fork() overhead.

  3. Persistent Mode: Achieves maximum execution speed by eliminating all process management, but at the cost of correctness due to stale state.

  4. Kernel-level Optimizations: Attempts to improve forkserver-like behavior at the OS level, but struggles with portability and maintenance.

  5. Binary-level Snapshotting: Addresses specific OS limitations (like Windows' lack of fork()) for binary-only scenarios.

    ClosureX fits into this timeline by proposing a new solution that takes the performance benefits of persistent fuzzing but solves its fundamental correctness issue. It leverages compiler-level instrumentation (LLVM passes) to achieve fine-grain, software-based state restoration, a more robust and portable approach than kernel-level or binary-level solutions for source-available code.

3.4. Differentiation Analysis

Compared to the main methods in related work, ClosureX offers several core differences and innovations:

  • Correctness in Persistent Fuzzing: The most significant differentiation is ClosureX's ability to provide semantic correctness within a persistent fuzzing setup. Traditional persistent mode explicitly trades correctness for speed. ClosureX eliminates this trade-off by actively restoring the program state between test cases at a fine-grain, making each execution semantically equivalent to a fresh process.
  • Compile-time, Software-level Solution: Unlike kernel-based snapshotting (which requires OS modifications and suffers from portability issues) or binary-only snapshotting (which can be less precise and incur runtime overhead), ClosureX operates at compile-time using LLVM passes. This allows for precise, source-aware instrumentation and optimization by the compiler, leading to better performance and portability across different operating systems (as it primarily relies on LLVM IR and standard C library hooks).
  • Fine-grain State Restoration: ClosureX selectively tracks and restores only the test-case-execution-specific state (modified global variables, dynamic memory, open file handles) rather than entire process snapshots (like forkserver's page-level copy-on-write or kernel snapshots). This targeted approach minimizes overhead while ensuring thorough cleanup.
  • Automated State Management: ClosureX automates the complex and error-prone task of tracking and reverting state changes. Developers using persistent fuzzing today often have to manually insert code to reset state, which is difficult to scale and prone to errors. ClosureX provides this automatically for whole applications.
  • Orthogonal to Other Fuzzing Improvements: ClosureX's focus is on process management overhead. It is designed to be compatible with and complement other fuzzing advancements, such as coverage-tracking instrumentation (e.g., Sanitizer Coverage Guards) and sanitizers (e.g., AddressSanitizer), allowing for cumulative performance and effectiveness gains.

4. Methodology

4.1. Principles

The core idea behind ClosureX is to achieve a "best of both worlds" solution for fuzzing performance and correctness. It posits that it is possible to reuse a single process across test cases (like persistent fuzzing) if the abstraction of each test case getting its own process is preserved. This preservation is achieved through a form of rollback recovery, where the program state is reset to its starting condition after each test case execution.

The theoretical basis and intuition are that much of the overhead in fresh process or forkserver fuzzing comes from unnecessarily discarding and reloading test-case-invariant state. ClosureX aims to identify and preserve this invariant state, while meticulously tracking and restoring only the test-case-variant state. This fine-grain state restoration is integrated directly into the compiled program through LLVM passes, making the program naturally restartable without relying on special operating system features. The method combines static analysis (at compile time) to identify mutable state with dynamic analysis (at runtime) to log and restore changes.

4.2. Core Methodology In-depth (Layer by Layer)

ClosureX tackles two primary challenges:

  1. Injecting a persistent fuzzing loop into the program.

  2. Adding fine-grain program state tracking and rollback between test cases.

    These challenges are addressed through a set of LLVM passes and a harness that wraps the target program.

4.2.1. Injecting a Persistent Fuzzing Loop

To enable persistent fuzzing within a single process, ClosureX modifies the target program to loop its execution.

  • Renaming main Function:

    • The RenameMainPass (an LLVM Module pass) identifies the target program's original main function.
    • It renames this function to target_main using the setName method call on the Function object in LLVM IR.
    • ClosureX then provides its own main function (part of its stub harness) that calls target_main in a loop.
    • For targets that take fuzzed input from the command line, the target_main function includes logic to replace the appropriate argv (argument vector) with the test case supplied by the fuzzer.
  • Handling Abnormal Program Exits (exit() calls):

    • Programs often terminate using exit() system calls upon detecting malformed input or critical errors, especially common in fuzzing. This bypasses normal function returns and stack unwinding.
    • ClosureX handles this using the C language features setjmp() and longjmp(), akin to a try-catch block.
    • At the beginning of ClosureX's harness main function, setjmp(jmp_buf env) is called. This function saves the current CPU state (registers, stack pointer, program counter) into a jmp_buf structure. This establishes a "restore point."
    • The ExitPass (an LLVM pass) identifies all calls to exit() within the target's instrumented source code.
    • It replaces these exit() calls with calls to a custom wrapper function, exitHook, using the replaceAllUsesWith method.
    • The exitHook function then calls longjmp(jmp_buf env, int val). This restores the CPU state from the jmp_buf saved by setjmp(), effectively transferring control back to ClosureX's main loop and unwinding the stack to that point. The val argument (exit status) is passed through.
    • Crucially, ClosureX only hooks exit() calls within the instrumented target code, leaving exit() calls in external libraries (like libc) untouched, as these are considered critical and must not be intercepted.
  • Communicating with the Fuzzer:

    • ClosureX inserts code into the harness to synchronize with the external fuzzer (AFL++ in this case).
    • This includes mechanisms to:
      • Wait for the fuzzer to generate and provide a new test case.

      • Inform the fuzzer that new coverage data has been collected from the previous test case execution.

        The conceptual structure of ClosureX's harness is depicted in the paper's Listing 1 (reproduced here for completeness and detailed explanation):

// ClosureX's harness main function (conceptual, adapted from the paper's Listing 1)
#include <setjmp.h>

static jmp_buf env_buffer;                      // restore point used by exitHook's longjmp
extern int target_main(int argc, char** argv);  // the target's renamed main

int main(int argc, char** argv) {
  // ClosureX initializes global/heap memory tracking and file descriptors,
  // reads fuzzed input from stdin/file, and communicates with AFL++ to
  // receive a new test case.

  // Loop for persistent fuzzing
  while (__AFL_LOOP(1000)) { // AFL++ macro for looping; 1000 is an example count
    // Save current stack/register state for abnormal exits
    if (setjmp(env_buffer)) {
      // Target called exit(); exitHook longjmp()'d back here and the stack is restored
    } else {
      // Normal execution path: call the renamed target's main function
      target_main(argc, argv);
    }
    // Stack is already restored (either by normal return or by longjmp)
    // Now restore the remaining program state
    restore_global_sections();  // Restore global variables
    reset_heap_memory();        // Free leaked heap memory
    close_open_file_handles();  // Close unclosed file descriptors
  }
  return 0;
}
  • __AFL_LOOP(1000): This is an AFL++-specific macro. When instrumented, it sets up the persistent loop. The argument (e.g., 1000) specifies how many test cases AFL++ expects to process in this single process before potentially restarting or checking other fuzzer state.
  • setjmp(env_buffer): This is called once at the start of each fuzzing iteration. On the direct call it returns 0, so the else branch runs target_main. If exitHook (replacing exit()) calls longjmp from within target_main or any function it calls, control returns here with setjmp appearing to return the non-zero val passed to longjmp, and the if branch executes.
  • target_main(argc, argv): This is the renamed original main function of the target program.
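The paper's pass implementations are not reproduced in this analysis, so the following C++ sketch (against LLVM's new pass manager) is a hypothetical re-creation of the two transformations described above: renaming main to target_main and redirecting exit() calls to a harness-provided exitHook. The struct names and the assumption that exitHook shares exit()'s signature are this analysis's, not the paper's.

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

namespace {

// Sketch of a RenameMainPass-style transformation.
struct RenameMainSketch : PassInfoMixin<RenameMainSketch> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    if (Function *MainFn = M.getFunction("main"))
      MainFn->setName("target_main");        // the harness supplies its own main()
    return PreservedAnalyses::none();
  }
};

// Sketch of an ExitPass-style transformation: every use of exit() in the
// instrumented module is redirected to exitHook(), which longjmp()s back to
// the harness loop. Because the pass only runs over the target's own module,
// exit() calls inside external libraries such as libc are left untouched.
struct ExitHookSketch : PassInfoMixin<ExitHookSketch> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    Function *ExitFn = M.getFunction("exit");
    if (!ExitFn)
      return PreservedAnalyses::all();       // the target never calls exit()
    FunctionCallee Hook =
        M.getOrInsertFunction("exitHook", ExitFn->getFunctionType());
    ExitFn->replaceAllUsesWith(Hook.getCallee());
    return PreservedAnalyses::none();
  }
};

} // end anonymous namespace

The GlobalPass described in the next subsection reduces, at its core, to the same kind of module walk, calling setSection("closure_global_section") on every global variable whose isConstant() check returns false.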

4.2.2. State Restoration For Correct Execution

After each target_main execution, ClosureX performs fine-grain state restoration to ensure semantic correctness for the next test case.

  • Program Stack Restoration:

    • Normal Returns: If target_main completes its execution normally and returns, the C calling convention automatically unwinds the stack, and local variables are deallocated. When control returns to ClosureX's harness, the stack is naturally clean.
    • Abnormal Exits (longjmp): As described above, if exitHook (replacing exit()) calls longjmp(), the stack pointer and registers are restored from the jmp_buf saved by setjmp(). This effectively "rolls back" the stack to its state at the beginning of the fuzzing iteration.
    • Thus, the program stack is always reset to a clean state for the start of each test case.
  • Program Heap Restoration:

    • Problem: Most programs rely on the OS to reclaim all dynamically allocated memory when a process exits. In persistent fuzzing, if memory is allocated on the heap but not freed (memory leaks), it accumulates across test cases, leading to out-of-memory errors and false crashes.

    • Solution (HeapPass):

      1. The HeapPass (an LLVM pass) is applied to the LLVM module.
      2. It declares wrapper functions (myMalloc, myCalloc, myRealloc, myFree) for standard dynamic memory allocation functions (malloc, calloc, realloc, free). These declarations are resolved during the linking phase with ClosureX's harness.
      3. It then replaces all instances of calls to the malloc-family functions (malloc, calloc, realloc) and free within the target program's LLVM IR with calls to these custom wrappers using replaceAllUsesWith.
      4. Tracking Allocations: When a malloc-family wrapper (e.g., myMalloc) is called, it allocates memory and then stores the returned pointer into an internal hash map structure maintained by ClosureX's harness.
      5. Tracking Deallocations: When myFree is called by the target, it frees the memory and removes the corresponding pointer from the hash map.
      6. Cleanup between Test Cases: After target_main returns (or longjmp occurs), ClosureX's harness iterates over all pointers still present in its hash map. These represent memory that the target allocated but failed to free. The harness then explicitly free()s these remaining pointers.
    • This mechanism prevents memory leaks and ensures that the heap state is clean for each new test case. The following figure (Figure 5 from the original paper) shows the ClosureX heap resetting procedure during runtime:

      Figure 5. ClosureX heap resetting procedure during runtime. The figure is a schematic of how ClosureX manipulates the heap chunk map before, during, and after execution: the chunk map before execution, updates to tracked chunks during execution, and the leaked chunks freed after execution.

  • A) Before Execution: The chunk map (hash map) is empty, indicating no memory is tracked by ClosureX.

  • B) During Execution: The target program allocates memory chunks using malloc (or calloc, realloc). ClosureX's wrappers intercept these calls and store the pointers (e.g., ptr1, ptr2, ptr3) in its chunk map. If the target calls free(ptr2), ptr2 is removed from the chunk map.

  • C) After Execution: When the target's execution finishes, ClosureX iterates through the chunk map. Any remaining pointers (e.g., ptr1, ptr3) are considered "leaked" memory by the target. ClosureX then calls free() on these pointers to clean up the heap before the next test case.
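A hedged C++ sketch of the runtime half of this scheme is shown below; the wrapper names myMalloc/myFree follow the text above, while the chunk-set variable and the closurex_reset_heap helper are invented for this illustration (calloc and realloc wrappers would follow the same pattern, and the compile-time redirection uses the same replaceAllUsesWith technique sketched earlier for exit()).

#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Chunk map: every pointer handed to the target that it has not yet freed.
static std::unordered_set<void *> g_live_chunks;

extern "C" void *myMalloc(std::size_t size) {
    void *p = std::malloc(size);
    if (p)
        g_live_chunks.insert(p);             // track the new allocation
    return p;
}

extern "C" void myFree(void *p) {
    if (p) {
        g_live_chunks.erase(p);              // the target freed it itself
        std::free(p);
    }
}

// Called by the harness after every test case: release anything the target
// leaked so the heap is clean before the next input runs.
extern "C" void closurex_reset_heap(void) {
    for (void *p : g_live_chunks)
        std::free(p);
    g_live_chunks.clear();
}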

  • Global Memory Restoration:

    • Problem: Global variables are initialized once at program start. In persistent fuzzing, modifications to global variables by one test case persist and affect subsequent ones, leading to semantic inconsistency.

    • Solution (GlobalPass):

      1. The GlobalPass (an LLVM compile-time pass) first identifies all potentially modifiable global variables in the target program. It does this by iterating over all global variables in the LLVM module and calling the isConstant method; if it returns false, the variable is deemed modifiable.
      2. All such modifiable global variables are moved into a new, distinct memory section within the target binary, named closure_global_section, using the setSection API call in LLVM. This separates them from truly constant data (like immutable strings).
      3. During the fuzzing campaign, ClosureX's harness obtains the address and size of this closure_global_section (e.g., via environment variables CLOSURE_GLOBAL_SECTION_ADDR and CLOSURE_GLOBAL_SECTION_SIZE, which can be populated using ELF parsing tools like readelf).
      4. Snapshot and Restore: Before target_main executes, the harness copies the entire content of closure_global_section into an internal buffer, creating a "snapshot." After target_main returns (or longjmp occurs), the harness copies the data back from this buffer to closure_global_section using byte-level writes.
    • This ensures that the global memory state is reset to its clean, initial state for every test case. The following figure (Figure 3 from the original paper) illustrates the transformation performed by ClosureX's GlobalPass:

      Figure 3. The transformation performed by ClosureX's Global pass. The figure is a schematic of the GlobalPass transformation: variables originally placed in the .bss and .data sections (left) are rewritten by the Closure Global pass so that the modifiable ones end up in a dedicated closure_global_section, while constant data is left in place.

  • Before Global Pass: The diagram shows .bss and .data sections containing both constant data (STR_CONST, INT_CONST) and modifiable variables (GLOBAL_VAR, GLOBAL_ARR).

  • Global Pass: The GlobalPass identifies GLOBAL_VAR and GLOBAL_ARR as potentially modifiable.

  • After Global Pass: These modifiable variables are moved into a new, dedicated closure_global_section. The original .bss and .data sections are left with only constant data or other variables not deemed modifiable by the pass. This allows ClosureX to snapshot and restore only the relevant (and smaller) closure_global_section. The following figure (Figure 4 from the original paper) shows the ClosureX global resetting procedure:

    Figure 4. ClosureX global resetting procedure. The figure is a schematic of the global reset procedure: on the left, the pristine global section saved as the snapshot; in the middle, the global section as modified during execution; on the right, the snapshot being used after execution to restore the original contents.

  • Before Execution: The ClosureX harness takes a snapshot of the closure_global_section (which contains all modifiable global variables) and stores it in an internal buffer. This snapshot represents the "ground truth" initial state.

  • During Execution: The target program executes, potentially modifying the values within the closure_global_section.

  • After Execution: When the target finishes, the ClosureX harness uses the stored buffer to restore the closure_global_section to its original state, effectively undoing any modifications made by the test case.
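At runtime, this boils down to one byte-copy of the dedicated section at startup and one copy back after every test case. Below is a hedged C++ sketch assuming the section's load address and size arrive through the environment variables named above (error handling and any ASLR/PIE address adjustment are omitted; the function names are invented):

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

static unsigned char *g_section_addr = nullptr;   // start of closure_global_section
static std::size_t g_section_size = 0;
static std::vector<unsigned char> g_snapshot;     // pristine copy taken once at startup

// Read the section location exported by the build tooling and snapshot it.
static void closurex_init_globals(void) {
    const char *addr_s = std::getenv("CLOSURE_GLOBAL_SECTION_ADDR");
    const char *size_s = std::getenv("CLOSURE_GLOBAL_SECTION_SIZE");
    if (!addr_s || !size_s)
        return;                                   // error handling elided in this sketch
    g_section_addr =
        reinterpret_cast<unsigned char *>(std::strtoull(addr_s, nullptr, 0));
    g_section_size = std::strtoull(size_s, nullptr, 0);
    g_snapshot.assign(g_section_addr, g_section_addr + g_section_size);  // ground truth
}

// Called after every test case: undo any writes the target made to its globals.
static void closurex_restore_globals(void) {
    std::memcpy(g_section_addr, g_snapshot.data(), g_section_size);
}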

  • File Descriptor Restoration:

    • Problem: The OS limits the number of open file descriptors per process. In persistent fuzzing, if file handles are opened but not closed, they accumulate. This can lead to the process running out of descriptors, causing false crashes.
    • Solution (FilePass):
      1. The FilePass (an LLVM Function Pass) uses function replacement techniques similar to heap management.
      2. It replaces calls to file opening routines like fopen() with fopen_hook(). The fopen_hook() wrapper opens the file and stores the returned file handle into an internal hash map.
      3. It replaces calls to fclose() with fclose_hook(). The fclose_hook() wrapper closes the file and removes the handle from the hash map.
      4. Cleanup between Test Cases: After each test case, ClosureX's harness iterates over the hash map and explicitly close()s any remaining open file handles.
    • Optimization for Initialization Handles: ClosureX distinguishes between file handles opened once during the target's initialization phase and those opened during test-case-specific execution. For initialization handles, instead of closing and reopening them, ClosureX resets their position (e.g., using fseek for file streams) to the beginning, improving efficiency.
    • This pattern can be extended to other resource handles like sockets and shared memory.
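The runtime side again mirrors the heap scheme; a hedged C++ sketch follows, using the hook names from the text above (the handle set and closurex_reset_files helper are invented, and initialization-phase handles would be rewound with fseek(f, 0, SEEK_SET) instead of being closed, per the optimization just described):

#include <cstdio>
#include <unordered_set>

static std::unordered_set<FILE *> g_open_handles;   // handles the target has not closed yet

extern "C" FILE *fopen_hook(const char *path, const char *mode) {
    FILE *f = std::fopen(path, mode);
    if (f)
        g_open_handles.insert(f);                   // track the newly opened handle
    return f;
}

extern "C" int fclose_hook(FILE *f) {
    g_open_handles.erase(f);                        // the target closed it itself
    return std::fclose(f);
}

// Called by the harness after every test case: close anything left open so the
// process never exhausts its per-process file-descriptor limit.
extern "C" void closurex_reset_files(void) {
    for (FILE *f : g_open_handles)
        std::fclose(f);
    g_open_handles.clear();
}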

4.2.3. Summary of ClosureX LLVM Passes

The ClosureX implementation consists of five LLVM passes as summarized in Table 3 from the paper:

CLOSUREX Pass Functionality
RenameMainPass Rename target's main
HeapPass Inject tracking of target's heap memory
FilePass Inject tracking of target's file descriptors
GlobalPass Move target's writable globals into a separate memory section
ExitPass Rename target's exit calls

Each pass targets a specific aspect of program state management to achieve the overall goal of correct persistent fuzzing. The instrumentation is designed to be compatible with existing coverage-tracking instrumentation (like AFL++'s Sanitizer Coverage Guard).

5. Experimental Setup

5.1. Datasets

The evaluation of ClosureX utilized a set of ten diverse, popular open-source benchmark programs commonly used in fuzzing platform evaluations. The selection criteria for these benchmarks included:

  • Extensive fuzzing history: They have been subject to multiple peer-reviewed academic works or are well-fuzzed open-source benchmarks, ensuring that any newly discovered bugs are significant.

  • Diversity: The benchmarks vary in complexity and task.

  • Rich variety of input formats: This ensures the generalizability of ClosureX across different input parsing scenarios.

    The following are the evaluation benchmarks used, along with their input formats and executable sizes, from Table 4 of the original paper:

    Benchmark Input Format Executable Size
    bsdtar tar 4.7 M
    libpcap pcap 2.4 M
    gpmf-parser mp4 (GoPro) 720 K
    libbpf bpf object 1.9 M
    freetype ttf 4.6 M
    giftext gif 232 K
    zlib zlib archive 260 K
    libdwarf ELF 2.8 K
    c-blosc2 bframe 12 M
    md4c markdown 652 K

These diverse benchmarks allow ClosureX to be evaluated across different application types (e.g., archivers, networking libraries, multimedia parsers, font engines, compression libraries, markdown parsers), showcasing its broad applicability and effectiveness.

5.2. Evaluation Metrics

The evaluation of ClosureX focuses on two primary criteria: correctness and performance. Several metrics are used to quantify these.

  • Test Case Execution Rate (Throughput):

    • Conceptual Definition: This metric quantifies the number of test cases a fuzzer can process per unit of time (e.g., per second, per hour). It is a direct measure of a fuzzer's efficiency. A higher test case execution rate is known to correlate directly with a higher bug discovery rate.
    • Mathematical Formula: Not explicitly provided with a formula in the paper, but it is implicitly calculated as: $ \text{Test Case Execution Rate} = \frac{\text{Total Test Cases Executed}}{\text{Total Fuzzing Time}} $
    • Symbol Explanation:
      • Total Test Cases Executed: The cumulative number of unique input test cases that have been run against the target program during the fuzzing campaign.
      • Total Fuzzing Time: The total duration for which the fuzzer was actively running and executing test cases.
  • Code Coverage Improvement:

    • Conceptual Definition: This metric measures the proportion of a program's code that has been exercised by the fuzzer's test cases. In coverage-guided fuzzing, the goal is to maximize code coverage, as exploring more code paths increases the likelihood of finding bugs. The paper uses edge coverage, which means tracking if specific transitions between basic blocks in the control-flow graph have been taken.
    • Mathematical Formula: Not explicitly provided with a formula in the paper, but for edge coverage, it is typically: $ \text{Edge Coverage} = \frac{\text{Number of Unique Edges Hit}}{\text{Total Number of Edges in Program}} \times 100\% $
    • Symbol Explanation:
      • Number of Unique Edges Hit: The count of distinct control flow edges that have been traversed by at least one test case.
      • Total Number of Edges in Program: The total count of all possible control flow edges within the program's control-flow graph.
    • The paper specifies that both ClosureX and AFL++ use the same hitcount-based edge coverage collection implementation, loosely based on LLVM's Sanitizer Coverage Guards.
  • Time-to-bug Improvement:

    • Conceptual Definition: This metric measures how quickly a fuzzer can discover a specific bug (or a set of bugs). It reflects the practical value and efficiency of a fuzzer in a real-world context. A faster time-to-bug means less computational resources and time are required to identify critical vulnerabilities.
    • Mathematical Formula: Not explicitly provided with a formula, but it is typically measured as the elapsed time from the start of fuzzing until a specific bug is first detected.
    • Symbol Explanation:
      • Time (sec): The time in seconds from the start of the fuzzing campaign until a particular bug is identified.
      • (Number of trials): The count of independent fuzzing runs (out of total trials, e.g., 5 in this evaluation) in which a specific bug was found. This indicates the consistency of bug discovery.
  • Correctness (Semantic Equivalence):

    • Conceptual Definition: This is a qualitative but rigorously evaluated criterion. It ensures that running a test case under ClosureX's persistent mode yields the same behavior (program state and execution path) as running it in complete isolation within a fresh process. This is broken down into two aspects:
      • Dataflow Equivalence: Verifies that all relevant program state (heap, global memory) is identical at the end of execution, regardless of whether it's run in ClosureX or a fresh process.
      • Control-flow Equivalence: Verifies that the program's execution path (sequence of basic blocks/edges traversed) is identical, regardless of the execution environment.

5.3. Baselines

The paper compares ClosureX against the process management mechanism of the most popular Linux fuzzer, AFL++.

  • AFL++ Forkserver: Specifically, ClosureX is benchmarked against AFL++'s forkserver mode. This is a crucial choice because the forkserver is considered the fastest correct process management mechanism in traditional fuzzing, as it avoids the full overhead of process creation for each test case while still providing isolation via fork() and copy-on-write. By outperforming AFL++'s forkserver, ClosureX demonstrates a superior approach.
  • Same Coverage and Seed Mutation: To isolate the impact of ClosureX's persistent execution mechanism, both ClosureX-instrumented targets and AFL++ were configured to use the same coverage tracing and seed mutation mechanisms. This ensures a fair comparison focused solely on the process management improvements.

5.4. Evaluation Setup

  • Environment: All experiments were performed on Microsoft Azure cloud instances.
    • Type: Standard_DS1_v2
    • Memory: 3.5 GB
    • CPU: Single CPU core
    • Operating System: Ubuntu 20.04 LTS
  • Statistical Rigor: To account for randomness inherent in fuzzing and ensure statistical significance, 5 independent 24-hour trials were conducted for each benchmark and fuzzer configuration (ClosureX vs. AFL++). This helps in generating reliable average results and allows for statistical tests like the Mann-Whitney U-test to determine if observed differences are statistically significant.

6. Results & Analysis

The evaluation aims to show that ClosureX provides the performance benefits of persistent fuzzing while maintaining the correctness of fresh process fuzzing. The results are presented across three main categories: test case execution rate, code coverage improvement, and time-to-bug improvement, along with a detailed correctness validation.

6.1. Core Results Analysis

6.1.1. Test Case Execution Rate

The test case throughput is a critical factor for effective fuzzing. ClosureX significantly improves this by isolating process initialization, loading, and tear-down from the main fuzzing loop. The comparison is against AFL++'s forkserver mechanism, which is the state-of-the-art for correct process management in Linux fuzzing.

The following are the results from Table 5 of the original paper, showing the average number of test cases executed in 24 hours:

Benchmark CLOSUREX AFL++ Speedup p value
bsdtar 379M 93M 4.09 0.0079
libpcap 565M 201M 2.81 0.0079
gpmf-parser 456M 193M 2.36 0.0079
libbpf 884M 212M 4.6 0.0079
freetype 493M 168M 2.94 0.0079
giftext 1118M 233M 4.79 0.0079
zlib 1117M 293M 4.00 0.0079
libdwarf 913M 265M 3.44 0.0079
c-blosc2 905M 213M 4.24 0.0079
md4c 538M 215M 2.50 0.0079
Average 3.53 0.0079

Analysis:

  • ClosureX consistently outperforms AFL++ in terms of test case execution rate across all benchmarks.
  • On average, ClosureX executes over 3.53X more test cases than AFL++ within a 24-hour period.
  • The speedup ranges from 2.36X (gpmf-parser) to 4.79X (giftext), demonstrating significant and consistent performance gains.
  • The p value (from the Mann-Whitney U-test) for all benchmarks is 0.0079, which is well below 0.05. This indicates that the observed performance improvement for ClosureX is statistically significant and not due to random chance. This strongly validates ClosureX's efficiency gains derived from its fine-grain state restoration and elimination of process management overhead.

6.1.2. Code Coverage Improvement

Code coverage is a crucial metric for coverage-guided fuzzing, indicating the fuzzer's ability to explore deeper and more complex logic. While ClosureX and AFL++ use the same underlying coverage mechanism, ClosureX's higher throughput can translate to better coverage by exploring more test cases.

The following are the results from Table 6 of the original paper, showing the edge coverage percentage:

Benchmark CLOSUREX AFL++ % Improvement p value
bsdtar 18.13% 13.80% 31.4 0.031
libpcap 15.64% 15.22% 2.76 0.547
gpmf-parser 14.43% 13.82% 4.41 0.222
libbpf 6.45% 6.34% 1.8 0.079
freetype 16.30% 16.11% 1.17 0.150
giftext 27.57% 27.21% 1.32 0.003
zlib 36.22% 35.91% 0.87 0.111
libdwarf 5.20% 5.04% 3.34 0.020
c-blosc2 2.12% 1.76% 20.41 0.093
md4c 82.17% 82.08% 0.11 0.010
Average 7.8 0.079

Analysis:

  • On average, ClosureX achieves 7.8% more edge coverage than AFL++.
  • While the percentage improvement might seem modest for some benchmarks, for others like bsdtar (31.4%) and c-blosc2 (20.41%), the improvement is substantial.
  • The Mann-Whitney U-test indicates statistically significant improvement (p < 0.05) in four of the ten benchmarks (bsdtar, giftext, libdwarf, and md4c), confirming that the increased execution rate translates into more effective code exploration in a statistically significant way for a good portion of the targets. For the others, while ClosureX still provides better coverage, the difference isn't always statistically significant within the 24-hour trial period.

6.1.3. Time-to-bug Improvement

The ultimate practical value of a fuzzer lies in its ability to find bugs quickly and consistently. The evaluation measured the time taken by AFL++ and ClosureX to find specific bugs.

The following are the results from Table 7 of the original paper, showing time to find bugs and consistency:

Benchmark CLOSUREX Time in sec (trials) AFL++ Time in sec (trials) Bug Type
c-blosc2 7148 (4) 11896 (2) Null Ptr Deref.
c-blosc2 25358 (4) 12471 (2) Null Ptr Deref.
c-blosc2 7442 (4) 18299 (2) Null Ptr Deref.
c-blosc2 22957 (1) 16097 (1) Null Ptr Deref.
gpmf-parser 798 (5) 1386 (5) Division by Zero
gpmf-parser 43178 (3) 2365 (1) Unaddressable Access
gpmf-parser 2368 (5) 20340 (5) Unaddressable Access
gpmf-parser 33028 (5) 22721 (5) Division by Zero
gpmf-parser 6144 (5) 6238 (4) Invalid Write
gpmf-parser 14025 (2) 5493 (2) Invalid Read
libbpf 61 (5) 91 (5) Null Ptr Deref.
libbpf 515 (5) 552 (5) Null Ptr Deref.
libbpf 2278 (5) 9964 (4) Null Ptr Deref.
md4c 8489 (5) 22254 (4) Memcpy with negative size
md4c 44714 (3) 54342 (2) Array out of bounds access

Analysis:

  • Overall Speedup: When both fuzzers find the same bugs, ClosureX does so approximately 1.9X faster than AFL++. This is evident in many entries where ClosureX's time is significantly lower.
  • Consistency: ClosureX also finds bugs more consistently, finding bugs in 25% more trials than AFL++. For example, in c-blosc2, ClosureX found bugs in 4 trials where AFL++ only found them in 2.
  • Unique Bug Discoveries: The evaluation led to the discovery of 15 0-day bugs across 4 programs (c-blosc2, gpmf-parser, libbpf, md4c). These bugs resulted in 4 assigned CVEs (CVE-2023-37185, CVE-2023-37186, CVE-2023-37187, CVE-2023-37188) and 7 patches, underscoring the real-world impact and effectiveness of ClosureX.
  • Bug Example: A notable bug was a Null Pointer Dereference in libbpf, a Linux kernel library. The bug occurred when the library attempted to parse the relocation section of a crashing ELF object, leading to a NULL pointer being accessed. ClosureX found this bug very quickly (e.g., 61 seconds in 5 trials).
  • Specific cases: While ClosureX generally outperforms, there are a few instances where AFL++ found a bug faster or with similar consistency (e.g., gpmf-parser, bug #2, and gpmf-parser, bug #4). However, the overall trend clearly favors ClosureX.

6.1.4. ClosureX Correctness

The crucial claim of ClosureX is maintaining semantic correctness. This was validated by ensuring dataflow equivalence and control-flow equivalence to fresh process execution.

  • Methodology for Correctness Validation:

    1. A comprehensive test case queue (all inputs accumulated during fuzzing) was used for each target.
    2. For each input in the queue, two state snapshots were compared:
      • Snapshot 1 (Ground Truth): Taken after executing the target in a fresh process.
      • Snapshot 2 (ClosureX): Taken after executing the test case in ClosureX's persistent mode, but after 1000 iterations of other randomly selected test cases from the queue. This simulates a heavily "polluted" state to thoroughly test restoration.
    3. If the two snapshots were identical, ClosureX was deemed correct for that test case.
  • Dataflow Equivalence:

    • Program Stack: Confirmed to be fresh due to ClosureX's harness (normal returns or longjmp).
    • Dynamic Memory (Heap):
      • Valgrind (a debugging and profiling tool for memory errors like double-free or use-after-free) was used. No such inconsistencies were found for any queue input across benchmarks.
      • Memory usage of the target (excluding ClosureX's own memory) was identical to a fresh process execution, confirming successful freeing of dynamically allocated memory.
    • Global State: Global state snapshots were compared between fresh process and ClosureX execution.
      • Non-determinism Handling: To account for natural non-determinism (e.g., storing heap addresses, PRNG output), ground-truth non-deterministic bytes were identified by running fresh process executions multiple times and noting variations. These variations were excluded from the comparison.
      • Result: All targets, for all fuzzing queue entries, showed identical snapshots to a fresh process execution (after accounting for non-determinism). This strongly establishes dataflow equivalence.
  • Control-flow Equivalence:

    • Methodology:

      1. Ground-truth path-sensitive edge-coverage was obtained by running a test case in a fresh process.
      2. Then, in ClosureX persistent mode, 1000 randomly selected queue inputs were executed, followed by the test case of interest. Its path-sensitive edge-coverage was recorded.
      3. The two coverage datasets were compared.
    • Non-determinism Handling: Similar to dataflow, test cases inducing non-deterministic execution paths in multiple fresh process runs were flagged and excluded. This was only observed in the freetype benchmark, suspected to be due to a PRNG affecting control flow.

    • Result: No inconsistencies in control-flow equivalence were found for any queue entry compared to fresh process execution.

      Conclusion on Correctness: Given no deviation in data- or control-flow behavior (after accounting for natural non-determinism), the authors claim that ClosureX ensures semantic consistency, meaning test cases behave exactly as if they were executed in an isolated, fresh process.
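As an illustration of the dataflow-equivalence check (this is not the authors' tooling, just a sketch of the comparison described above), comparing two global-section snapshots while masking out bytes previously flagged as non-deterministic could look like this:

#include <cstddef>
#include <vector>

// Returns true if the two snapshots agree everywhere except at byte offsets
// flagged as non-deterministic (e.g., stored heap addresses or PRNG output),
// which were identified beforehand by diffing repeated fresh-process runs.
static bool snapshots_equal(const std::vector<unsigned char> &fresh,
                            const std::vector<unsigned char> &persistent,
                            const std::vector<bool> &nondet_mask) {
    if (fresh.size() != persistent.size())
        return false;
    for (std::size_t i = 0; i < fresh.size(); ++i)
        if (!nondet_mask[i] && fresh[i] != persistent[i])
            return false;
    return true;
}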

7. Conclusion & Reflections

7.1. Conclusion Summary

The paper successfully demonstrates that it is possible to combine the high performance of persistent fuzzing with the correctness of fresh process fuzzing. ClosureX achieves this by introducing a compiler-level program instrumentation approach that creates naturally restartable programs. This enables an entire fuzzing campaign to run within a single process, eliminating the significant overhead associated with process management.

Through its fine-grain state tracking and restoration mechanisms (for stack, heap, global memory, and file descriptors), ClosureX ensures semantic correctness, meaning each test case executes as if it were in an isolated environment. The evaluation shows that ClosureX significantly increases the test case execution rate (over 3.5X faster on average) and accelerates bug discovery (1.9X faster, finding 15 0-day bugs and 4 CVEs) compared to AFL++'s forkserver mode. Furthermore, ClosureX's approach is independent of specific operating system features, making it highly portable. The work highlights the untapped potential for fuzzer improvements through more advanced, fuzzing-specific compiler analysis.

7.2. Limitations & Future Work

The authors acknowledge several limitations and propose future research directions:

  • Deferred Initialization: Many programs have an initialization phase that is independent of user input and ideally needs to be run only once. ClosureX currently re-executes this for each test case. Future work could involve more aggressive code motion and flow analysis to hoist such initialization steps out of the fuzzing loop into a deferred initialization point within the harness. This would further improve performance.
  • Supporting Other Operating Systems: While ClosureX is implemented for Linux, the authors state its approach is operating system agnostic. Extending it to other OSes (like Windows) would require minimal engineering effort, primarily involving hooking OS-specific memory allocation functions (HeapAlloc, VirtualAlloc) and similar I/O primitives, while the LLVM passes for C language APIs and global state restoration remain largely unchanged. They anticipate similar or better results on Windows due to its lack of copy-on-write based fork().
  • Extending ClosureX to Complex Program States:
    • Multi-threaded Programs: If child threads continue executing after the main thread completes an iteration in persistent mode (whereas the OS would kill them in a fresh process), ClosureX would need to track created threads and kill any remaining ones between test cases. This is analogous to how dynamic memory is handled.
    • Source-unavailable Libraries: ClosureX relies on compile-time instrumentation, meaning it needs access to the source code of the components being tested. It cannot currently handle third-party pre-compiled libraries.
    • Stateful Programs (Storing/Retrieving State from Files): The current focus is on stateless programs that take input from command line or files. ClosureX does not explicitly address the complexities of fuzzing programs that maintain persistent state across runs (e.g., by writing to and reading from external files). This is an emerging area of fuzzing research.
    • Custom Memory Allocators: ClosureX currently hooks standard malloc-family functions. Programs using custom memory allocators would require a more target-tailored approach for heap restoration.

7.3. Personal Insights & Critique

This paper presents a highly impactful and elegant solution to a fundamental problem in fuzzing. The innovation of using compiler-level instrumentation to achieve fine-grain, automated state restoration is a significant leap forward, effectively dissolving the long-standing trade-off between fuzzing speed and correctness.

Strengths and Inspirations:

  • Elegant Problem Solving: The use of LLVM passes to transparently transform a program into a naturally restartable entity is clever. It shifts the burden of state management from the runtime environment (OS, fuzzer) to the compilation process, where more static analysis and optimization opportunities exist.
  • Portability and Maintainability: By largely relying on LLVM IR and standard C language features (setjmp/longjmp), ClosureX offers superior portability compared to kernel-based solutions. This is crucial for real-world adoption and long-term viability.
  • Orthogonality: The design allows ClosureX to be combined with other fuzzing advancements (e.g., advanced coverage tracking, sanitizers, mutational strategies) for cumulative benefits, making it a foundational improvement rather than a competing one. This modularity is a key strength.
  • Real-world Impact: The discovery of 15 0-day bugs and 4 CVEs is compelling evidence of ClosureX's practical value and effectiveness.

Potential Issues and Areas for Improvement:

  • Complexity of longjmp: While setjmp/longjmp is effective for non-local jumps, its usage can sometimes make code harder to reason about, especially for maintainers unfamiliar with it. However, in ClosureX, it's confined to the harness and exitHook, limiting its complexity.
  • Handling Source-Unavailable Libraries: This is a common challenge for source-instrumentation approaches. For targets that heavily rely on closed-source third-party libraries, ClosureX's current method of state tracking and restoration might be incomplete. Hybrid approaches combining binary instrumentation for libraries with source instrumentation for the main target could be an interesting future direction.
  • Stateful Programs: The paper explicitly acknowledges that ClosureX focuses on stateless programs. Fuzzing stateful applications (e.g., databases, network servers with persistent connections) is a harder problem. While ClosureX addresses in-memory state, external file or network state management would require deeper semantic understanding of the application, potentially beyond compiler-level generic hooks.
  • Performance Overhead of Hooks: Although the paper demonstrates significant speedup, there's always a small overhead introduced by the myMalloc, myFree, fopen_hook, etc., wrappers compared to direct calls. The net gain is huge, but it's worth noting the instrumentation itself isn't zero-cost.
  • Scalability of GlobalPass for Large Programs: While closure_global_section reduces the restored region, for exceptionally large programs with a vast number of modifiable global variables, the snapshotting and restoration of this section might still introduce noticeable overhead. Compiler optimizations might mitigate this to some extent.

Broader Applicability: The concept of creating naturally restartable programs has implications beyond fuzzing. It could be valuable for other scenarios requiring repeated execution from a clean state:

  • Automated Testing Frameworks: Speeding up unit or integration tests that require resetting environmental conditions.

  • Simulation and Emulation: For systems that need to frequently reset their state for iterative simulations or experiments.

  • Fault Injection: Quickly re-running components with different fault conditions.

    Overall, ClosureX represents a robust and highly practical advancement in fuzzing technology, demonstrating how compiler support can fundamentally enhance the capabilities of security testing tools.
