Details are left ambiguous to prevent pre-analysis.

Novel Techniques

Witness

A collection of novel architectural implementations within Witness.

Binary Impersonation

The protected binary's PE structure is reshaped to match the forensic fingerprint of a different commercial protector. Automated classifiers, YARA rules, and triage tools misidentify the protection technology and direct analysts toward the wrong toolchain.

No other protector can impersonate 10+ different products on demand. The analyst's first step, identifying the protection technology, yields the wrong answer. Every tool, script, and heuristic they apply afterward is calibrated for a product that isn't present.

Anti-AI Analysis

Modern reverse engineering increasingly relies on neural decompilers, LLM-assisted analysis, and GNN-based program understanding. Witness injects four adversarial primitives specifically to degrade AI-assisted reverse engineering.

Primitive 01

Attention Saturation

Targets: Transformer-based decompilers, LLM analysis

Exploits the quadratic scaling of transformer self-attention. Semantically meaningless operations exhaust the model's attention budget, crowding actual logic beyond the attention horizon.

Primitive 02

Graph Poisoning

Targets: GNN-based program analysis, data flow recovery

Injects phantom dependency edges that appear structurally identical to real data flows but carry no semantic meaning. Graph neural networks trained on data flow edges are poisoned with high-confidence false information.

Primitive 03

Context Window Exhaustion

Targets: Sequence models, RNN/LSTM decompilers

Forces models to maintain long-range context across function boundaries for dependencies that ultimately contribute nothing. Hidden-state capacity is consumed by algebraic noise, degrading accuracy on real dependencies.

Primitive 04

Embedding Disruption

Targets: Token embeddings, call graph models, positional encodings

Coordinated sub-strategies that corrupt the model's internal representation at the token, function, and structural levels simultaneously. Embedding vectors for real code are shifted toward adversarial regions of the representation space.

These primitives are injected before virtualization. The VM then encrypts and fragments them further. An LLM analyzing the final binary faces adversarial inputs designed specifically for neural architectures, wrapped inside a custom encrypted ISA it has never seen in training data.

Combined with binary impersonation, every LLM tested to date misidentifies the protection technology 100% of the time, directing its entire analysis toward the wrong product.

Per-Build Algorithm Synthesis

Traditional protectors use fixed algorithms with per-build keys. Witness synthesizes the algorithms themselves. The cipher, the PRF, and the ISA encoding are all generated fresh on every compilation.

Component	Traditional Protectors	Witness
Stream Cipher	Fixed algorithm, per-build key	Entire construction synthesized per build. Internal structure, round count, and data routing are all unique
PRF Construction	Fixed structure (HMAC or similar)	Internal wiring synthesized per build. Not a parameterization of a known PRF
Hash Function	Fixed (SHA-256, CRC32, etc.)	Per-build constants and internal structure. A different hash on every compilation
PRNG	Fixed (Mersenne Twister, etc.)	Per-build parameterization that changes the algorithm's behavior, not just its seed
Opcode Encoding	Fixed opcode table, possibly permuted	Multiple encoding transforms applied per opcode. The bytecode format itself is unique
Dispatch Table	Fixed structure, encrypted entries	Self-mutating. Re-encrypted after every dispatch with per-build seeding

An attacker who fully reverse-engineers one build learns nothing transferable to the next build. The algorithms themselves are different. There is no "Witness cipher" to study. Every binary contains a cipher that has never existed before and will never exist again.
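
As a rough illustration of what synthesizing an algorithm (rather than only a key) can mean, the sketch below derives the round count, the operation used in each round, and every constant of a toy 32-bit mixer from a build seed. The names `synthesize_mixer` and `build_seed` are hypothetical, the generator is Python's non-cryptographic `random`, and nothing here reflects how Witness actually constructs its ciphers.

```python
import random

MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate a 32-bit value left by r bits (1 <= r <= 31)."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def synthesize_mixer(build_seed):
    """Return (mixer, layout) where the mixer's structure depends on build_seed.

    Hypothetical illustration: round count, per-round operation, and all
    constants are drawn from the seed, so two seeds give two algorithms.
    """
    rng = random.Random(build_seed)  # not a CSPRNG; acceptable for a sketch
    layout = []
    for _ in range(rng.randint(4, 8)):
        kind = rng.choice(["add", "xor", "rot", "mul"])
        if kind == "rot":
            const = rng.randint(1, 31)
        elif kind == "mul":
            const = rng.getrandbits(32) | 1  # odd multipliers are invertible mod 2**32
        else:
            const = rng.getrandbits(32)
        layout.append((kind, const))

    def mixer(x):
        x &= MASK32
        for kind, const in layout:
            if kind == "add":
                x = (x + const) & MASK32
            elif kind == "xor":
                x ^= const
            elif kind == "rot":
                x = rotl32(x, const)
            else:
                x = (x * const) & MASK32
        return x

    return mixer, layout

mixer_a, layout_a = synthesize_mixer(build_seed=1)
mixer_b, layout_b = synthesize_mixer(build_seed=2)
```

Every round in this toy is a bijection on 32-bit words, so each generated mixer stays invertible regardless of which structure the seed selects.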

Language-Agnostic Protection

Operating at the LLVM IR level means any language that compiles through LLVM is natively supported. No source rewriting, no language-specific hooks, no binary rewriting.

Language	Compiler
C	clang
C++	clang++
Rust	rustc (LLVM)
Zig	zig cc
D	ldc2
Fortran	flang
Objective-C	clang

Comparison with other protectors

Protector	Approach
VMProtect / Themida	Binary-level. Post-compilation, no language awareness
Denuvo	Binary-level. Post-compilation instrumentation and VM injection
Obfuscator-LLVM	LLVM-level, but no virtualization (native-code transforms only). Systematically defeated by D810, Miasm, and angr. Forks reach LLVM 17-21
Witness	LLVM 22.1.x plugin. Any LLVM frontend, full ISA, C++ exceptions, atomics, SIMD

Cryptographic Entanglement

Protection doesn't end at function boundaries. Cryptographic state flows between functions, between invocations, and along execution paths, creating dependencies that cannot be severed without breaking everything at once.

Cross-Function Binding

When multiple functions are protected, they share a persistent cryptographic accumulator. Each function's execution folds a fingerprint into this shared state. Tampering with function A corrupts the decryption keys of functions B, C, and every other entangled function. Silently.

No known prior art
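
The general idea of a shared accumulator can be sketched as follows. `SharedAccumulator`, `fingerprint`, and the use of SHA-256 are assumptions made for illustration; the source does not describe the actual construction.

```python
import hashlib

class SharedAccumulator:
    """Hypothetical shared state that every protected function folds into."""

    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(seed).digest()

    def fold(self, fp: bytes) -> None:
        # The new state depends on the old state and on the code that just ran.
        self.state = hashlib.sha256(self.state + fp).digest()

    def key_for(self, function_name: str) -> bytes:
        # A function's decryption key is derived from the shared state.
        return hashlib.sha256(self.state + function_name.encode()).digest()

def fingerprint(code: bytes) -> bytes:
    return hashlib.sha256(code).digest()

def run(code_a: bytes) -> bytes:
    """'Execute' function A, then derive the key function B will need."""
    acc = SharedAccumulator(b"build-seed")
    acc.fold(fingerprint(code_a))
    return acc.key_for("B")

key_clean = run(b"\x55\x48\x89\xe5\xc3")
key_patched = run(b"\x55\x48\x89\xe5\x90")  # one byte of A changed
```

In this sketch a single changed byte in A yields a different key for B, and nothing in B's own code reveals why decryption failed.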

Path-Sensitive State Binding

The execution path is folded into the decryption key stream. Reaching the same point via different paths produces different keys. Forcing an alternate path doesn't just skip a check. It makes every subsequent instruction decrypt to garbage.

Advances environmental keying
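
A minimal sketch of path-dependent keying, assuming basic blocks carry integer IDs and that the running key is a hash chain over the IDs visited. The helper names and the SHA-256 keystream are illustrative assumptions, not the product's scheme.

```python
import hashlib

def path_key(entry_key: bytes, path: list) -> bytes:
    """Fold each visited basic-block ID into a running key."""
    key = entry_key
    for block_id in path:
        key = hashlib.sha256(key + block_id.to_bytes(4, "little")).digest()
    return key

def xor_stream(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(4, "little")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

entry = b"\x00" * 32
expected_path = [1, 2, 4]  # entry -> check -> success
forced_path = [1, 3, 4]    # same destination, reached by bypassing the check

ciphertext = xor_stream(b"ADD r0, r1", path_key(entry, expected_path))
good = xor_stream(ciphertext, path_key(entry, expected_path))
bad = xor_stream(ciphertext, path_key(entry, forced_path))
```

Both paths end at block 4, yet only the expected path reproduces the key under which the instruction was encrypted.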

Cross-Invocation Mutation

After each function invocation, the entire bytecode segment is re-encrypted with a fresh key. A memory dump captured during one call produces bytecode that is invalid for the next call. The binary is literally different every time it runs.

No known prior art
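
The re-encryption cycle can be sketched roughly as below. `MutatingSegment` is a hypothetical name, the cipher is a toy SHA-256 keystream, and the key is held in a plain attribute purely to keep the example short; a real protector would have to hide it.

```python
import hashlib
import os

def xor_stream(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(4, "little")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

class MutatingSegment:
    """Hypothetical bytecode segment that changes its ciphertext on every call."""

    def __init__(self, bytecode: bytes):
        self.key = os.urandom(32)
        self.blob = xor_stream(bytecode, self.key)

    def invoke(self) -> bytes:
        plain = xor_stream(self.blob, self.key)  # decrypt for this call
        self.key = os.urandom(32)                # fresh key for the next call
        self.blob = xor_stream(plain, self.key)  # re-encrypt the whole segment
        return plain

segment = MutatingSegment(b"PUSH 1; PUSH 2; ADD")
dump_before = segment.blob
first_run = segment.invoke()
dump_after = segment.blob
second_run = segment.invoke()
```

The two dumps differ even though both invocations recover the same bytecode.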

Load-Bearing Integrity

Integrity verification is not a removable check. The values produced during verification are mathematically required for correct decryption of subsequent instructions. Patching out a check doesn't skip it. It corrupts all downstream computation with no error messages.

No known prior art
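
One way to picture a check that cannot be patched out is to use the digest itself as the key, with no comparison anywhere. The sketch below assumes SHA-256 and a toy XOR keystream; `seal` and `open_block` are hypothetical helper names.

```python
import hashlib

def xor_stream(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(4, "little")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def seal(handler_code: bytes, next_block: bytes) -> bytes:
    """Encrypt next_block under a key equal to the digest of handler_code."""
    return xor_stream(next_block, hashlib.sha256(handler_code).digest())

def open_block(handler_code_in_memory: bytes, sealed: bytes) -> bytes:
    # No comparison against an expected digest, so there is no branch to patch.
    return xor_stream(sealed, hashlib.sha256(handler_code_in_memory).digest())

handler = b"\x48\x31\xc0\xc3"
sealed = seal(handler, b"next handler bytes")

intact = open_block(handler, sealed)
tampered = open_block(b"\x90\x90\x90\xc3", sealed)
```

With the original handler bytes the block decrypts correctly; with modified bytes it decrypts to unrelated data and no error is raised.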

Algebraic Bytecode Encoding

Values in the VM don't just get XOR-masked. They pass through multiple layers of algebraic encoding, each operating in a different mathematical domain. Stripping one layer reveals another that requires entirely different analysis techniques.

Outermost Layer

Memory Layout Randomization

The VM register file is scattered across disjoint memory regions with a per-invocation random permutation. The memory layout is different every time the function is called.

Value Domain Encoding

Every value in the data segment is encrypted with a position-dependent block cipher. The same value at different offsets produces different ciphertext. Keys rotate on every instruction dispatch via a one-way ratchet, defeating black-box synthesis attacks that need multiple I/O pairs per bijection.
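
A simplified sketch of position-dependent encoding with a one-way key ratchet follows. It swaps the block cipher described above for an HMAC-derived XOR pad to stay short, and the function names are hypothetical.

```python
import hashlib
import hmac

def encode(value: int, offset: int, key: bytes) -> int:
    """Mask a 64-bit value with a pad derived from the key and its offset."""
    pad = hmac.new(key, offset.to_bytes(8, "little"), hashlib.sha256).digest()[:8]
    return value ^ int.from_bytes(pad, "little")

decode = encode  # XOR masking is its own inverse

def ratchet(key: bytes) -> bytes:
    """One-way step: the next key is a hash of the current one."""
    return hashlib.sha256(b"ratchet" + key).digest()

key_0 = bytes(32)
at_offset_0 = encode(42, 0, key_0)
at_offset_8 = encode(42, 8, key_0)

key_1 = ratchet(key_0)           # performed on every dispatch in this sketch
after_ratchet = encode(42, 0, key_1)
```

The same plaintext encodes differently at each offset and again after each ratchet step, so an observer cannot collect repeated input/output pairs for one fixed mapping.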

Secret-Shared Bytecode

Bytecode is split into multiple secret shares using a threshold scheme. Shares are proactively re-randomized at execution boundaries without changing the underlying secret. Phantom reads ensure uniform memory access patterns.
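
Share re-randomization can be illustrated with the simplest possible scheme. Note the substitution: the sketch uses XOR-based n-of-n sharing (all shares required, n at least 2) rather than a true threshold scheme, and the function names are hypothetical.

```python
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n: int) -> list:
    """Split secret into n shares (n >= 2) whose XOR equals the secret."""
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))
    return shares

def combine(shares: list) -> bytes:
    return reduce(xor_bytes, shares)

def refresh(shares: list) -> list:
    """Re-randomize every share by XORing in masks that cancel out overall."""
    masks = [os.urandom(len(shares[0])) for _ in range(len(shares) - 1)]
    masks.append(reduce(xor_bytes, masks))  # makes the XOR of all masks zero
    return [xor_bytes(s, m) for s, m in zip(shares, masks)]

shares = split(b"bytecode", 3)
refreshed = refresh(shares)
```

After `refresh`, every stored share has changed, but combining the shares still yields the original bytes.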

Algebraic Matrix Encoding

Arithmetic operations are encoded as matrix-vector multiplications in an algebraic number system. Consecutive operations are composed into a single matrix multiply, making it impossible to identify individual instructions from the bytecode.
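
To see why composed matrices hide individual instructions, consider a small worked example. It assumes the number system is the integers modulo 2^32 and that each operation is affine, which the source does not state; an operation x -> a*x + b is written as the matrix [[a, b], [0, 1]] acting on the vector (x, 1).

```python
MOD = 2 ** 32

def affine(a, b):
    """Matrix form of x -> a*x + b (mod 2**32)."""
    return [[a % MOD, b % MOD], [0, 1]]

def matmul(m, n):
    """2x2 matrix product modulo 2**32."""
    return [
        [sum(m[i][k] * n[k][j] for k in range(2)) % MOD for j in range(2)]
        for i in range(2)
    ]

def apply(m, x):
    """Apply the matrix to the vector (x, 1) and return the first component."""
    vec = [x % MOD, 1]
    return sum(m[0][k] * vec[k] for k in range(2)) % MOD

mul3 = affine(3, 0)  # x -> 3*x
add5 = affine(1, 5)  # x -> x + 5
mul7 = affine(7, 0)  # x -> 7*x

# Three instructions collapse into one matrix: 7*(3*x + 5) = 21*x + 35
fused = matmul(mul7, matmul(add5, mul3))
```

The fused matrix is [[21, 35], [0, 1]]. Nothing in it indicates that it came from a multiply, an add, and another multiply rather than from any other sequence with the same net effect.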

Access Pattern Uniformization

Every real memory access is accompanied by phantom read-writes at unpredictable offsets. Both read and write patterns are uniformized: phantom writes perform a deterministic decrypt-reencrypt round-trip, producing identical ciphertext. Cache-timing and memory-trace side channels see only uniform access patterns.

Innermost Layer (plaintext exists only during handler execution)

Values are only meaningful within the context of a specific execution path.

Architectural Innovations

Novel structural techniques in the VM runtime that break assumptions made by dynamic analysis tools, debuggers, and memory forensics.

Novel

Split-Context VM Execution

The VM is split into two cooperative execution contexts: one that fetches and pre-decrypts bytecode, and one that executes it. The two contexts alternate. Single-stepping one shows no operations from the other. Neither context alone reveals the program's behavior.

Novel

White-Box Key Encoding

The root secret key is encoded in multi-layer lookup tables with affine encoding. A hardware-derived random mask is applied at every lookup, preventing Differential Power Analysis from correlating lookup indices with key bytes. The key never exists as a contiguous plaintext array in memory.

Novel

Call Stack Spoofing

The VM rewrites its call chain so stack unwinders see a clean, legitimate-looking backtrace. Synthetic unwind metadata is generated so the spoofed stack passes both debugger inspection and OS exception dispatch.

Novel

Memory-Hard Key Derivation

Key derivation uses a memory-hard function with a large working buffer and data-dependent access patterns. This forces hardware side-channel attackers to contend with unpredictable memory behavior that defeats cache-line monitoring.

Novel

Recursive Self-Virtualization

Functions can be virtualized multiple times, creating nested VMs. Each layer has independent encryption keys, independent dispatch tables, and independent handler variants. An attacker who defeats the outer VM faces a completely new inner VM with a different ISA, different algorithms, and different keys.

Novel

Surgical Opcode Subsetting

The VM instruction set is pruned per-build. Handlers for opcodes the protected code never uses are completely stripped from the binary. Each build contains a unique, minimal handler set. There is no universal handler catalog to study, because every binary's VM has a different instruction set.
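
The pruning step can be sketched in a few lines. The five-opcode `FULL_HANDLERS` catalog and the accumulator-style programs are invented for illustration.

```python
# Hypothetical full handler catalog; a real VM would have many more entries.
FULL_HANDLERS = {
    "ADD": lambda acc, n: acc + n,
    "SUB": lambda acc, n: acc - n,
    "MUL": lambda acc, n: acc * n,
    "XOR": lambda acc, n: acc ^ n,
    "SHL": lambda acc, n: acc << n,
}

def prune_handlers(program):
    """Keep only the handlers for opcodes the program actually uses."""
    used = {opcode for opcode, _ in program}
    return {opcode: FULL_HANDLERS[opcode] for opcode in sorted(used)}

def run(program, handlers, acc=0):
    for opcode, operand in program:
        acc = handlers[opcode](acc, operand)  # KeyError if the handler was stripped
    return acc

program = [("ADD", 5), ("MUL", 3), ("ADD", 1)]
handlers = prune_handlers(program)
result = run(program, handlers)
```

Here the shipped handler set contains only `ADD` and `MUL`; the `SUB`, `XOR`, and `SHL` handlers never reach the build.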

Novel

Full C++ Exception Virtualization

C++ try/catch/throw execute entirely within the VM. Every other commercial protector exits to native code for exception handling, exposing control flow at exception boundaries. Witness keeps exception dispatch, stack unwinding, and catch matching inside the encrypted interpreter. No native exception frames are visible.

Novel

Variadic Function Virtualization

Variadic calling conventions are virtualized entirely inside the VM. The argument list is captured at function entry and stored in the encrypted data segment. Argument reads go through double-indirection within the interpreter. Every other protector exits to native code for variadic calls, exposing arguments and calling conventions on the real stack.

Novel

Load-Bearing Phantom Handlers

The dispatch table contains handlers that are never the target of valid bytecode but are cryptographically entangled with the real dispatch chain. Removing or modifying a phantom handler corrupts the encryption state for all real handlers. They cannot be identified via execution tracing (never dispatched) or stripped via dead code elimination (load-bearing).

Novel

Self-Mutating Dispatch Table

Handler positions in the dispatch table are physically shuffled at runtime after every dispatch. A memory dump captured at one point in execution shows a completely different table layout than a dump captured moments later. All entries are re-encrypted with fresh per-position keys periodically. No other protector mutates the physical layout of its handler table during execution.
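
The shuffling half of this technique can be sketched as follows; entry re-encryption is omitted. `ShufflingDispatchTable` is a hypothetical name, and the opcode-to-slot map is kept in the clear here, which a real implementation would not do.

```python
import random

class ShufflingDispatchTable:
    """Toy dispatch table that permutes its slots after every dispatch."""

    def __init__(self, handlers, seed):
        self.rng = random.Random(seed)  # stands in for per-build seeding
        self.slots = list(handlers.values())
        self.slot_of = {opcode: i for i, opcode in enumerate(handlers)}

    def dispatch(self, opcode, *args):
        result = self.slots[self.slot_of[opcode]](*args)
        self._shuffle()
        return result

    def _shuffle(self):
        order = list(range(len(self.slots)))
        self.rng.shuffle(order)
        # New slot j holds what used to be in slot order[j].
        self.slots = [self.slots[old] for old in order]
        new_position = {old: new for new, old in enumerate(order)}
        self.slot_of = {op: new_position[old] for op, old in self.slot_of.items()}

table = ShufflingDispatchTable(
    {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b, "MUL": lambda a, b: a * b},
    seed=7,
)
results = [table.dispatch(op, 6, 3) for op in ["ADD", "SUB", "MUL"] * 5]
```

Dispatch stays correct across every shuffle because the lookup map is updated in the same step that moves the handlers.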

Novel

Page-Granular Key Derivation

Each 4KB code page is encrypted with its own unique key derived from a master key and the page index. Pages are decrypted on-demand via exception handler when first accessed. The derived key is never stored after use. Compromising one decrypted page reveals nothing about the keys of other pages. No other protector combines per-page key derivation with on-demand exception-driven decryption.
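
A minimal sketch of per-page key derivation with first-access decryption, assuming HMAC-SHA256 as the derivation function and a toy XOR keystream as the page cipher. A dictionary lookup stands in for the exception handler, and `LazyImage` is a hypothetical name.

```python
import hashlib
import hmac

PAGE_SIZE = 4096

def page_key(master_key: bytes, page_index: int) -> bytes:
    """Derive a page's key from the master key and the page index."""
    message = b"page" + page_index.to_bytes(8, "little")
    return hmac.new(master_key, message, hashlib.sha256).digest()

def xor_page(page: bytes, key: bytes) -> bytes:
    """XOR a page with a SHA-256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for block in range(0, len(page), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "little")).digest()
        out.extend(b ^ p for b, p in zip(page[block:block + 32], pad))
    return bytes(out)

class LazyImage:
    """Decrypts a page only when an address inside it is first read."""

    def __init__(self, master_key, encrypted_pages):
        self.master_key = master_key
        self.encrypted = encrypted_pages
        self.plain = {}

    def read(self, address: int) -> int:
        index = address // PAGE_SIZE
        if index not in self.plain:  # stands in for the first-access fault
            key = page_key(self.master_key, index)
            self.plain[index] = xor_page(self.encrypted[index], key)
            # The derived key is a local and is not kept on the object.
        return self.plain[index][address % PAGE_SIZE]

master = bytes(range(32))
pages = [bytes([i]) * PAGE_SIZE for i in range(3)]
encrypted = [xor_page(p, page_key(master, i)) for i, p in enumerate(pages)]

image = LazyImage(master, encrypted)
value = image.read(PAGE_SIZE + 10)  # touches page 1 only
```

After this read, only page 1 exists in plaintext; pages 0 and 2 remain encrypted under keys that were never derived at runtime.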

Post-Quantum Cryptography

The key derivation pipeline uses both classical and post-quantum primitives. Even a quantum computer capable of breaking elliptic-curve cryptography cannot extract the root key without also breaking a lattice-based KEM.

Classical

Elliptic Curve

Ephemeral key exchange for hardware-bound per-function share encryption. Private key material never leaves the trusted execution boundary.

ECDH + KDF

Memory-Hard

Memory-Hard KDF

Large working buffer with data-dependent access patterns. Resists GPU/ASIC acceleration and cache-timing side channels, and raises the cost of DCA/DFA attacks aimed at white-box key extraction.

Data-dependent access · Explicit zeroing

Post-Quantum

Lattice-Based KEM

A post-quantum key encapsulation mechanism augments the root key after memory-hard stretching. Provides security against quantum adversaries at the highest standardized level.

NIST PQC Standard

The root key derivation pipeline chains multiple cryptographic stages. An attacker must defeat white-box encoding, memory-hard stretching, and a post-quantum KEM, all in the correct order.
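
The chaining of stages might look roughly like the sketch below. It uses scrypt from Python's standard library as a stand-in memory-hard function and HMAC-SHA256 as the combiner. The KEM shared secret is a fixed placeholder byte string because the standard library has no lattice-based KEM, and all names and parameters are assumptions.

```python
import hashlib
import hmac

def stretch(root_material: bytes, salt: bytes) -> bytes:
    """Memory-hard stretching; scrypt is a stand-in for the unnamed function."""
    return hashlib.scrypt(root_material, salt=salt, n=2**12, r=8, p=1, dklen=32)

def combine(stretched: bytes, kem_shared_secret: bytes) -> bytes:
    """Mix the KEM shared secret into the stretched key."""
    return hmac.new(stretched, b"hybrid" + kem_shared_secret, hashlib.sha256).digest()

def derive(root_material: bytes, salt: bytes, kem_shared_secret: bytes) -> bytes:
    return combine(stretch(root_material, salt), kem_shared_secret)

salt = b"per-build-salt"
root_material = b"output of white-box stage"  # placeholder for the decoded root secret
kem_secret_a = b"\x01" * 32                   # placeholder for a KEM decapsulation result
kem_secret_b = b"\x02" * 32

final_a = derive(root_material, salt, kem_secret_a)
final_b = derive(root_material, salt, kem_secret_b)
```

Because the final key depends on both inputs, recovering the classical material alone is not enough in this sketch: a different KEM secret yields an unrelated key.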