details left ambiguous to prevent pre-analysis
Novel Techniques

Witness

A collection of novel architectural implementations within Witness.

01

Binary Impersonation

The protected binary's PE structure is reshaped to match the forensic fingerprint of a different commercial protector. Automated classifiers, YARA rules, and triage tools misidentify the protection technology and direct analysts toward the wrong toolchain.

VMProtect: VM Protector
Themida: Packer + VM
Denuvo: DRM
CodeVirtualizer: VM Protector
Obsidium: Packer
UPX: Compressor
ASPack: Packer
MPRESS: Compressor
Enigma: Protector
Petite: Compressor
No other protector can impersonate 10+ different products on demand. The analyst's first step, identifying the protection technology, yields the wrong answer. Every tool, script, and heuristic they apply afterward is calibrated for a product that isn't present.
02

Anti-AI Analysis

Modern reverse engineering increasingly relies on neural decompilers, LLM-assisted analysis, and GNN-based program understanding. Multiple adversarial primitives are injected specifically to degrade AI-assisted reverse engineering.

Primitive 01
Attention Saturation
Targets: Transformer-based decompilers, LLM analysis
Exploits the quadratic scaling of transformer self-attention. Semantically meaningless operations exhaust the model's attention budget, crowding actual logic beyond the attention horizon.
Primitive 02
Graph Poisoning
Targets: GNN-based program analysis, data flow recovery
Injects phantom dependency edges that appear structurally identical to real data flows but carry no semantic meaning. Graph neural networks trained on data flow edges are poisoned with high-confidence false information.
Primitive 03
Context Window Exhaustion
Targets: Sequence models, RNN/LSTM decompilers
Forces models to maintain long-range context across function boundaries for dependencies that ultimately contribute nothing. Hidden-state capacity is consumed by algebraic noise, degrading accuracy on real dependencies.
Primitive 04
Embedding Disruption
Targets: Token embeddings, call graph models, positional encodings
Coordinated sub-strategies that corrupt the model's internal representation at the token, function, and structural levels simultaneously. Embedding vectors for real code are shifted toward adversarial regions of the representation space.
These primitives are injected before virtualization. The VM then encrypts and fragments them further. An LLM analyzing the final binary faces adversarial inputs designed specifically for neural architectures, wrapped inside a custom encrypted ISA it has never seen in training data.
Combined with binary impersonation, every LLM tested to date has misidentified the protection technology in every trial, directing its entire analysis toward the wrong product.
03

Per-Build Algorithm Synthesis

Traditional protectors use fixed algorithms with per-build keys. Witness synthesizes the algorithms themselves. The cipher, the PRF, and the ISA encoding are all generated fresh on every compilation.

Component | Traditional Protectors | Witness
Stream Cipher | Fixed algorithm, per-build key | Entire construction synthesized per build. Internal structure, round count, and data routing are all unique
PRF Construction | Fixed structure (HMAC or similar) | Internal wiring synthesized per build. Not a parameterization of a known PRF
Hash Function | Fixed (SHA-256, CRC32, etc.) | Per-build constants and internal structure. A different hash on every compilation
PRNG | Fixed (Mersenne Twister, etc.) | Per-build parameterization that changes the algorithm's behavior, not just its seed
Opcode Encoding | Fixed opcode table, possibly permuted | Multiple encoding transforms applied per opcode. The bytecode format itself is unique
Dispatch Table | Fixed structure, encrypted entries | Self-mutating. Re-encrypted after every dispatch with per-build seeding
An attacker who fully reverse-engineers one build learns nothing transferable to the next build. The algorithms themselves are different. There is no "Witness cipher" to study. Every binary contains a cipher that has never existed before and will never exist again.
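To make the difference between a per-build key and a per-build algorithm concrete, here is a minimal toy sketch. It is not Witness's generator, whose design is not described here, and it is not cryptographically secure. Every name in it (`synthesize_cipher`, the seed label, the ARX-style round shape) is an assumption made for illustration. A build seed picks the round count, rotation amounts, constants, and operation order, so two builds differ in structure and not only in key.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits (1 <= r <= 31)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def synthesize_cipher(build_seed: bytes):
    """Return a toy XOR-stream cipher whose structure depends on build_seed."""
    # Hypothetical parameter derivation: expand the seed into 64 bytes.
    material = hashlib.shake_256(b"toy-build-params" + build_seed).digest(64)
    rounds = 4 + material[0] % 5                        # 4 to 8 rounds
    rotations = [1 + b % 31 for b in material[1:1 + rounds]]
    constants = struct.unpack("<8I", material[16:48])[:rounds]
    xor_first = [bool(b & 1) for b in material[48:48 + rounds]]

    def keystream_word(key: int, counter: int) -> int:
        key &= MASK32
        x = (key ^ counter) & MASK32
        for rot, const, flag in zip(rotations, constants, xor_first):
            if flag:   # this round does xor, rotate, add
                x = (rotl32(x ^ const, rot) + key) & MASK32
            else:      # this round does add, rotate, xor
                x = rotl32((x + const) & MASK32, rot) ^ key
        return x

    def crypt(key: int, data: bytes) -> bytes:
        """Encrypt or decrypt; XOR with the keystream is its own inverse."""
        out = bytearray()
        for i, byte in enumerate(data):
            word = keystream_word(key, i // 4)
            out.append(byte ^ ((word >> (8 * (i % 4))) & 0xFF))
        return bytes(out)

    return crypt
```

Calling `synthesize_cipher(b"build-0001")` and `synthesize_cipher(b"build-0002")` yields two functions that share a key format but not a round structure, so analysis of one does not describe the other.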
04

Language-Agnostic Protection

Operating at the LLVM IR level means any language that compiles through LLVM is natively supported. No source rewriting, no language-specific hooks, no binary rewriting.

C: clang
C++: clang++
Rust: rustc (LLVM)
Zig: zig cc
D: ldc2
Fortran: flang
Objective-C: clang
Comparison with other protectors
VMProtect / Themida: Binary-level. Post-compilation, no language awareness
Denuvo: Binary-level. Post-compilation instrumentation and VM injection
Obfuscator-LLVM: LLVM-level, but no virtualization (native-code transforms only). Systematically defeated by D810, Miasm, and angr. Forks reach LLVM 17-21
Witness: LLVM 22.1.x plugin. Any LLVM frontend, full ISA, C++ exceptions, atomics, SIMD
05

Cryptographic Entanglement

Protection doesn't end at function boundaries. Cryptographic state flows between functions, between invocations, and along execution paths, creating dependencies that cannot be severed without breaking everything at once.

Cross-Function Binding
When multiple functions are protected, they share a persistent cryptographic accumulator. Each function's execution folds a fingerprint into this shared state. Tampering with function A silently corrupts the decryption keys of functions B, C, and every other entangled function.
No known prior art
Path-Sensitive State Binding
The execution path is folded into the decryption key stream. Reaching the same point via different paths produces different keys. Forcing an alternate path doesn't just skip a check. It makes every subsequent instruction decrypt to garbage.
Advances environmental keying
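The path-binding idea can be sketched with a running HMAC chain. This is an illustrative model only: the class name `PathKeyState`, the edge identifiers, and the choice of HMAC-SHA-256 are assumptions, since the source does not say how Witness folds the path into its key stream.

```python
import hashlib
import hmac


class PathKeyState:
    """Running state that absorbs every control-flow edge taken so far."""

    def __init__(self, root_key: bytes):
        self._state = hmac.new(root_key, b"path-init", hashlib.sha256).digest()

    def take_edge(self, edge_id: int) -> None:
        # Fold the edge into the state; the old state is the HMAC key,
        # so the update cannot be reversed or reordered.
        self._state = hmac.new(
            self._state, edge_id.to_bytes(4, "little"), hashlib.sha256
        ).digest()

    def instruction_key(self, pc: int) -> bytes:
        # The key for an instruction depends on where we are AND how we got here.
        return hmac.new(
            self._state, b"insn" + pc.to_bytes(4, "little"), hashlib.sha256
        ).digest()
```

Two executions that reach the same `pc` through different edges hold different states, so an instruction encrypted for the legitimate path does not decrypt under a forced one.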
Cross-Invocation Mutation
After each function invocation, the entire bytecode segment is re-encrypted with a fresh key. A memory dump captured during one call produces bytecode that is invalid for the next call. The in-memory bytecode is different on every call.
No known prior art
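A minimal model of re-encrypting after every call is shown below. The names (`MutatingSegment`, `invoke`) are hypothetical, the SHAKE-256 XOR stream is a stand-in cipher, and keeping the key in a plain attribute is a simplification; the source does not describe how the real key is stored.

```python
import hashlib
import secrets


def _xor_stream(key: bytes, data: bytes) -> bytes:
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class MutatingSegment:
    """Bytecode segment that is re-encrypted under a fresh key after each call."""

    def __init__(self, bytecode: bytes):
        self._key = secrets.token_bytes(32)
        self.ciphertext = _xor_stream(self._key, bytecode)

    def invoke(self, run):
        # Decrypt only for the duration of the call.
        plain = _xor_stream(self._key, self.ciphertext)
        result = run(plain)
        # Rotate: a dump of self.ciphertext taken earlier is now stale.
        self._key = secrets.token_bytes(32)
        self.ciphertext = _xor_stream(self._key, plain)
        return result
```

Here `run` is any callable that interprets the plaintext bytecode; the stored ciphertext changes between calls while the result stays the same.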
Load-Bearing Integrity
Integrity verification is not a removable check. The values produced during verification are mathematically required for correct decryption of subsequent instructions. Patching out a check doesn't skip it. It corrupts all downstream computation with no error messages.
No known prior art
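The contrast with a removable check (`if digest != expected: abort()`) can be shown in a few lines. In this sketch the digest of a code region is never compared against anything; it is key material. Function names and the HMAC/SHAKE construction are assumptions for illustration, not Witness's actual scheme.

```python
import hashlib
import hmac


def xor_stream(key: bytes, data: bytes) -> bytes:
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def verification_value(code_region: bytes) -> bytes:
    """The 'integrity check': a digest over the bytes being protected."""
    return hashlib.sha256(code_region).digest()


def block_key(base_key: bytes, code_region: bytes) -> bytes:
    # The verification value feeds the key, so there is no branch to patch out.
    return hmac.new(base_key, verification_value(code_region), hashlib.sha256).digest()


def protect(base_key: bytes, code_region: bytes, payload: bytes) -> bytes:
    """Build time: encrypt the following instructions under the derived key."""
    return xor_stream(block_key(base_key, code_region), payload)


def recover(base_key: bytes, code_region_now: bytes, ciphertext: bytes) -> bytes:
    """Run time: decrypt using whatever the code region contains right now."""
    return xor_stream(block_key(base_key, code_region_now), ciphertext)
```

If a single byte of the code region is patched, `recover` still returns bytes of the right length and raises no error; they are simply not the original payload.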
06

Algebraic Bytecode Encoding

Values in the VM don't just get XOR-masked. They pass through multiple layers of algebraic encoding, each operating in a different mathematical domain. Stripping one layer reveals another that requires entirely different analysis techniques.

Outermost Layer
Memory Layout Randomization
The VM register file is scattered across disjoint memory regions with a per-invocation random permutation. The memory layout is different every time the function is called.
Value Domain Encoding
Every value in the data segment is encrypted with a position-dependent block cipher. The same value at different offsets produces different ciphertext. Keys rotate on every instruction dispatch via a one-way ratchet, defeating black-box synthesis attacks that need multiple I/O pairs per bijection.
Secret-Shared Bytecode
Bytecode is split into multiple secret shares using a threshold scheme. Shares are proactively re-randomized at execution boundaries without changing the underlying secret. Phantom reads ensure uniform memory access patterns.
Algebraic Matrix Encoding
Arithmetic operations are encoded as matrix-vector multiplications in an algebraic number system. Consecutive operations are composed into a single matrix multiply, making it impossible to identify individual instructions from the bytecode.
Access Pattern Uniformization
Every real memory access is accompanied by phantom read-writes at unpredictable offsets. Both read and write patterns are uniformized: phantom writes perform a deterministic decrypt-reencrypt round-trip, producing identical ciphertext. Cache-timing and memory-trace side channels see only uniform access patterns.
Innermost Layer (plaintext exists only during handler execution)
Values are only meaningful within the context of a specific execution path.
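The fusing of consecutive operations described under Algebraic Matrix Encoding can be illustrated with affine maps over 32-bit integers. This is a simplified sketch under stated assumptions: the source does not specify its algebraic number system, so arithmetic modulo 2^32 is used here, only multiply and add are modeled, and no secret change of basis is applied. The helper names are hypothetical.

```python
MOD = 1 << 32  # assumed ring: integers modulo 2**32


def affine(a: int, b: int):
    """Matrix for x -> a*x + b, acting on the column vector (x, 1)."""
    return ((a % MOD, b % MOD), (0, 1))


def matmul(m, n):
    """2x2 matrix product modulo 2**32."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) % MOD for j in range(2))
        for i in range(2)
    )


def apply(m, x: int) -> int:
    """Evaluate the encoded operation on x (first row times (x, 1))."""
    return (m[0][0] * (x % MOD) + m[0][1]) % MOD


def fuse(ops):
    """Compose a sequence of operations into one matrix."""
    fused = affine(1, 0)              # identity
    for op in ops:
        fused = matmul(op, fused)     # later operations multiply on the left
    return fused


# mul 3, add 5, mul 7, sub 1 collapses to the single map x -> 21*x + 34
program = [affine(3, 0), affine(1, 5), affine(7, 0), affine(1, -1)]
fused = fuse(program)
```

The fused matrix carries no trace of how many instructions produced it: `x -> 21*x + 34` could equally have come from two operations or from ten.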
07

Architectural Innovations

Novel structural techniques in the VM runtime that break assumptions made by dynamic analysis tools, debuggers, and memory forensics.

Novel
Split-Context VM Execution
The VM is split into two cooperative execution contexts: one that fetches and pre-decrypts bytecode, and one that executes it. The two contexts alternate. Single-stepping one shows no operations from the other. Neither context alone reveals the program's behavior.
Novel
White-Box Key Encoding
The root secret key is encoded in multi-layer lookup tables with affine encoding. A hardware-derived random mask is applied at every lookup, preventing Differential Power Analysis from correlating lookup indices with key bytes. The key never exists as a contiguous plaintext array in memory.
Novel
Call Stack Spoofing
The VM rewrites its call chain so stack unwinders see a clean, legitimate-looking backtrace. Synthetic unwind metadata is generated so the spoofed stack passes both debugger inspection and OS exception dispatch.
Novel
Memory-Hard Key Derivation
Key derivation uses a memory-hard function with a large working buffer and data-dependent access patterns. This forces hardware side-channel attackers to contend with unpredictable memory behavior that defeats cache-line monitoring.
Novel
Recursive Self-Virtualization
Functions can be virtualized multiple times, creating nested VMs. Each layer has independent encryption keys, independent dispatch tables, and independent handler variants. An attacker who defeats the outer VM faces a completely new inner VM with a different ISA, different algorithms, and different keys.
Novel
Surgical Opcode Subsetting
The VM instruction set is pruned per-build. Handlers for opcodes the protected code never uses are completely stripped from the binary. Each build contains a unique, minimal handler set. There is no universal handler catalog to study, because every binary's VM has a different instruction set.
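Per-build pruning can be sketched as a scan over the bytecode followed by a filter on the handler table. The five-opcode stack machine, the opcode numbers, and the fixed two-byte instruction format below are all hypothetical; they only illustrate the shape of the transformation.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[int], int], None]


def op_push(stack: List[int], arg: int) -> None:
    stack.append(arg)


def op_add(stack: List[int], arg: int) -> None:
    stack.append(stack.pop() + stack.pop())


def op_mul(stack: List[int], arg: int) -> None:
    stack.append(stack.pop() * stack.pop())


def op_neg(stack: List[int], arg: int) -> None:
    stack.append(-stack.pop())


def op_dup(stack: List[int], arg: int) -> None:
    stack.append(stack[-1])


# Hypothetical full instruction set: opcode byte -> handler.
FULL_ISA: Dict[int, Handler] = {
    0x10: op_push, 0x11: op_add, 0x12: op_mul, 0x13: op_neg, 0x14: op_dup,
}


def prune_handlers(bytecode: bytes, isa: Dict[int, Handler]) -> Dict[int, Handler]:
    """Keep only handlers for opcodes that appear in this build's bytecode."""
    used = set(bytecode[0::2])        # assumed format: (opcode, operand) pairs
    return {op: isa[op] for op in used}


def run(bytecode: bytes, handlers: Dict[int, Handler]) -> int:
    stack: List[int] = []
    for i in range(0, len(bytecode), 2):
        handlers[bytecode[i]](stack, bytecode[i + 1])
    return stack[-1]


# (2 + 3) * 4 uses push, add, and mul only.
program = bytes([0x10, 2, 0x10, 3, 0x11, 0, 0x10, 4, 0x12, 0])
build_handlers = prune_handlers(program, FULL_ISA)
```

The pruned table runs the program exactly as the full one would, but `op_neg` and `op_dup` are absent from this build and cannot be studied in it.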
Novel
Full C++ Exception Virtualization
C++ try/catch/throw execute entirely within the VM. Every other commercial protector exits to native code for exception handling, exposing control flow at exception boundaries. Witness keeps exception dispatch, stack unwinding, and catch matching inside the encrypted interpreter. No native exception frames are visible.
Novel
Variadic Function Virtualization
Variadic calling conventions are virtualized entirely inside the VM. The argument list is captured at function entry and stored in the encrypted data segment. Argument reads go through double indirection within the interpreter. Every other protector exits to native code for variadic calls, exposing arguments and calling conventions on the real stack.
Novel
Load-Bearing Phantom Handlers
The dispatch table contains handlers that are never the target of valid bytecode but are cryptographically entangled with the real dispatch chain. Removing or modifying a phantom handler corrupts the encryption state for all real handlers. They cannot be identified via execution tracing (never dispatched) or stripped via dead code elimination (load-bearing).
Novel
Self-Mutating Dispatch Table
Handler positions in the dispatch table are physically shuffled at runtime after every dispatch. A memory dump captured at one point in execution shows a completely different table layout than a dump captured moments later. All entries are re-encrypted with fresh per-position keys periodically. No other protector mutates the physical layout of its handler table during execution.
Novel
Page-Granular Key Derivation
Each 4 KB code page is encrypted with its own unique key derived from a master key and the page index. Pages are decrypted on demand by an exception handler when first accessed. The derived key is never stored after use. Compromising one decrypted page reveals nothing about the keys of other pages. No other protector combines per-page key derivation with on-demand exception-driven decryption.
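Per-page key derivation with lazy decryption can be modeled as follows. This is a sketch under assumptions: HMAC-SHA-256 stands in for the unspecified derivation function, a SHAKE-256 XOR stream stands in for the page cipher, and a dictionary lookup stands in for the exception handler that fires on first access. `LazyImage` and the other names are hypothetical, and holding the master key in an attribute is a simplification.

```python
import hashlib
import hmac

PAGE_SIZE = 4096


def page_key(master_key: bytes, page_index: int) -> bytes:
    """Derive a page-specific key from the master key and the page index."""
    label = b"page" + page_index.to_bytes(8, "little")
    return hmac.new(master_key, label, hashlib.sha256).digest()


def crypt_page(master_key: bytes, page_index: int, page: bytes) -> bytes:
    """Encrypt or decrypt one page; the derived key lives only in this call."""
    stream = hashlib.shake_256(page_key(master_key, page_index)).digest(len(page))
    return bytes(a ^ b for a, b in zip(page, stream))


class LazyImage:
    """Pages stay encrypted until the first read touches them."""

    def __init__(self, master_key: bytes, encrypted_pages):
        self._master_key = master_key
        self._encrypted = list(encrypted_pages)
        self._plain = {}

    def read(self, address: int) -> int:
        index, offset = divmod(address, PAGE_SIZE)
        if index not in self._plain:   # stands in for the fault handler
            self._plain[index] = crypt_page(
                self._master_key, index, self._encrypted[index]
            )
        return self._plain[index][offset]

    def decrypted_pages(self):
        return sorted(self._plain)
```

Reading one address decrypts only the page that contains it; every other page remains ciphertext under a key that has not yet been derived.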
08

Post-Quantum Cryptography

The key derivation pipeline uses both classical and post-quantum primitives. Even a quantum computer capable of breaking elliptic-curve cryptography cannot extract the root key without also breaking a lattice-based KEM.

Classical
Elliptic Curve
Ephemeral key exchange for hardware-bound per-function share encryption. Private key material never leaves the trusted execution boundary.
ECDH + KDF
Memory-Hard
Memory-Hard KDF
Large working buffer with data-dependent access patterns. Resists GPU/ASIC acceleration and cache-timing side channels, and hardens white-box key extraction against DCA/DFA attacks.
Data-dependent access · Explicit zeroing
Post-Quantum
Lattice-Based KEM
A post-quantum key encapsulation mechanism augments the root key after memory-hard stretching. Provides security against quantum adversaries at the highest standardized level.
NIST PQC Standard
The root key derivation pipeline chains multiple cryptographic stages. An attacker must defeat all three of them, white-box encoding, memory-hard stretching, and a post-quantum KEM, in the correct order.
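The ordering of the last two stages (stretch first, then augment with the KEM output) can be sketched with the Python standard library. Several things here are assumptions: scrypt is swapped in as the memory-hard function because the source does not name one, the HMAC-SHA-256 combiner and its label are illustrative, and the KEM shared secret is passed in as opaque bytes because the standard library has no lattice-based KEM. The white-box decoding stage is omitted entirely.

```python
import hashlib
import hmac


def stretch_root(root_secret: bytes, salt: bytes) -> bytes:
    """Memory-hard stretching (scrypt used as a stand-in, about 4 MiB here)."""
    return hashlib.scrypt(root_secret, salt=salt, n=2**12, r=8, p=1, dklen=32)


def augment_with_kem(stretched: bytes, kem_shared_secret: bytes) -> bytes:
    """Mix the KEM shared secret into the stretched key.

    The result depends on both inputs, so recovering it requires the
    stretched root AND the secret encapsulated by the KEM.
    """
    return hmac.new(
        kem_shared_secret, b"root-key-combine" + stretched, hashlib.sha256
    ).digest()


def derive_root_key(root_secret: bytes, salt: bytes, kem_shared_secret: bytes) -> bytes:
    # Order matters: stretch first, then augment, matching the text above.
    return augment_with_kem(stretch_root(root_secret, salt), kem_shared_secret)
```

In a real deployment `kem_shared_secret` would come from decapsulating a ciphertext with a standardized lattice-based KEM; here it is just 32 bytes supplied by the caller.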