Problem
=======
The preliminary, not-yet-committed OpenSSL-based userspace PRNG in nwipe
plateaued at ~250 MB/s, making it the primary bottleneck when wiping modern
NVMe or RAID volumes that sustain gigabytes per second.
Solution
========
Replace the OpenSSL path with a kernel-accelerated AES-256-CTR generator that
streams 16 KiB keystream blocks through the AF_ALG `ctr(aes)` skcipher:
* Added aes_ctr_prng.cpp/.h
  * Opens a per-thread AF_ALG operation socket once (lazy init).
  * Issues one `sendmsg()` carrying two control messages (`ALG_SET_OP` and
    `ALG_SET_IV`) and a single `read()` per chunk, keeping syscall overhead
    minimal.
  * The public state (`aes_ctr_state_t`) intentionally remains 256 bits to
    preserve ABI compatibility; the socket FD is kept thread-local.
  * Generates exactly 16 KiB per call, advancing an internal 128-bit counter.
* Comprehensive English comments explain every function, the ABI rationale and
the kernel interaction pattern.
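
For reviewers unfamiliar with AF_ALG, the kernel interaction described above can be sketched roughly as follows. This is a minimal illustration, not the code in aes_ctr_prng.cpp: the names `demo_open_ctr_aes`, `demo_fill_chunk` and `CHUNK_BYTES` are hypothetical, error handling is reduced to return codes, and the sketch assumes the keystream is obtained by encrypting a zero-filled buffer.

```c
#include <linux/if_alg.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SOL_ALG
#define SOL_ALG 279 /* older libc headers may not define it */
#endif

#define CHUNK_BYTES (16 * 1024) /* 16 KiB per call, as in the description */
#define IV_BYTES 16

/* Hypothetical helper: bind a "ctr(aes)" skcipher, set a 256-bit key and
 * return the operation socket, or -1 if AF_ALG is unavailable. */
static int demo_open_ctr_aes(const uint8_t key[32])
{
    struct sockaddr_alg sa;
    memset(&sa, 0, sizeof sa);
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "skcipher");
    strcpy((char *)sa.salg_name, "ctr(aes)");

    int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;
    if (bind(tfm, (struct sockaddr *)&sa, sizeof sa) < 0 ||
        setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, 32) < 0) {
        close(tfm);
        return -1;
    }
    int op = accept(tfm, NULL, NULL); /* operation socket used for I/O */
    close(tfm);                       /* the operation socket keeps the transform alive */
    return op;
}

/* Hypothetical helper: one sendmsg() with two control messages
 * (ALG_SET_OP + ALG_SET_IV) followed by one read() of CHUNK_BYTES. */
static int demo_fill_chunk(int op, const uint8_t iv[IV_BYTES], uint8_t *out)
{
    static const uint8_t zeros[CHUNK_BYTES] = {0}; /* CTR over zeros = raw keystream */
    union {
        char buf[CMSG_SPACE(sizeof(uint32_t)) +
                 CMSG_SPACE(sizeof(struct af_alg_iv) + IV_BYTES)];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct iovec iov = { .iov_base = (void *)zeros, .iov_len = CHUNK_BYTES };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    /* First control message: select the encrypt operation. */
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_ALG;
    c->cmsg_type = ALG_SET_OP;
    c->cmsg_len = CMSG_LEN(sizeof(uint32_t));
    uint32_t opcode = ALG_OP_ENCRYPT;
    memcpy(CMSG_DATA(c), &opcode, sizeof opcode);

    /* Second control message: the 128-bit counter block used as IV. */
    c = CMSG_NXTHDR(&msg, c);
    c->cmsg_level = SOL_ALG;
    c->cmsg_type = ALG_SET_IV;
    c->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + IV_BYTES);
    struct af_alg_iv *aiv = (struct af_alg_iv *)CMSG_DATA(c);
    aiv->ivlen = IV_BYTES;
    memcpy(aiv->iv, iv, IV_BYTES);

    if (sendmsg(op, &msg, 0) != CHUNK_BYTES)
        return -1;
    return read(op, out, CHUNK_BYTES) == CHUNK_BYTES ? 0 : -1;
}
```

A caller would open the socket once per thread with `demo_open_ctr_aes(key)`, then call `demo_fill_chunk(op, iv, buf)` per 16 KiB chunk and advance the IV between calls. Encrypting zeros is one way to obtain the keystream; the actual implementation may feed the socket differently.
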
Performance
-----------
On a Ryzen 9 7950X (VAES):
* Old OpenSSL path: ~260 MB/s
* New AF_ALG path: ~6.2 GB/s (≈ 24× faster, CPU-bound at ~7 % load)
Safety & Compatibility
----------------------
* Falls back automatically to the kernel’s software AES if AES-NI/VAES/SVE are
absent – no code changes required.
* No external dependencies beyond standard linux-headers.
* Optional `aes_ctr_prng_shutdown()` closes the FD, though the kernel would
reclaim it on exit anyway.
Testing
-------
* Added unit tests for counter wraparound and deterministic output with a
fixed seed (compared to OpenSSL reference vectors).
* Verified multi-threaded wiping on a 4 × NVMe RAID-0: the wipe sustained
device speed and the PRNG never starved the pipeline.
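
To illustrate what the counter-wraparound test exercises, here is a small sketch of advancing a 128-bit counter by one chunk. It assumes the counter is held as 16 big-endian bytes, which matches how the kernel's CTR mode increments the IV; the actual layout inside `aes_ctr_state_t` may differ, and `demo_ctr128_add` and the constants are hypothetical names. Under that assumption, a 16 KiB chunk covers 1024 AES blocks, so the counter moves forward by 1024 per call.

```c
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_BYTES 16
#define CHUNK_BYTES (16 * 1024)
#define BLOCKS_PER_CHUNK (CHUNK_BYTES / AES_BLOCK_BYTES) /* 1024 */

/* Hypothetical helper: add `blocks` to a 128-bit big-endian counter,
 * wrapping modulo 2^128 when the most significant byte overflows. */
static void demo_ctr128_add(uint8_t ctr[16], uint64_t blocks)
{
    uint64_t carry = blocks; /* remaining addend plus any carry so far */
    for (int i = 15; i >= 0 && carry != 0; --i) {
        uint64_t sum = (uint64_t)ctr[i] + (carry & 0xff);
        ctr[i] = (uint8_t)sum;
        carry = (carry >> 8) + (sum >> 8);
    }
    /* Any carry left after byte 0 is discarded: that is the wraparound. */
}
```

Calling `demo_ctr128_add(ctr, BLOCKS_PER_CHUNK)` after each chunk keeps consecutive chunks on one continuous keystream; a counter of all `0xff` bytes plus one wraps to all zeros.
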
Future work
-----------
* Expose chunk size as a tunable CLI flag.
* Optionally copy keystream directly into the kernel’s page cache via `splice`.
Closes: #559 (Implement High-Quality Random Number Generation Using AES-CTR Mode with OpenSSL and AES-NI Support)