Using Syscalls to Inject Shellcode on Windows

After learning how to write shellcode injectors in C via the Sektor7 Malware Development Essentials course, I wanted to learn how to do the same thing in C#. Writing a simple injector similar to the Sektor7 one, using P/Invoke to run the same Win32 API calls, turns out to be pretty easy. The biggest difference I noticed was that there was no directly equivalent way to obfuscate API calls. After some research and some questions on the BloodHound Slack channel (thanks @TheWover and @NotoriousRebel!), I found there are two main options I could look into: native Windows system calls (AKA syscalls) and Dynamic Invocation (D/Invoke). Each has its pros and cons, and in this case the biggest pro for syscalls was the excellent work explaining and demonstrating them by Jack Halon (here and here) and badBounty. Most of this post and POC is drawn from their fantastic work on the subject. I know TheWover and Ruben Boonen are doing some work on D/Invoke, and I plan on digging into that next.

I want to mention that a main goal of this post is to serve as documentation for this proof of concept and to clarify my own understanding. So while I’ve done my best to ensure the information here is accurate, it’s not guaranteed to be 100%. But hey, at least the code works.

Said working code is available here

Native APIs and Win32 APIs

To begin, I want to cover why we would want to use syscalls in the first place. The answer is API hooking, performed by AV/EDR products. This is a technique defensive products use to inspect API calls before they are executed, determine if they are suspicious or malicious, and either block the call or allow it to proceed. This is done by slightly changing the assembly of commonly abused API functions so that they jump to AV/EDR-controlled code, where the call is inspected. Assuming the call is allowed, execution jumps back to the code of the original API function. For example, the CreateThread and CreateRemoteThread Win32 APIs are often used when injecting shellcode into a local or remote process. In fact, I will use CreateThread shortly in a demo of injection using strictly Win32 APIs.

These APIs are defined in Windows DLL files; in this case, MSDN tells us they live in Kernel32.dll. These are user-mode DLLs, which means they are accessible to running user applications, and they do not interact directly with the kernel or the hardware. Win32 APIs are essentially a layer of abstraction over the Windows native API, which is exported by ntdll.dll. The native API sits right at the edge of kernel-mode: its functions are thin stubs that hand the request over to the kernel. There are lower levels that actually perform the kernel-mode work, but these are not exposed directly. The native API is the lowest level that is still exposed and accessible to user applications, and it functions as a kind of bridge or glue layer between user code and the operating system. Here’s a good diagram of how it looks:

Windows Architecture

You can see how Kernel32.dll, despite the misleading name, sits at a higher level than ntdll.dll, which is right at the boundary between user-mode and kernel-mode.

So why does the Win32 API exist? A big reason is to call native APIs. When you call a Win32 API, it in turn calls a native API function, which then crosses the boundary into kernel-mode. User-mode code never directly touches hardware or the kernel, so the way it is able to access lower-level functionality is through native APIs. But if Win32 APIs just end up calling native APIs anyway, why not go straight to the native APIs and cut out an extra step? One answer is so that Microsoft can make changes to the native APIs without affecting user-mode application code. In fact, the specific functions in the native API often do change between Windows versions, yet the changes don’t affect user-mode code because the Win32 APIs remain the same.

So why do all these layers and levels and APIs matter to us if we just want to inject some shellcode? The hooks described above are placed in user-mode code: on Win32 functions and, commonly, on the native API stubs in ntdll.dll as well. User-mode code can’t make changes to the kernel itself, so past those stubs a user-mode hook has nowhere left to sit. There are some exceptions to this, like drivers, which can monitor from kernel-mode, but they aren’t applicable for this post. The big takeaway is that if we issue the system call ourselves instead of going through the hooked functions, user-mode hooks never see the call, while the kernel still services it as usual. This way we can achieve the same functionality without the same visibility by defensive products. This is the fundamental value of system calls.

System Calls

Another name for native API calls is system calls. Similar to Linux, each system call has a specific number that represents it. This number is an index into the System Service Dispatch Table (SSDT), a table in the kernel that holds references to kernel-level functions. Each named native API has a matching syscall number, which has a corresponding SSDT entry. In order to make use of a syscall, it’s not enough to know the name of the API, such as NtCreateThread. We have to know its syscall number as well. We also need to know which version of Windows our code will run on, as the syscall numbers can and likely will change between versions. There are two ways to find these numbers: one easy, and one involving the dreaded debugger.

The first and easiest way is to use the handy Windows system call table created by Mateusz “j00ru” Jurczyk. This makes it dead simple to find the syscall number you’re looking for, assuming you already know which API you need (more on that later).

WinDbg

The second method of finding syscall numbers is to look them up directly at the source: ntdll.dll. The first syscall we need for our injector is NtAllocateVirtualMemory. So we can fire up WinDbg and look for the NtAllocateVirtualMemory function in ntdll.dll. This is much easier than it sounds. First I open a target process to debug. It doesn’t matter which process, as basically all processes will map ntdll.dll. In this case I used good old notepad.

Opening Notepad in WinDbg

We attach to the notepad process and in the command prompt enter x ntdll!NtAllocateVirtualMemory. This lets us examine the NtAllocateVirtualMemory function within the ntdll.dll DLL. It returns a memory location for the function, which we examine, or unassemble, with the u command:

NtAllocateVirtualMemory Unassembled

Now we can see the exact assembly language instructions for calling NtAllocateVirtualMemory. Syscall stubs in assembly tend to follow a pattern: the first argument is copied from rcx into r10, seen with the mov r10,rcx statement (the syscall instruction overwrites rcx, so the kernel expects that argument in r10), followed by moving the syscall number into the eax register, shown here as mov eax,18h. eax is the register the kernel reads the syscall number from for every syscall. So now we know the syscall number of NtAllocateVirtualMemory is 18 in hex, which happens to be the same value listed in Mateusz’s table! So far so good. We repeat this two more times, once for NtCreateThreadEx and once for NtWaitForSingleObject.

Finding the syscall number for NtCreateThreadEx

Where are you getting these native functions?

So far the process of finding the syscall numbers for our native API calls has been pretty easy. But there’s a key piece of information I’ve left out thus far: how do I know which syscalls I need? The way I did this was to take a basic functioning shellcode injector in C# that uses Win32 API calls (named Win32Injector, included in the Github repository for this post) and find the corresponding syscall for each Win32 API call. Here is the code for Win32Injector:

Win32Injector

This is a barebones shellcode injector that executes some shellcode to display a popup box:

Hello world from Win32Injector

As you can see from the code, the three main Win32 API calls used via P/Invoke are VirtualAlloc, CreateThread, and WaitForSingleObject, which allocate memory for our shellcode, create a thread that starts executing our shellcode, and wait for that thread to finish, respectively. As these are normal Win32 APIs, they each have comprehensive documentation on MSDN. But as native APIs are considered undocumented, we may have to look elsewhere. There is no one source of truth for native API documentation that I could find, but with some searching I was able to find everything I needed.

In the case of VirtualAlloc, some simple searching showed that the underlying native API was NtAllocateVirtualMemory, which was in fact documented on MSDN. One down, two to go.

Unfortunately, there was no MSDN documentation for NtCreateThreadEx, which is the native API for CreateThread. Luckily, badBounty’s directInjectorPOC has the function definition available, and already in C# as well. This project was a huge help, so kudos to badBounty!

Lastly, I needed to find documentation for NtWaitForSingleObject, which as you might guess, is the native API called by WaitForSingleObject. You’ll notice a theme where many native API calls are prefixed with “Nt”, which makes mapping them from Win32 calls easier. You may also see the prefix “Zw”, which names the same native API call as it is normally invoked from kernel-mode code. In ntdll.dll the two names resolve to the same code, which you will see if you do x ntdll!ZwWaitForSingleObject and x ntdll!NtWaitForSingleObject in WinDbg. Again we get lucky with this API, as ZwWaitForSingleObject is documented on MSDN.
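
If you would rather not open a debugger, the same Nt/Zw comparison can be sketched in C#. This is only an illustration and not part of the POC: the class name NtZwCompare is made up, and the P/Invoke signatures for GetModuleHandle and GetProcAddress follow the usual pinvoke.net style. It is Windows-only and just prints two export addresses.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not in the original repository): compares the
// addresses of the Nt and Zw exports of the same function in ntdll.dll.
static class NtZwCompare
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr GetModuleHandle(string lpModuleName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

    static void Main()
    {
        // ntdll.dll is already mapped into the process, so a handle lookup is enough.
        IntPtr ntdll = GetModuleHandle("ntdll.dll");
        if (ntdll == IntPtr.Zero)
            throw new InvalidOperationException("ntdll.dll is not loaded");

        IntPtr nt = GetProcAddress(ntdll, "NtWaitForSingleObject");
        IntPtr zw = GetProcAddress(ntdll, "ZwWaitForSingleObject");

        Console.WriteLine("NtWaitForSingleObject: 0x{0:X}", nt.ToInt64());
        Console.WriteLine("ZwWaitForSingleObject: 0x{0:X}", zw.ToInt64());
        Console.WriteLine("Same address: {0}", nt == zw);
    }
}
```

The actual addresses will differ from machine to machine and from run to run; the only interesting part is whether the two lines match.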

I want to point out a few other good sources of information for mapping Win32 to native API calls. First is the source code for ReactOS, which is an open source reimplementation of Windows. The Github mirror of their codebase has lots of syscalls you can search for. Next is SysWhispers, by jthuraisamy. It’s a project designed to help you find and implement syscalls. Really good stuff here. Lastly, the tool API Monitor. You can run a process and watch what APIs are called, their arguments, and a whole lot more. I didn’t use this a ton, as I only needed 3 syscalls and it was faster to find existing documentation, but I can see how useful this tool would be in larger projects. I believe ProcMon from Sysinternals has similar functionality, but I didn’t test it out much.

Ok, so we have our Win32 APIs mapped to our syscalls. Let’s write some C#!

But these docs are all for C/C++! And isn’t that assembly over there…

Wait a minute, these docs all have C/C++ implementations. How do we translate them into C#? The answer is marshaling. This is the essence of what P/Invoke does. Marshaling is a way of taking unmanaged code, e.g. C/C++, and using it in a managed context, that is, in C#. This is easily done for Win32 APIs via P/Invoke. Just import the DLL, specify the function definition with the help of pinvoke.net, and you’re off to the races. You can see this in the demo code of Win32Injector. But since syscalls are undocumented, Microsoft does not provide such an easy way to interface with them. It is indeed possible, though, through the magic of delegates. Jack Halon covers delegates really well here and here, so I won’t go too in depth in this post. I would suggest reading those posts to get a good handle on them, and on the process of using syscalls in general. But for completeness, delegates are essentially function pointers, which allow us to pass functions as parameters to other functions. The way we use them here is to define a delegate whose return type and function signature match those of the syscall we want to use. We use marshaling to make sure the C/C++ data types are compatible with C#, define a function that implements the syscall, including all of its parameters and return type, and there you have it!
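
To make the "delegates are function pointers" idea concrete, here is a minimal sketch that uses a deliberately harmless, documented function. It is not taken from the POC: the class name DelegateDemo and the delegate name GetCurrentProcessIdDelegate are made up, and GetCurrentProcessId was picked only because it takes no arguments. The sketch looks up the function's address, wraps that address in a delegate with Marshal.GetDelegateForFunctionPointer, and calls it like any C# method.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical demo class; all names here are chosen for illustration only.
static class DelegateDemo
{
    // The delegate must match the unmanaged signature:
    //   DWORD GetCurrentProcessId(void);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate uint GetCurrentProcessIdDelegate();

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr GetModuleHandle(string lpModuleName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

    static void Main()
    {
        // Step 1: obtain a raw function pointer to unmanaged code.
        IntPtr kernel32 = GetModuleHandle("kernel32.dll");
        IntPtr address = GetProcAddress(kernel32, "GetCurrentProcessId");
        if (address == IntPtr.Zero)
            throw new InvalidOperationException("GetCurrentProcessId not found");

        // Step 2: turn the pointer into a callable, typed delegate.
        var getPid = Marshal.GetDelegateForFunctionPointer<GetCurrentProcessIdDelegate>(address);

        // Step 3: call it like a normal method and compare with the framework's answer.
        uint viaDelegate = getPid();
        int viaFramework = Process.GetCurrentProcess().Id;
        Console.WriteLine("Delegate: {0}, framework: {1}", viaDelegate, viaFramework);
    }
}
```

The generic overload of GetDelegateForFunctionPointer needs .NET Framework 4.5.1 or later; on older targets the non-generic overload plus a cast does the same job.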

Not quite. We can’t actually call a native API, since the only implementation of it we have is in assembly! We know its function definition and parameters, but we can’t actually call it directly the same way we do a Win32 API. The assembly will work just fine for us though. Once again, it’s rather simple to execute assembly in C/C++, but C# is a little harder. Luckily we have a way to do it, and we already have the assembly from our WinDbg adventures. And don’t worry, you don’t really need to know assembly to make use of syscalls. Here is the assembly for the NtAllocateVirtualMemory syscall:

NtAllocateVirtualMemory Assembly

As you can see from the comments, we’re copying the first argument from rcx into r10, moving our syscall number into the eax register, and executing the magic syscall instruction. At a low enough level, this is just a function call. And remember how delegates are just function pointers? Hopefully it’s starting to make sense how this is fitting together. We need to get a function pointer that points to this assembly, along with some arguments in a C/C++ compatible format, in order to call a native API.

Putting it all together

So we’re almost done now. We have our syscalls, their numbers, the assembly to call them, and a way to call them in delegates. Let’s see how it actually looks in C#:

NtAllocateVirtualMemory Code

Starting from the top, we can see the C/C++ definition of NtAllocateVirtualMemory, as well as the assembly for the syscall itself. Starting at line 38, we have the C# definition of NtAllocateVirtualMemory. Note that it can take some trial and error to get each type in C# to match up with the unmanaged type. We create a pointer to our assembly inside an unsafe block. This allows us to perform operations in C#, like operating on raw memory, that are normally not safe in managed code. We also use the fixed keyword to make sure the C# garbage collector does not inadvertently move our memory around and change our pointers. Once we have a raw pointer to the memory location of our syscall assembly, we need to change its memory protection to executable so it can be run directly, as it will be a function pointer and not just data. Note that I am using the Win32 API VirtualProtectEx to change the memory protection. I’m not aware of a way to do this via syscall, as it’s kind of a chicken and the egg problem of getting the memory executable in order to run a syscall. If anyone knows how to do this in C#, please reach out! Another thing to note here is that setting memory to RWX is generally somewhat suspicious, but as this is a POC, I’m not too worried about that at this point. We’re concerned with hooking right now, not memory scanning!
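
For readers who have not used unsafe and fixed before, here is a small standalone sketch of just that piece. It is not from the POC: the class FixedDemo and the method Sum are made up, and the code only reads ordinary data through a pinned pointer. It needs unsafe code enabled (the /unsafe compiler switch, or AllowUnsafeBlocks in the project file).

```csharp
using System;

// Hypothetical demo class showing unsafe/fixed on plain data.
static class FixedDemo
{
    static unsafe int Sum(byte[] data)
    {
        int total = 0;

        // 'fixed' pins the array so the garbage collector cannot move it
        // while we hold a raw pointer to its first element.
        fixed (byte* p = data)
        {
            // The same pointer can be viewed as an IntPtr when an API wants an address.
            IntPtr address = (IntPtr)p;
            Console.WriteLine("Pinned at a non-null address: {0}", address != IntPtr.Zero);

            for (int i = 0; i < data.Length; i++)
                total += p[i]; // raw pointer indexing, no bounds checks
        }
        // Outside the fixed block the pointer must no longer be used.
        return total;
    }

    static void Main()
    {
        byte[] data = { 1, 2, 3, 4 };
        Console.WriteLine(Sum(data)); // prints 10
    }
}
```

The key design point is scope: the pointer is only valid inside the fixed block, which is why the work that depends on a stable address has to happen there.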

Now comes the magic. This is the struct where our delegates are declared:

Delegates Struct

Note that a delegate definition is just a function signature and return type. The implementation is up to us, as long as it matches the delegate definition, and it’s what we’re implementing here in the C# NtAllocateVirtualMemory function. At line 65 above, we create a delegate named assembledFunction, which takes advantage of the special marshaling function Marshal.GetDelegateForFunctionPointer. This method allows us to get a delegate from a function pointer. In this case, our function pointer is the pointer to the syscall assembly, called memoryAddress. assembledFunction is now a function pointer to an assembly language function, which means we’re now able to execute our syscall! We can call the assembledFunction delegate like any normal function, complete with arguments, and we will get the results of the NtAllocateVirtualMemory syscall. So in our return statement we call assembledFunction with the arguments that were passed in and return the result. Let’s look at where we actually call this function in Program.cs:

Calling NtAllocateVirtualMemory

Here you can see we make a call to NtAllocateVirtualMemory instead of the Win32 API VirtualAlloc that Win32Injector uses. We set up the function call with all the needed arguments (lines 43-48) and make the call to NtAllocateVirtualMemory. This returns a block of memory for our shellcode, just like VirtualAlloc would!

The remaining steps are similar:

Remaining Syscalls

We copy our shellcode into our newly-allocated memory, and then create a thread within our current process pointing to that memory via another syscall, NtCreateThreadEx, in place of CreateThread. Finally, we wait for the thread to finish with a call to the syscall NtWaitForSingleObject, instead of WaitForSingleObject. Here’s the final result:

Hello World Shellcode

Hello world via syscall! Assuming this was some sort of payload running on a system with API hooking enabled, we would have bypassed it and successfully run our payload.

A note on native code

Some key parts of this puzzle I’ve not mentioned yet are all of the native structs, enumerations, and definitions needed for the syscalls to function properly. If you look at the screenshots above, you will see types that don’t have implementations in C#, like the NTSTATUS return type for all the syscalls, or the AllocationType and ACCESS_MASK bitmasks. These types are normally declared in various Windows headers and DLLs, but to use syscalls we need to implement them ourselves. The process I followed to find them was to look for any non-simple type and try to find a definition for it. Pinvoke.net was massively helpful for this task. Between it and other resources like MSDN and the ReactOS source code, I was able to find and add everything I needed. You can find that code in the Native.cs class of the solution here.
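
As a rough idea of what these hand-written definitions look like, here is a small sketch. It is not copied from Native.cs: the member names are my own pinvoke.net-style choices and only a handful of values are included. The numeric values are the documented MEM_COMMIT, MEM_RESERVE, STATUS_SUCCESS, STATUS_INVALID_PARAMETER, and STATUS_ACCESS_DENIED constants, and NtSuccess mirrors the NT_SUCCESS macro from the Windows headers.

```csharp
using System;

// Bitmask passed as the allocation type; [Flags] lets values be OR-ed together.
[Flags]
enum AllocationType : uint
{
    Commit  = 0x1000, // MEM_COMMIT
    Reserve = 0x2000, // MEM_RESERVE
}

// A tiny subset of NTSTATUS codes; the real list is far longer.
enum NTSTATUS : uint
{
    Success          = 0x00000000, // STATUS_SUCCESS
    InvalidParameter = 0xC000000D, // STATUS_INVALID_PARAMETER
    AccessDenied     = 0xC0000022, // STATUS_ACCESS_DENIED
}

// Hypothetical demo class, not part of the original solution.
static class NativeTypesDemo
{
    // Same rule as the NT_SUCCESS macro: a status is a success
    // when it is non-negative as a signed 32-bit value.
    static bool NtSuccess(NTSTATUS status)
    {
        return unchecked((int)status) >= 0;
    }

    static void Main()
    {
        AllocationType flags = AllocationType.Commit | AllocationType.Reserve;
        Console.WriteLine("{0} = 0x{1:X}", flags, (uint)flags); // Commit, Reserve = 0x3000

        Console.WriteLine(NtSuccess(NTSTATUS.Success));      // True
        Console.WriteLine(NtSuccess(NTSTATUS.AccessDenied)); // False
    }
}
```

Giving each enum an explicit uint underlying type keeps its size in line with the 32-bit values the native side expects, which matters once these types appear in delegate signatures.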

Wrapup

Syscalls are fun! It’s not every day you get to combine 3 different languages, managed and unmanaged code, and several levels of Windows APIs in one small program. That said, there are some clear difficulties with syscalls. They require a fair bit of boilerplate code to use, and that boilerplate is scattered all around for you to find like a little undocumented treasure hunt. Debugging can also be tricky with the transition between managed and unmanaged code. Finally, syscall numbers change frequently and have to be customized for the platform you’re targeting. D/Invoke seems to handle several of these issues rather elegantly, so I’m excited to dig into it more soon.