Building a Compact XOR Encoder for Shellcode: A Step-by-Step Assembly Guide

If you’ve been through the [Worldmail exploit write-up]({% post_url 2020-05-09-worldmail-exploit %}) or spent any time developing shellcode, you’ve run into bad characters. Null bytes that cut your payload short the moment it passes through a string copy. Characters that get mangled by string functions before they ever reach your buffer. Values that simply don’t survive the journey from your machine to the target.

Encoders are how you get around that.

The idea is straightforward: encode your shellcode before sending it, and prepend a small decoder stub that runs first on the target and decodes it back before handing off execution. The decoder runs, your shellcode is restored, and the bad characters never had to exist in the payload.

The simplest encoder uses XOR — and that’s what this post builds from scratch.

Why XOR?

XOR has a useful property: if you XOR the same data with the same key twice, you get back to where you started. That means your encoder and your decoder are the same logic. One implementation, two purposes. For something that needs to be compact enough to fit in limited shellcode space, that matters.
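
A quick way to convince yourself of that property is to run it. The sketch below is Python rather than assembly, and `xor_bytes` is a hypothetical helper name chosen for illustration, not part of any tool mentioned in this post:

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


sample = bytes([0x41, 0x00, 0x0A, 0xFF])   # arbitrary test bytes
encoded = xor_bytes(sample, 0x0F)          # first pass encodes
decoded = xor_bytes(encoded, 0x0F)         # second pass restores the input
```

Note that the null byte in `sample` comes out as `0F` after the first pass, which is exactly the effect an encoder is after.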

Start with pseudocode

Before writing a single byte of assembly, plan it out in plain English:

clear the register
save the current address
get the length of the shellcode
add the length to the address to get the end address
xor the current byte of shellcode
increment the address
check if we've reached the end, and jump back to the xor if not

Seven steps. That’s the whole encoder. Now let’s turn each one into assembly.
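
If it helps to see the plan run before dropping down to registers, here is a rough Python model of the same loop. It is only a sketch: `run_stub_model` and its parameters are made-up names, a `bytearray` stands in for process memory, and the length guard is an addition that the assembly version does not have.

```python
def run_stub_model(memory: bytearray, start: int, length: int, key: int) -> None:
    """XOR `length` bytes of `memory` in place, beginning at index `start`."""
    if length < 1:
        # Assumption for this model only: like the assembly loop, the body
        # always runs at least once, so a zero length is rejected up front.
        raise ValueError("length must be at least 1")
    position = start              # save the current address
    end = position + length       # address + length = end address
    while True:
        memory[position] ^= key   # xor the current byte
        position += 1             # increment the address
        if position == end:       # check if we've reached the end
            break                 # otherwise go round again


memory = bytearray(b"\x90\x90ABCD")     # two filler bytes, then four "payload" bytes
run_stub_model(memory, 2, 4, 0x0F)      # encode the last four bytes only
```

Calling `run_stub_model` a second time with the same arguments restores the original bytes, which is the encoder-equals-decoder property in action.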

Step 1: Clear a register

ECX will hold the shellcode length and, after step 4, the end address. Start by zeroing it. XORing a register against itself always produces zero:

XOR ECX, ECX

Step 2: Get the current address (position-independent)

This is the clever bit. The encoder needs to know where it is in memory at runtime — but that address changes depending on where the shellcode lands. The solution is a self-referencing CALL:

CALL $+4        ; assembles to E8 FFFFFFFF (displacement -1): pushes the address of the next instruction, then lands on the CALL's own last byte
INC ECX         ; FF C1: that last FF byte plus C1 decodes as INC ECX
POP EAX         ; store the pushed address in EAX

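CALL pushes the address of the next instruction onto the stack before jumping, and POP EAX retrieves it. EAX now holds a known address inside the stub, dynamically, regardless of where the shellcode loaded. The displacement of -1 is what keeps the stub free of null bytes: a short forward CALL would need zeros in its displacement. The catch is that execution resumes on the CALL's own final FF byte, which the CPU decodes together with the following C1 as INC ECX. That increment is harmless, because step 3 overwrites CL anyway. This is the foundation of position-independent shellcode.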
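
The address arithmetic is easy to check by hand. The sketch below assumes the CALL is encoded as E8 followed by four FF bytes and sits at 00401002, as in the assembled listing later in this post; the constant names are mine.

```python
import struct

CALL_ADDRESS = 0x00401002   # address of the E8 opcode in this post's listing
CALL_LENGTH = 5             # E8 plus a 4-byte relative displacement

# Interpret the displacement bytes FF FF FF FF as a signed little-endian 32-bit value.
(displacement,) = struct.unpack("<i", bytes.fromhex("FFFFFFFF"))

return_address = CALL_ADDRESS + CALL_LENGTH    # what CALL pushes and POP EAX retrieves
call_target = return_address + displacement    # where execution continues
```

So EAX ends up one byte past the address where execution actually resumes, which is worth remembering when you work out the offset in step 5.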

Step 3: Store the shellcode length

Two versions here, depending on shellcode size.

For shorter shellcode, 8 bits of ECX (the CL register) is enough:

MOV CL, 0FF

For larger shellcode, use 16 bits (the CX register):

MOV CX, 0FF

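Because ECX was zeroed in step 1, and the INC in step 2 only set its lowest bit, loading into CL or CX leaves the upper bits clean. No masking needed. The 0FF here is a placeholder for the real length. In the CX form, pick a length with a non-zero high byte, because a 16-bit immediate of 0FF is encoded as FF 00 and puts a null byte straight back into the stub.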
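
To see why the zeroing matters, here is a small Python model of what a write to CL or CX does to the full 32-bit register. `mov_cl` and `mov_cx` are illustrative names, not real APIs, and `0xDEADBE01` is just an arbitrary stand-in for leftover register contents.

```python
def mov_cl(ecx: int, imm8: int) -> int:
    """Model MOV CL, imm8: replace bits 0-7 of ECX, keep bits 8-31."""
    return (ecx & 0xFFFFFF00) | (imm8 & 0xFF)


def mov_cx(ecx: int, imm16: int) -> int:
    """Model MOV CX, imm16: replace bits 0-15 of ECX, keep bits 16-31."""
    return (ecx & 0xFFFF0000) | (imm16 & 0xFFFF)


clean = mov_cl(0x00000001, 0xFF)   # ECX is 1 after XOR ECX,ECX and INC ECX
dirty = mov_cl(0xDEADBE01, 0xFF)   # what you'd get if ECX had not been zeroed
```

With a zeroed register the result is exactly the length; with leftover data in the upper bits, the end address calculated in the next step would be wildly wrong.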

Step 4: Calculate the end address

ECX holds the shellcode length. EAX holds the current address. Adding them together gives you the address where the shellcode ends — which is what the loop needs to know when to stop:

ADD ECX, EAX

Step 5: XOR the current byte

This is the encoding instruction itself. Rather than XORing a register, the encoder operates on the memory the register points to — because the shellcode lives in memory, not in a register:

XOR BYTE PTR DS:[EAX+F], 0F

This XORs the byte at memory address EAX+0F (15 in decimal) with a seed value of 0F. The offset is the distance from the address in EAX, which points just past the CALL, to the first byte after the stub. That way the loop starts on the shellcode itself and the encoder doesn’t accidentally XOR its own instructions.
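
The offset isn't magic; it falls out of the stub's layout. The sketch below recomputes it from the addresses in this post's assembled listing (22 bytes in total, with the CALL starting two bytes in). The constant names are my own.

```python
STUB_BASE = 0x00401000    # load address used in this post's listing
STUB_LENGTH = 22          # total bytes, from XOR ECX,ECX through the JNZ
CALL_OFFSET = 2           # the CALL starts right after the 2-byte XOR ECX,ECX
CALL_LENGTH = 5           # E8 plus a 4-byte displacement

eax = STUB_BASE + CALL_OFFSET + CALL_LENGTH     # return address popped into EAX
first_payload_byte = STUB_BASE + STUB_LENGTH    # first byte after the stub
offset = first_payload_byte - eax               # the displacement used in [EAX+offset]
```

If you change the stub, for example by switching to the longer MOV CX form, this number changes with it and has to be recalculated.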

Step 6: Advance the pointer

Move EAX forward one byte:

INC EAX

Step 7: Check if done, loop if not

Compare the current position against the end address calculated in step 4:

CMP EAX, ECX
JNZ SHORT F7

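If EAX equals ECX, the loop ends: all bytes have been encoded. If not, JNZ jumps back to the XOR instruction and the process continues. The F7 operand is a signed one-byte displacement of -9, measured from the end of the JNZ instruction.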
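
If the F7 looks arbitrary, the following sketch works out where it lands. It assumes the JNZ sits at 00401014, as in the assembled listing in this post, and that a short conditional jump is two bytes long (opcode 75 plus one displacement byte).

```python
JNZ_ADDRESS = 0x00401014
JNZ_LENGTH = 2   # opcode 75 plus one displacement byte

# Read the single byte F7 as a signed 8-bit value.
displacement = int.from_bytes(bytes([0xF7]), "little", signed=True)

# Short jumps are relative to the address of the following instruction.
jump_target = JNZ_ADDRESS + JNZ_LENGTH + displacement
```

The target is the address of the XOR BYTE PTR instruction, so each pass through the loop touches exactly one more byte.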

The complete encoder

Assembled and annotated:

00401000 >  33C9        XOR ECX,ECX                 ; Zero ECX
00401002    E8 FFFFFF   CALL 00401006               ; Push 00401007, jump into the CALL's own last byte (the 4th FF)
00401006    FFC1        INC ECX                     ; That shared FF plus C1; harmless, CL is overwritten below
00401008    58          POP EAX                     ; EAX = 00401007
00401009    B1 FF       MOV CL,0FF                  ; Load shellcode length into CL
0040100B    03C8        ADD ECX,EAX                 ; Calculate end address
0040100D    8070 0F 0F  XOR BYTE PTR DS:[EAX+F],0F  ; XOR byte with seed
00401011    40          INC EAX                     ; Advance to next byte
00401012    3BC1        CMP EAX,ECX                 ; Check if done
00401014   ^75 F7       JNZ SHORT 0040100D          ; Loop if not

Ten instructions. Compact by design.
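
Since the whole point is to avoid bad characters, it's worth checking that the stub itself is clean. The sketch below transcribes the opcode bytes from the listing above into a Python `bytes` object (the FF shared between the CALL and INC ECX appears once) and checks the length and the absence of null bytes. `STUB` is just a name I've picked for it.

```python
STUB = bytes.fromhex(
    "33C9"        # XOR ECX,ECX
    "E8FFFFFFFF"  # CALL with displacement -1
    "C1"          # together with the CALL's last FF: INC ECX
    "58"          # POP EAX
    "B1FF"        # MOV CL,0FF
    "03C8"        # ADD ECX,EAX
    "80700F0F"    # XOR BYTE PTR DS:[EAX+F],0F
    "40"          # INC EAX
    "3BC1"        # CMP EAX,ECX
    "75F7"        # JNZ back to the XOR
)

stub_length = len(STUB)
contains_null = 0x00 in STUB
```

Keep in mind this only checks for nulls. Your target may have other bad characters, and the length and seed bytes in a real stub would need checking against those too.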

How this plugs into nullsploit

In the example above, the shellcode length and XOR seed are static values — useful for understanding the logic, but not practical for a real tool. In the [nullsploit exploitation engine]({% post_url 2019-05-03-nullsploit-engine %}), both values are generated at runtime. The length is calculated automatically from the payload being encoded, and the seed is dynamic — which means each generated payload looks different, even for the same shellcode. That’s the version worth using in practice.
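
I won't reproduce nullsploit's code here, but the idea of a dynamic seed can be sketched generically. The function below is an assumption-laden illustration, not the engine's actual logic: `usable_keys` is a hypothetical name, and it simply lists every single-byte key for which neither the key nor any encoded byte falls in a set of forbidden values.

```python
from typing import Iterable, List


def usable_keys(payload: bytes, bad_bytes: Iterable[int]) -> List[int]:
    """Return single-byte keys that keep the key and the encoded payload free of bad bytes."""
    bad = set(bad_bytes)
    keys = []
    for key in range(1, 256):        # key 0 would leave the payload unchanged
        if key in bad:
            continue                 # the key itself appears in the stub
        if all((b ^ key) not in bad for b in payload):
            keys.append(key)
    return keys


sample_payload = bytes([0x00, 0x0F, 0x41])      # arbitrary test bytes
candidates = usable_keys(sample_payload, {0x00})
```

Any key equal to a byte already in the payload is rejected when null is forbidden, because a byte XORed with itself is zero. Choosing at random from the remaining candidates is one way to get a different-looking result on each run.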

What to take away from this

A few things worth internalising:

Position-independent code matters. Any shellcode that hardcodes memory addresses will break when it lands somewhere unexpected. The CALL/POP EAX pattern for getting the current address at runtime is something you’ll see again and again in real-world shellcode.

Encoders are small by necessity. Every byte of encoder stub is a byte that isn’t shellcode. The constraint forces you to think carefully about efficiency in a way that most programming doesn’t.

And understanding how an encoder works at the assembly level means you understand what Metasploit’s shikata_ga_nai is actually doing when you select it — rather than just knowing that it handles bad characters. Know your tools from the inside out. That’s the standard worth holding yourself to.