Demystifying x86 Assembly: An Introduction and How-To

this takes time to learn - don't be discouraged


I've been slowly learning assembly over the years by being exposed to it in my role as a Threat Analyst, but I would still consider myself to be a beginner. Lately, I've been taking action and learning more about it so that I can feel comfortable in a debugger and disassembler.

How I Started Learning

We should start from the beginning - I don't have a background in Computer Science. I went to college for Network Engineering because I thought switches, routers, and networking were really interesting. After graduating and starting on my Cisco certifications, I realized that it wasn't the career path for me. I have nightmares of Packet Tracer to this day. I worked as a System Admin for years, and then made the switch into cyber.

Naturally, working as a System Admin and having a background in networking does set me up with a solid and required foundation but it does nothing for understanding assembly, memory management, the Windows API, etc - skills that you often need when it comes to malware analysis and RE.

My advice here would be: don't be discouraged. If you had a degree/background in Computer Science, you would have the upper hand when it comes to learning assembly but on the flip side you'd have a lack of understanding in relation to networking, sysadmin, and perhaps other skillsets that are nice to have for malware analysis. We all have our struggles.

Optional Reading (but really recommended)

The following books are helpful:

  • Code: The Hidden Language of Computer Hardware and Software - This book does an excellent job of explaining how computers work and will familiarize you with number systems, gates, switches, etc.

  • Practical Malware Analysis: A Hands-On Guide to Dissecting Malicious Software - An excellent resource for malware analysis!

That's it. Let's keep it approachable and doable instead of daunting. You may be wondering why I don't have an x86 assembly book listed here. When it comes to learning something dry or complicated like this, I personally don't learn from reading alone. I need to do in order to learn, so instead I would recommend this video series: Learning x86 with NASM

If you must have a book, then "Assembly Language Step by Step" is supposedly good, as well as "Assembly Language for x86 Processors" by Kip Irvine. I would advise you use them to supplement your learning and not as your only point of learning. Unless you like to read this sort of stuff, then by all means go for it. I will be honest, I’ve read neither because as I’ve explained I don’t learn that way. I do plan to read them now, and I think I’ll pick up a lot more than if I read them prior to having more understanding of the subject. Up to you!

x86 History (a quick overview)

There was once a time when this was all pioneered and it was much more difficult than the layers of abstraction we work with today. We should consider ourselves lucky. Side note: Are you also shocked when you stumble across man pages that are dated from the 80s and early 90s? (seriously, how?)

Anyway, assembly is a low-level programming language whose instructions map almost directly onto the machine code that the processor executes. When we think of a language like Python, we are very abstracted from this low level, meaning there are a lot of layers between our Python code and what is happening at the processor level. With assembly, we are much closer - there is a lot less abstraction. This is what makes it difficult.

Keep in mind that assembly is specific to the hardware architecture, and in this blog we're talking about Intel x86, not ARM, MIPS, or anything else. Note: AMD is x86/x86-64 too.

The reason why it's called "x86" is because back in 1978 when Intel came out with their 16-bit 8086 microprocessor, there were successors named similarly: 80186, 80286, 80386, 80486, and it became easier to just write x86. You may also run into different syntax when it comes to x86, whether it be Intel or AT&T syntax. We'll be focusing on Intel.

Intel     vs      AT&T
------            ------ 
mov eax, 0xb      mov $0xb,%eax

Registers

Grab a coffee! In this post, x86 means the 32-bit version of the architecture (also known as IA-32), so it refers to any 32-bit processor that is compatible with the x86 instruction set. This also means that the registers are 32 bits in size (4 bytes): 00000000 00000000 00000000 00000000

Memory Hierarchy Triangle - We Are Here!

You may have seen registers before like eax, ebx, ecx, edx, and so on. Processor registers are the main "tool" used to write programs in assembly. Think of them as variables, since they are used to store and manipulate data. There are 8 general purpose registers in x86:

Name    Description
----    -----------
EAX     Accumulator, automatically used by multiplication and division
EBX     General purpose
ECX     Loop counter used by the CPU
EDX     General purpose
ESI     General purpose. High speed memory transfer
EDI     General purpose. High speed memory transfer
EBP     Base pointer
ESP     Stack pointer

Note: There are also 64 bit registers which are denoted with an "r", such as rax, rbx, rcx, rdx. That's not the focus of this blog since we want to start off easier with 32 bit instead of 64, but keep in mind if you see such registers, it just denotes a 64-bit architecture.

Special Purpose Registers

  • EIP: The instruction pointer. Holds the memory address of the next instruction to be executed. It keeps track of the current execution point in a program, so its value is constantly changing.

  • EFLAGS: This register holds the state of the processor which can contain various status flags:

Acronym    Meaning
-------    -------
CF         Carry Flag
OF         Overflow Flag
SF         Sign Flag
ZF         Zero Flag
AC         Auxiliary Carry (or AF)
PF         Parity Flag
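
To make the flags a little less abstract, here is a minimal C sketch of how CF, OF, SF, ZF, and PF would come out after a 32-bit add. The Flags struct and the add32_flags name are made up for illustration, and the auxiliary carry flag is left out to keep it short. It only models the idea; it is not how the CPU is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the status flags this sketch models. */
typedef struct {
    unsigned cf, of, sf, zf, pf;
} Flags;

/* Add two 32-bit values the way `add eax, ebx` would, store the sum in
   *result, and report the flags that the add would produce. */
Flags add32_flags(uint32_t a, uint32_t b, uint32_t *result) {
    assert(result != NULL);
    uint32_t r = a + b;                 /* wraps modulo 2^32, like a register */
    Flags f;
    f.cf = r < a;                       /* unsigned carry out of bit 31 */
    f.of = ((a ^ r) & (b ^ r)) >> 31;   /* operands share a sign, result does not */
    f.sf = r >> 31;                     /* copy of the top (sign) bit */
    f.zf = (r == 0);                    /* result is zero */

    /* PF looks only at the low byte: set when it has an even number of 1 bits. */
    unsigned low = r & 0xFFu;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    f.pf = !(low & 1u);

    *result = r;
    return f;
}
```

For example, adding 1 to 0xFFFFFFFF wraps to 0 and sets CF and ZF, while adding 1 to 0x7FFFFFFF gives 0x80000000 and sets SF and OF instead.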

High Bits and Low Bits of Registers

32-bit registers can be split in order to utilize the high or low bits of the register. For example:

  • EAX - 32 bits

  • AX - 16 bits

  • AH - 8 bits

  • AL - 8 bits

In this case, AH means "high bits" and AL means "low bits" of the AX register. The AX register is the lower 16 bits of the EAX register. In fact, EAX just means Extended AX. The same split applies to EBX, ECX, and EDX. The other general purpose registers (ESI, EDI, EBP, ESP) have 16-bit names too (SI, DI, BP, SP), but no separate high and low byte names in 32-bit code.

💡
Keep in mind if you see AX, BX, CX, DX and subsequently AH, AL, BH, BL, CH, CL, DH, DL that this refers to splitting a register and using certain bits of it instead of the entire 32-bit register.
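
If you think better in C than in register names, the split can be sketched with masks and shifts. The function names below (low16, high8, low8, set_low8) are my own for illustration; they just mirror what reading AX, AH, AL, or writing AL would do to a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* AX: the low 16 bits of the 32-bit register value. */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH: bits 8..15, the high byte of AX. */
uint8_t high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL: bits 0..7, the low byte of AX. */
uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL replaces only the lowest byte; the upper 24 bits are kept. */
uint32_t set_low8(uint32_t eax, uint8_t al) {
    uint32_t updated = (eax & 0xFFFFFF00u) | al;
    assert(low8(updated) == al);
    return updated;
}
```

With eax holding 0x12345678, low16 gives 0x5678, high8 gives 0x56, low8 gives 0x78, and set_low8 with 0xFF gives 0x123456FF.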

Guide to x86 Assembly

Common x86 Instructions (push, pop, mov, etc)

These are the instructions that the language uses in order to determine what to do with data. The instructions will handle data movement, arithmetic/logic/math, and control flow of the program. Every single time you look at a binary in a disassembler or debugger, you will be reading x86 instructions.

There are a lot of instructions in the x86 instruction set, but there are only a handful that you will typically run into. In fact, according to these statistics the mov instruction accounts for 35% of all instructions followed by push (10%) and call (6%)!

credit: Peter Kankowski (strchr.com), https://www.strchr.com/x86_machine_code_statistics

Data Movement:

Logic and Arithmetic:

Control Flow:

Important Concepts

Assuming you already understand number systems such as binary, hexadecimal, and decimal (of course you know this!), then there are a couple more key concepts that I personally find extremely helpful in understanding the why of certain things. If you need a refresher on binary, hexadecimal, and the different ways we can represent data, I advise you take some time to review that.

Key Concept: Endianness

Endianness refers to the order in which computer memory stores a sequence of bytes.

💡
A quick and valuable tip from OST (OpenSecurityTraining): Endianness applies to memory not registers, and bytes not bits!

There are two types of endianness you'll run into, either big-endian (BE) or little-endian (LE).

Little Endian (LE)

Intel is based on Little Endian. This means that data is stored in RAM "little end" first. The LSB (least significant byte, aka right-most) of a word (data type) or larger is stored in the lowest address.

Big Endian (BE)

Stored in RAM "big end" first. The MSB (most significant byte, aka left-most) of a word or larger is stored in the lowest address. ARM is bi-endian (confusing, yes...not our problem here.)

Here's another example, but I edited the picture with the arrow showing the direction the address would grow, as well as fictitious memory addresses to get the point across.

To wrap this up without getting too in depth, I can show this in a debugger. Assume I take the assembly below and toss it into a debugger to step through the execution:

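section .data                   ; example assembly storing SNOW in eax register
        char DB 'SNOW',0        ; 'S' sits at the lowest address, then 'N', 'O', 'W', and a 0 terminator
section .text
global _start

_start:
        MOV eax,[char]          ; load 4 bytes starting at char into eax
        INT 80h                 ; used here only as a stopping point in the debugger, not a meaningful syscall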

Now, in the debugger (gdb), what you'll see is that I've set a breakpoint on _start, executed the program, and stepped through execution until the interrupt (80h). Then, I display the contents of the memory address 0x804a000 as a single 32-bit value. Do you see 0x574f4e53? Read left to right, that is the hexadecimal representation of "WONS". The 'S' (0x53) sits at the lowest address, so it ends up as the least significant (right-most) byte of the value. That is our string read in little-endian, AKA little-end first. To further visualize, imagine the following, once again using fictitious memory addresses as a means of example:

'W'    'O'    'N'    'S'
0x4    0x3    0x2    0x1

Take some time to read up on this if you need to. I confused myself several times just writing this. Knowing the concept or the idea of the concept is enough for most people to notice when things stand out during analysis. Otherwise, you'd be scratching your head at the order of bytes in memory.
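
Here is a small C sketch of the same idea, in case seeing it outside of a debugger helps. The function names are made up for illustration. load_le32 and load_be32 spell out the two byte orders by hand so they behave the same on any machine, while load_native32 shows whatever byte order the machine you compile on actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes as a little-endian 32-bit value: the byte at the lowest
   address becomes the least significant byte. */
uint32_t load_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Same bytes read big-endian, for contrast: the byte at the lowest
   address becomes the most significant byte. */
uint32_t load_be32(const unsigned char *p) {
    assert(p != NULL);
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}

/* What this machine does natively when 4 bytes are copied into a
   32-bit integer. On x86 this matches load_le32. */
uint32_t load_native32(const unsigned char *p) {
    assert(p != NULL);
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

Feeding the bytes 'S', 'N', 'O', 'W' to load_le32 gives 0x574F4E53 ("WONS"), and load_be32 gives 0x534E4F57 ("SNOW").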

Memory (Stack, Heap, etc)

Ah, the stack. I’ll be honest, of all the confusing areas of knowledge required to understand assembly, this confused me the most… especially not having a computer science background. How often does one think of memory management if they are not a programmer?

I was primarily confused because depending on who you talk to, they will explain it a different way than the last person…and I’ll probably explain it differently than them. The way I finally started to understand what was going on was by watching the stack in x64dbg while dissecting basic C programs I wrote.

The stack is a region of memory that works like a stack of plates. The stack is fast and is a fixed size. Data is stored and retrieved in a Last In First Out (LIFO) order, and grows downward (high to low addresses). Primarily used for:

  • Local variables

  • Function parameters

  • Return addresses

  • Short-term data storage

  • Data that is automatically cleaned up when a function returns
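
The "stack of plates" idea can be sketched in a few lines of C. This is a toy model with made-up names (ToyStack, toy_push, toy_pop), not how the real stack is implemented: sp starts at the high end and moves toward 0 on every push, which mimics esp moving toward lower addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy stack: a fixed-size block of 32-bit slots. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;                     /* index of the top item; STACK_WORDS means empty */
} ToyStack;

void toy_init(ToyStack *s) {
    s->sp = STACK_WORDS;        /* start at the high end */
}

/* push: move the stack pointer down one slot, then store the value. */
void toy_push(ToyStack *s, uint32_t value) {
    assert(s->sp > 0);          /* fixed size: going past this is a stack overflow */
    s->sp -= 1;                 /* like esp -= 4 */
    s->mem[s->sp] = value;
}

/* pop: read the value on top, then move the stack pointer back up. */
uint32_t toy_pop(ToyStack *s) {
    assert(s->sp < STACK_WORDS);
    uint32_t value = s->mem[s->sp];
    s->sp += 1;                 /* like esp += 4 */
    return value;
}
```

Pushing 1, 2, 3 and then popping three times hands back 3, 2, 1: last in, first out.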

The heap is a larger region of memory used for dynamic allocation. The heap is slower, is flexible in size, and grows upward (low to high addresses). Data can be stored and retrieved in any order, and is primarily used for:

  • Dynamic memory allocation

  • Large data structures

  • Long-term data storage

  • Data that needs to persist beyond function calls

  • Data that is manually allocated and freed
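
To contrast the two in C, here is a minimal sketch with made-up function names. The array from make_sequence lives on the heap, so it is still valid after the function returns and has to be freed by hand. local_sum in sum_sequence is an ordinary local variable, so it goes away on its own when the function returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a heap-allocated array holding 0, 1, ..., count-1, or NULL if
   the allocation fails. The caller is responsible for calling free(). */
int *make_sequence(size_t count) {
    int *values = malloc(count * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        values[i] = (int)i;
    }
    return values;              /* heap memory outlives this function call */
}

/* Adds up the array. local_sum is a local (stack) variable. */
int sum_sequence(const int *values, size_t count) {
    assert(values != NULL);
    int local_sum = 0;
    for (size_t i = 0; i < count; i++) {
        local_sum += values[i];
    }
    return local_sum;           /* the value is copied out; the local is gone */
}
```

A caller would do something like make_sequence(5), use the result, and then call free on it. Forgetting the free is the classic memory leak.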

We’re keeping this vague as it’s very easy to dive down a rabbit hole here and that’s not the point of this post. Here are some references though:

Finally: Dissecting a Hello World Program

Let’s create a classic “Hello World” program in C, and just for additional flair we’ll add a basic arithmetic function too. The program makes no sense from a functional standpoint, but it will show how we store data in registers, call functions, set up and clean the stack, etc.

Here is our basic C program. We’re defining a function called calculateTotal which takes 2 parameters, then adds them together. Then, we are calling the function with the values of 5 and 10, printing “Hello World”, and printing the result of the integer addition, which should be 15.

#include <stdio.h>

/* Adds the two parameters and returns the sum. */
int calculateTotal(int a, int b) {
  return a + b;
}

int main(void) {
  int total = calculateTotal(5, 10);    /* total = 15 */
  printf("Hello World!\n");
  printf("The result is: %d\n", total);
  return 0;
}

Let’s take a look at how this looks in our analysis VM. I think you’d be surprised to see the size of the file and the amount of imports, strings, and other indicators given what the program actually does. This is due to the compiler, runtime, and various variables during compilation, but this is a perfect example of why we need to understand how to reverse binaries at an assembly level.

On paper, this binary could look malicious. It’s 107 KB in size, is novel in our environment, and if you look at the imported functions, it apparently has capabilities to write data to files, has anti-debug capability, and has what can be seen as memory manipulation imports as well. Some may even write it off as malicious after a quick peek. This could of course lead to the improper training of machine learning models by classifying certain clean software as malicious, affect automated systems, lead an analyst down a rabbit hole, etc.

Let’s toss it in IDA, where we can actually see what the program’s intent is. As we know, it’s to add two values and print “Hello World!” Note, this is not a blog on how to use IDA, and I’d be the worst teacher for that as I’m still learning all the tricks and secrets too.

Here is the main function. Let’s focus on the address range 0x401010 to 0x401046.

  1. The stack is set up between 0x401010 and 0x401013, also known as the “function prologue”:
push ebp    <- push ebp onto the stack
mov ebp, esp    <- move the stack pointer (esp) into ebp so we have a stable reference for the current stack frame
push ecx    <- push ecx onto the stack
  1. At 0x401014 and 0x401016, both of the arguments we are feeding to the function are pushed onto the stack as hexadecimal “A” (10), and 5. Notice that they are pushed in reverse order (10 first, then 5), because cdecl pushes arguments right to left. You can convert “0Ah” into decimal by clicking it and hitting “h” in IDA.
push 0Ah    <- push 10 (hex A) onto the stack
push 5    <- push 5 onto the stack
  1. Now, we call the calculateTotal function from our source code, which is represented as “call sub_401000” in IDA, at 0x401018. We can rename that to calculateTotal. Double clicking on “sub_401000”, now calculateTotal, takes us here, where the arithmetic happens:

  2. In this function, we add the two values together as we expect. Since this is a new function call, the stack frame must be set up again (function prologue). Then, at 0x401003 and 0x401006, the addition is completed.

mov eax, [ebp+arg_0]    <- move the value stored at ebp+arg_0 into eax
add eax, [ebp+arg_4]    <- add the value stored at ebp+arg_4 to eax

10 will be added to 5, and the hexadecimal value "0000000F" (15 decimal) will be stored in the eax register.
Note that square brackets, e.g. [ebp+arg_0], mean that a memory address is being referenced. In IDA, you can see that “arg_0” is equal to 8; this is just IDA trying to be helpful. If you were to toss this into x64dbg, it would look like [ebp+8], and you can right click this and follow the selected memory address in the debugger. Imagine that the value of the address in ebp is 00EFFB68 and you added 8 to it. That would be 00EFFB70, and in that memory address would reside the value “5” (one of our function arguments).
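
To make the [ebp+8] arithmetic concrete, here is a toy C model of the stack frame inside calculateTotal. Everything in it is an assumption for illustration: the frame_words array, the function names, and the saved ebp value are made up. The 00EFFB68 base and the 0x40101D return address are the example values used in this post.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_BASE 0x00EFFB68u  /* example ebp value from the text */
#define FRAME_WORDS 4u

/* Hypothetical snapshot of the 32-bit words at and above ebp:
   index 0 is [ebp], index 1 is [ebp+4], and so on. */
static const uint32_t frame_words[FRAME_WORDS] = {
    0x00EFFB80u,    /* [ebp+0]   saved ebp of main (made-up value) */
    0x0040101Du,    /* [ebp+4]   return address back into main */
    5u,             /* [ebp+8]   arg_0 */
    10u             /* [ebp+0Ch] arg_4 */
};

/* The address that [ebp+offset] refers to. */
uint32_t effective_address(uint32_t ebp, uint32_t offset) {
    return ebp + offset;
}

/* Read the 32-bit word stored at `address` in the toy frame. */
uint32_t read_word(uint32_t address) {
    assert(address >= FRAME_BASE);
    assert((address - FRAME_BASE) % 4u == 0u);
    uint32_t index = (address - FRAME_BASE) / 4u;
    assert(index < FRAME_WORDS);
    return frame_words[index];
}

/* mov eax, [ebp+8]  followed by  add eax, [ebp+0Ch] */
uint32_t calculate_total_from_frame(uint32_t ebp) {
    uint32_t eax = read_word(effective_address(ebp, 0x8u));
    eax += read_word(effective_address(ebp, 0xCu));
    return eax;
}
```

effective_address(0x00EFFB68, 8) gives 0x00EFFB70, reading that address gives 5, and calculate_total_from_frame returns 0xF (15).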
  1. We’re now returning to the main function. In the calculateTotal function, at 0x401009 we pop ebp, and then return. When we pop ebp, we restore the previously saved value for ebp by popping it off the stack. Afterwards, the next value on the stack will be the saved return address to the previous function.

    That address is saved by the call instruction when we first entered this calculateTotal function, and it performs two operations:

    1. Pushes the return address onto the stack

    2. Jumps to the call target

Likewise, the retn instruction performs two operations:

  1. Pop the return address off the stack

  2. Jump to the return address
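
The two halves of call and retn can be sketched as a tiny C model. ToyCpu, do_call, and do_ret are made-up names, and the model only tracks eip and a small stack. The call_length parameter is the size of the call instruction in bytes, which is 5 for the call at 0x401018 (the next instruction is at 0x40101D).

```c
#include <assert.h>
#include <stdint.h>

#define CPU_STACK_WORDS 8

/* Hypothetical toy CPU: an instruction pointer and a small stack. */
typedef struct {
    uint32_t eip;
    uint32_t stack[CPU_STACK_WORDS];
    int sp;                     /* CPU_STACK_WORDS means empty; decreases on push */
} ToyCpu;

void cpu_init(ToyCpu *cpu, uint32_t start) {
    cpu->eip = start;
    cpu->sp = CPU_STACK_WORDS;
}

/* call: push the address of the next instruction, then jump to the target. */
void do_call(ToyCpu *cpu, uint32_t target, uint32_t call_length) {
    assert(cpu->sp > 0);
    cpu->sp -= 1;
    cpu->stack[cpu->sp] = cpu->eip + call_length;   /* return address */
    cpu->eip = target;
}

/* retn: pop the return address off the stack and jump to it. */
void do_ret(ToyCpu *cpu) {
    assert(cpu->sp < CPU_STACK_WORDS);
    cpu->eip = cpu->stack[cpu->sp];
    cpu->sp += 1;
}
```

Starting with eip at 0x401018, do_call to 0x401000 with a length of 5 leaves 0x40101D on top of the stack, and do_ret puts eip right back at 0x40101D.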

…That’s it for the calculateTotal function! Now let’s take a look at the printing:

  1. Starting at 0x40101D, we are back in the main function. 8 is being added to esp in order to clean up the stack. In cdecl (the default calling convention for C on 32-bit x86, including on Windows), the caller cleans up the stack. An int is 4 bytes, so the two parameters pushed total 8 bytes. That’s why 8 is being added to esp to “clean up the stack”. Calling conventions can change depending on the language, operating system, or if a developer calls for the binary to use a different calling convention.

  2. At 0x401023 we push a pointer to the string “Hello World!” onto the stack, and then at 0x401028 we call the print function. If we wanted to, we could rename sub_401090 to printf.

add esp, 8    <- clean up the stack
mov [ebp+var_4], eax    <- move value of eax into [ebp+var_4], the value is 0000000F (15)
push offset aHelloWorld    <- push pointer to Hello World onto stack
  1. Next, the same thing with the calling convention occurs here at 0x40102D where 4 is added to esp to clean up the stack.

    1. You may be thinking, “Hello World!” is more than 4 bytes long, so why are we only adding 4 to esp? It’s because we are referencing a pointer to the data section, and the size of the pointer is 4 bytes. So we may be referencing a memory address such as 0035B000 for example that can contain the following:

         MEM ADDRESS | VALUE                                           | STRING
         0035B000    | 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 0A 00 00 00 | Hello World!....
      
  2. When we add 4 back to esp in the previous step, we are now reunited with our value of “0000000F” or “15”, which is the sum of the calculateTotal function. Can you kind of see how we bounce around in the program, and how esp is keeping us on track?

  3. Now, at 0x401030 the value stored at [ebp+var_4] (which is [ebp-4]) is moved into eax. Again, this is due to IDA trying to be helpful. The value stored at [ebp-4] is…. 15 (our sum). Then, eax is pushed onto the stack.

  4. We’re now ready to execute this part of our source code: printf("The result is: %d\n", total);. Since eax is now pushed onto the stack, we can push a pointer to the string “The result is…” onto the stack as well, and then call the print function. (0x401034 and 0x401039). The program would then run and give us the output we expect: Hello World! The result is: 15

  5. And FINALLY between 0x401041 - 0x401046, we’re in the function epilogue. We clean up the stack again, this time we add 8 because there are two parameters, zero out eax via xor to set the return value to 0, pop ebp, and return control back to the C runtime environment.
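
The esp bookkeeping in the steps above can be sketched in C. This is only arithmetic on a pretend esp value, with made-up function names. It assumes every push is 4 bytes (true for the ints and the 4-byte pointers here) and ignores the return address, since call pushes it and retn pops it again.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BYTES 4u   /* every push in this 32-bit code moves esp by 4 bytes */

/* esp after `count` pushes: the stack grows toward lower addresses. */
uint32_t after_pushes(uint32_t esp, uint32_t count) {
    return esp - SLOT_BYTES * count;
}

/* esp after the caller cleans up `arg_count` arguments (add esp, N). */
uint32_t after_cleanup(uint32_t esp, uint32_t arg_count) {
    return esp + SLOT_BYTES * arg_count;
}

/* Walk esp through the three calls made by main and return where it ends up. */
uint32_t esp_after_main_calls(uint32_t esp) {
    const uint32_t start = esp;

    /* calculateTotal(5, 10): push 0Ah, push 5, call, add esp, 8 */
    esp = after_pushes(esp, 2u);
    esp = after_cleanup(esp, 2u);
    assert(esp == start);

    /* printf("Hello World!\n"): push one pointer, call, add esp, 4 */
    esp = after_pushes(esp, 1u);
    esp = after_cleanup(esp, 1u);
    assert(esp == start);

    /* printf("The result is: %d\n", total): push eax, push pointer, call, add esp, 8 */
    esp = after_pushes(esp, 2u);
    esp = after_cleanup(esp, 2u);
    return esp;
}
```

Whatever value you start with, for example a made-up 0x00EFFB60, esp comes back to the same place after each call. That is the caller cleaning up after itself.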

I hope this was helpful! I know that there are lots of confusing bits. If there are any mistakes, please let me know. For further learning I heavily recommend creating basic binaries, stepping through the execution in a debugger and matching it 1:1 to what you’re looking at and analyzing in IDA (or Ghidra, BinaryNinja, etc). Even if it’s for simple binaries like this, it really helps piece it together. At least it did for me. Good luck and happy hunting!