ASM Introduction

Assembly!

Assembly is a low-level programming language that provides a human-readable representation of machine code instructions. When reverse engineering malware, malicious programs can be converted from binary machine code to assembly code, a process called ‘disassembly’.

x86 assembly refers to the 32-bit architecture, and x86_64 (x64) refers to the 64-bit architecture.

What are CPU Registers?

As a program runs, the CPU uses registers, which are storage locations on the physical processor chip, to store data and keep track of the processing state. Because main memory is much slower to access, the CPU takes advantage of registers as much as possible for data storage and manipulation. Depending on the processor architecture, each register can store a certain amount of data. A word is equal to 16 bits of data. An x86 processor register can store one dword (double-word), or 32 bits of data, while an x64 processor register can store one qword (quad-word), or 64 bits of data.
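The sizes above can be made concrete with a short Python sketch. It is only an illustration: the helper names max_unsigned and wrap are invented here, and the byte entry (8 bits) is added for completeness.

```python
# Data sizes in bits. "byte" is added for completeness; the rest come from the text.
SIZES = {"byte": 8, "word": 16, "dword": 32, "qword": 64}


def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in a register of the given width."""
    return (1 << bits) - 1


def wrap(value: int, bits: int) -> int:
    """Truncate a value to the given width, as a fixed-size register would."""
    return value & max_unsigned(bits)


for name, bits in SIZES.items():
    print(f"{name:5} {bits:2} bits  max = 0x{max_unsigned(bits):X}")
```

A value that does not fit is truncated: wrap(0x100000000, 32) gives 0, because bit 32 falls outside a 32-bit register.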

There are five primary types of CPU registers:

  1. General Registers.
  2. Index and Pointer Registers.
  3. Flag Registers.
  4. Segment Registers.
  5. Control Registers.

General Registers

These registers are used to store and process data for general purposes such as arithmetic operations and function arguments. Each general register can be split into smaller segments containing 32, 16 or 8 bits of data. For example, the x64 RAX register, which can store 64 bits of data, contains four smaller registers:

  • EAX (The lower 32 bits of RAX)
  • AX (The lower 16 bits of EAX)
  • AH (The upper 8 bits of AX)
  • AL (The lower 8 bits of AX)
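The way EAX, AX, AH and AL nest inside RAX can be sketched with bit masks and shifts. The function name sub_registers and the sample value are assumptions chosen for illustration; the snippet models the register layout and is not real register access.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit RAX value into the smaller registers it contains."""
    rax &= 0xFFFFFFFFFFFFFFFF   # keep 64 bits
    eax = rax & 0xFFFFFFFF      # lower 32 bits of RAX
    ax = eax & 0xFFFF           # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF       # upper 8 bits of AX
    al = ax & 0xFF              # lower 8 bits of AX
    return {"RAX": rax, "EAX": eax, "AX": ax, "AH": ah, "AL": al}


# Sample value chosen so each byte is easy to spot.
for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name:3} = 0x{value:X}")
```

With this sample value, EAX is 0x55667788, AX is 0x7788, AH is 0x77 and AL is 0x88.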

The following table describes each general register for x86 and x64 processors. These descriptions reflect how each register has been used historically; this doesn’t mean that the register must be used in this way.

x86 Register    x64 Register    Description
EAX             RAX             The accumulation register, used for tasks such as arithmetic, interrupts, and storing return values.
AX              AX              Lower 16 bits of EAX
AH              AH              Upper 8 bits of AX
AL              AL              Lower 8 bits of AX
EBX             RBX             The base register, used for referencing variables and arguments.
BX              BX              Lower 16 bits of EBX
BH              BH              Upper 8 bits of BX
BL              BL              Lower 8 bits of BX
ECX             RCX             The counter register, used for counting and loop control.
CX              CX              Lower 16 bits of ECX
CH              CH              Upper 8 bits of CX
CL              CL              Lower 8 bits of CX
EDX             RDX             The data register, used primarily for arithmetic operations and sometimes as a backup for EAX.
DX              DX              Lower 16 bits of EDX
DH              DH              Upper 8 bits of DX
DL              DL              Lower 8 bits of DX

Index and Pointer Registers

These registers can store both pointers and addresses. They can be used for tasks such as transferring memory data, maintaining control flow, and keeping track of the stack.

x86 Register    x64 Register    Description
ESI             RSI             The source index; typically serves as the source address in memory operations.
EDI             RDI             The destination index; typically serves as the destination address in memory operations.
EBP             RBP             The base pointer; points to the base of the current stack frame.
ESP             RSP             The stack pointer; points to the last item pushed to the stack.
EIP             RIP             The extended instruction pointer; points to the address of the code that will be executed next.

Note: ESI, EDI, EBP and ESP each have a 16-bit segment of their own. ESI has SI as its 16-bit segment, EDI has DI, EBP has BP and ESP has SP.
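To illustrate how ESP tracks the last item pushed, here is a small Python model of a 32-bit stack. The class name Stack32, the starting address 0x1000 and the dictionary standing in for memory are all assumptions for this sketch. The behaviour it models is the usual x86 one: the stack grows toward lower addresses, so a push subtracts 4 from ESP before storing, and a pop reads the value and then adds 4.

```python
class Stack32:
    """Toy model of a 32-bit stack; memory is a dict keyed by address."""

    def __init__(self, top: int = 0x1000):
        self.esp = top      # hypothetical starting stack pointer
        self.memory = {}

    def push(self, value: int) -> None:
        self.esp -= 4                             # stack grows downward
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)         # ESP points at the last pushed item
        self.esp += 4
        return value


stack = Stack32()
stack.push(0xAAAA)
stack.push(0xBBBB)
print(hex(stack.esp))    # 8 bytes below the starting address
print(hex(stack.pop()))  # the most recently pushed value comes off first
```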

The Flags Register

This register keeps track of the current state of the processor. Generally, it’s used for storing the results of computations and controlling the processor’s operation. ‘Flags’ is the general term for the ‘EFLAGS’ register, which is used in 32-bit architectures, and the ‘RFLAGS’ register, which is used in 64-bit architectures.

The two most important flag values for our purposes are the zero flag (ZF) and trap flag (TF). The ZF is a single bit that is set by comparison and arithmetic instructions and then tested by conditional instructions. For example, a compare instruction may compare two values; if the values are the same, the ZF will be set to 1. The TF is used for debugging purposes and allows the debugger to single-step through instructions.
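The zero flag behaviour can be sketched in Python. The function names are invented for this example. The sketch assumes the standard EFLAGS layout, in which ZF is bit 6 and TF is bit 8, and it models a compare as a subtraction whose result is discarded.

```python
ZF_BIT = 6  # zero flag position in EFLAGS
TF_BIT = 8  # trap flag position in EFLAGS


def compare_sets_zf(a: int, b: int, bits: int = 32) -> int:
    """Model a compare: subtract, discard the result, report ZF."""
    mask = (1 << bits) - 1
    result = (a - b) & mask
    return 1 if result == 0 else 0


def read_flag(eflags: int, bit: int) -> int:
    """Extract a single flag bit from an EFLAGS value."""
    return (eflags >> bit) & 1


print(compare_sets_zf(5, 5))  # equal values set ZF
print(compare_sets_zf(5, 6))  # different values clear ZF
```

A conditional jump such as "jump if equal" then only needs to look at ZF, which is why a compare is normally followed directly by a conditional instruction.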

Segment Registers

These registers are used specifically for referencing memory locations. There are three different methods of accessing system memory; we will focus on the flat memory model, which is the one relevant for malware analysis.

There are six segment registers, which are as follows:

  • CS (Code Segment) Register - Stores the base location of the code section (.text), which is used for fetching instructions.
  • DS (Data Segment) Register - Stores the default location for variables (.data), which is used for data access.
  • ES (Extra Segment) Register - Used during string operations.
  • SS (Stack Segment) Register - Stores the base location of the stack segment and is used when implicitly using the stack pointer or when explicitly using the base pointer.
  • FS (Extra Segment) Register - An additional data segment register.
  • GS (Extra Segment) Register - An additional data segment register.

Each segment register is 16 bits wide and contains the pointer to the start of its segment in memory. The CS register contains the pointer to the code segment in memory. The code segment is where the instruction codes are stored in memory. The processor retrieves instruction codes from memory based on the CS register value and an offset value contained in the EIP register. A program cannot explicitly load the CS register with a move instruction. The processor assigns its value as the program is assigned a memory space.

The DS, ES, FS and GS segment registers are all used to point to data segments. Each of the four separate data segments helps the program separate data elements to ensure that they do not overlap. The program loads the data segment registers with the appropriate pointer values for the segments and then references individual memory locations using an offset value.
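The "segment start plus offset" idea can be sketched as a single addition. This is a simplification: the function name linear_address and the sample addresses are hypothetical, and the segment is treated as a plain base address even though, in protected mode, the processor looks the base up from the value held in the segment register. In the flat model the base is 0, so the offset in EIP is effectively the whole address.

```python
def linear_address(segment_base: int, offset: int) -> int:
    """Combine a segment's starting address with an offset (32-bit wrap)."""
    return (segment_base + offset) & 0xFFFFFFFF


# Flat model: the code segment starts at 0, so EIP alone locates the instruction.
eip = 0x00401000  # hypothetical instruction offset
print(hex(linear_address(0x00000000, eip)))

# A non-zero base shifts every offset by the same amount.
print(hex(linear_address(0x00010000, 0x20)))
```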

The Stack Segment Register (SS) is used to point to the stack segment. The stack contains data values passed to functions and procedures within the program.

Segment registers are considered part of the OS and, in almost all cases, can be neither read nor changed directly. When working in the protected mode flat model, our program receives a 4GB address space in which any 32-bit register can potentially address any of the roughly four billion memory locations, except for the protected areas defined by the operating system. Physical memory can be larger than 4GB; however, a 32-bit register can only express 4,294,967,296 different locations (addresses 0 through 4,294,967,295). If we have more than 4GB of memory, the OS must arrange a 4GB region within memory, and our programs are limited to that region. This task is carried out using the segment registers, and the OS keeps close control of it.
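The 4GB limit follows directly from the register width, as this short Python check shows. The variable names are chosen only for this example.

```python
ADDRESS_BITS = 32

locations = 2 ** ADDRESS_BITS        # number of distinct addresses
highest_address = locations - 1      # largest value a 32-bit register can hold
size_in_gib = locations // (1024 ** 3)

print(f"distinct addresses: {locations:,}")
print(f"highest address:    {highest_address:,} (0x{highest_address:X})")
print(f"address space:      {size_in_gib} GB")
```

There are 4,294,967,296 distinct addresses, and 4,294,967,295 (0xFFFFFFFF) is the highest of them because counting starts at 0.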

Control Registers

There are five control registers which are used to determine the operation mode of the CPU and the characteristics of the current executing task. Each control register is as follows:

  • CR0 - System flags that control the operating mode and various states of the processor.
  • CR1 - (Not currently implemented)
  • CR2 - Memory page fault information.
  • CR3 - Memory page directory information.
  • CR4 - Flags that enable processor features and indicate feature capabilities of the processor.

The values in each of the control registers can’t be accessed directly; however, the data in a control register can be moved to one of the general-purpose registers. Once the data is there, a program can examine the bit flags in the register to determine the operating status of the processor in conjunction with the currently running task.

If a change is required to a control register flag value, the change can be made to the data in the general-purpose register and the register moved back to the CR. Low-level system programmers usually modify the values in control registers. Normal application programs do not usually modify control register entries; however, they might query flag values to determine the capabilities of the host processor chip on which the program is currently running.
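Once a control register value has been copied into a general-purpose register, examining and changing its flags is plain bit manipulation. The Python sketch below decodes three well-known CR0 bits: PE (bit 0, protected mode enable), WP (bit 16, write protect) and PG (bit 31, paging). The function names and the sample value are assumptions for illustration. Python cannot read CR0 itself, so the value is supplied by hand.

```python
# Bit positions of a few CR0 flags.
CR0_FLAGS = {"PE": 0, "WP": 16, "PG": 31}


def decode_cr0(value: int) -> dict:
    """Report which of the listed CR0 flags are set in a copied value."""
    return {name: bool((value >> bit) & 1) for name, bit in CR0_FLAGS.items()}


def clear_flag(value: int, name: str) -> int:
    """Return the value with one flag cleared, ready to be written back."""
    return value & ~(1 << CR0_FLAGS[name]) & 0xFFFFFFFF


sample = 0x80050033  # illustrative CR0 value, not read from real hardware
print(decode_cr0(sample))
print(hex(clear_flag(sample, "WP")))
```

This mirrors the read, modify and write-back sequence described above: copy the register, test or change bits in the copy, then move the copy back.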

Why Learn Assembly?

Most malware is written in a middle-level language and, once compiled, it can be read by the hardware or OS but is no longer human-readable. For professional Cyber Security Engineers to understand what a compiled program does, we must learn to read, write and properly debug Assembly.

Assembly language is low-level and has many more instructions than we would see in a higher-level application; however, what is powerful about assembly is that it contains the absolute truth about what is going on in the binary. At the assembly level, nothing is hidden from us. Understanding Assembly language allows us to open a debugger on a running process or any bit of malware and see exactly what is going on, then direct the EIP instruction pointer where we need it to go to gain complete control over the program flow.