Use-Cases
Assembly language I/O is useful in a few situations:
Low-level system programming: When writing drivers, kernels, or other low-level system software, assembly language I/O gives you the most direct access to hardware. This maximizes performance and minimizes overhead.
Real-time applications: For applications that require precise timing and low latency, assembly language I/O can give more deterministic performance than higher level languages. This is useful for real-time applications like robotics, industrial control, etc.
Embedded systems: On very small microcontrollers, or on targets that lack a mature compiler, assembly language may be the only practical option. In those cases I/O has to be written in assembly as well.
Optimizing performance: Even for applications written in higher-level languages, performance-critical sections can be optimized using inline assembly. This includes I/O operations that need to be as fast as possible.
Debugging: Assembly language I/O can be useful for debugging issues in higher-level code. You can insert inline assembly to log debug information directly, without the overhead of library calls.
Learning: Assembly language I/O gives you a deeper understanding of how I/O operations are performed at a low level. This knowledge helps in learning operating system concepts, hardware interfacing, and more.
In summary, the main use cases are:
Low-level system software
Real-time applications
Embedded systems programming
Optimizing performance-critical sections
Debugging
Learning and understanding I/O at a low level
How is it done?
For a program running under an operating system, I/O in assembly language is done using system calls. System calls are services provided by the operating system kernel that allow a program to interact with files and hardware devices.
Some common IO system calls in Assembly are:
read(): Used to read data from an input device like a keyboard, file etc. The data is read and stored in a memory location specified by the program.
write(): Used to write data to an output device like screen, file etc. The data to be written is obtained from a memory location specified by the program.
open(): Opens a file and returns a file descriptor which is then used for read and write operations on that file.
close(): Closes an already opened file.
These system calls are invoked using software interrupt instructions in Assembly. For example, in x86 Assembly:
int 0x80 - Invokes a Linux system call (32-bit x86 interface)
int 0x21 - Invokes a DOS system call
The system call number and its parameters are passed in CPU registers before invoking the interrupt.
int 0x80
The int 0x80 instruction is used to make system calls in 32-bit x86 Linux. A system call made this way goes through the following steps:
The system call number is placed in the eax register. Different values in eax correspond to different system calls. Note that 0x80 is the interrupt number, not a value that goes into eax.
The arguments for the system call are placed in the ebx, ecx, edx, esi, edi, and ebp registers. The number and meaning of the arguments depend on the specific system call.
The int 0x80 instruction is executed. This triggers a software interrupt, which causes the CPU to switch to kernel mode.
The Linux kernel examines the eax register to determine which system call is being made. It then looks at the other registers to get the system call arguments.
The kernel executes the appropriate system call, performing actions like reading/writing files, allocating memory, creating processes, etc.
The result of the system call is returned in eax. If the call fails, eax holds a negative error code instead.
Control returns to the user mode program, and execution continues from the instruction following int 0x80. The program can then check eax to determine the result of the system call.
So in short, the int 0x80 instruction triggers a software interrupt, causing control to switch to the Linux kernel so it can execute the requested system call. The arguments are passed in registers, and the result is returned in a register.
The key things to remember are setting eax to the system call number, putting arguments in the other registers, and checking eax after the int 0x80 instruction to get the result of the system call.
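To make the "check eax afterwards" step concrete, here is a minimal sketch for 32-bit x86 Linux in NASM syntax. It calls close() (system call number 6) on descriptor 99, which is assumed not to be open, so the call is expected to fail. The file name errcheck.asm and the labels failed and quit are only illustrative.

```nasm
; errcheck.asm - sketch: detect a failed system call by testing eax
section .text
global _start
_start:
    mov eax, 6          ; close() system call number
    mov ebx, 99         ; assumption: descriptor 99 is not open
    int 0x80            ; enter the kernel; the result comes back in eax
    cmp eax, 0
    jl  failed          ; a negative eax means the call failed
    mov ebx, 0          ; success: exit status 0
    jmp quit
failed:
    neg eax             ; turn the negative result into a positive error number
    mov ebx, eax        ; and use it as the exit status
quit:
    mov eax, 1          ; exit() system call number
    int 0x80
```

If you assemble it with nasm -f elf32 errcheck.asm, link it with ld -m elf_i386 -o errcheck errcheck.o, and run ./errcheck; echo $? the shell should print a non-zero error number. On Linux, "bad file descriptor" (EBADF) is 9, assuming descriptor 99 really is closed.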
EAX Value
Here is a table of some common eax values and their corresponding system calls for the int 0x80 instruction:
eax Value | System Call | Description |
1 | exit() | Terminate the process and return a status code to the parent. |
3 | read() | Read data from a file descriptor. |
4 | write() | Write data to a file descriptor. |
5 | open() | Open a file and return a file descriptor. |
6 | close() | Close a file descriptor. |
19 | lseek() | Set the file position indicator for a file descriptor. |
108 | fstat() | Get file status. |
120 | clone() | Create a child process or thread. |
146 | writev() | Write data to a file descriptor from multiple buffers. |
252 | exit_group() | Terminate all threads in the process. |
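These values belong to the 32-bit x86 system call interface used by int 0x80. They do not depend on the Linux distribution, but other architectures use different numbers. On x86-64 with the syscall instruction, for example, write() is 1 and exit() is 60.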
The key points to note are:
Different eax values correspond to different system calls
System calls perform low-level tasks like opening files, writing data, creating processes, etc.
Arguments are passed in ebx, ecx, edx, etc.
The return value is placed in eax.
Each system call takes different arguments. To learn them, read the manual page for the call (section 2 of the Linux manual) and study several use cases.
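Bare numbers like 1 and 4 are easy to mix up, so one option is to give them names with NASM's equ directive. The following is a small sketch for 32-bit x86 Linux. The names SYS_EXIT, SYS_WRITE, and STDOUT are my own choice for illustration; NASM does not predefine them.

```nasm
; names.asm - sketch: symbolic names for system call numbers
SYS_EXIT  equ 1         ; exit() number for int 0x80
SYS_WRITE equ 4         ; write() number for int 0x80
STDOUT    equ 1         ; file descriptor of standard output

section .data
msg:     db "ok", 0xa
msg_len: equ $ - msg    ; length of msg in bytes

section .text
global _start
_start:
    mov eax, SYS_WRITE  ; which system call
    mov ebx, STDOUT     ; first argument: file descriptor
    mov ecx, msg        ; second argument: address of the data
    mov edx, msg_len    ; third argument: number of bytes
    int 0x80
    mov eax, SYS_EXIT
    mov ebx, 0          ; exit status
    int 0x80
```

The assembler replaces each name with its number, so the resulting program is the same as one written with the raw values.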
read()
Here is an example of using read() system call in x86 Assembly language for Linux:
section .text
global _start
_start:
; Read up to 10 bytes from stdin and store them in buffer
mov eax, 3 ; read system call
mov ebx, 0 ; Read from stdin (file descriptor 0)
mov ecx, buffer ; Address of buffer where data will be stored
mov edx, 10 ; Read at most 10 bytes
int 0x80 ; Invoke read system call
; Exit the program
mov eax, 1 ; exit system call code
mov ebx, 0 ; Return 0
int 0x80 ; Invoke exit system call
section .bss
buffer:
resb 10 ; Reserve 10 uninitialized bytes for buffer
Let's break this down:
We declare the .text section and define the _start label, which is the entry point for the program.
We move the system call number 3 (for read()) into eax.
We move the file descriptor 0 (for stdin) into ebx.
The address of our buffer is moved into ecx.
We want to read at most 10 bytes, so we move 10 into edx.
We invoke the read system call using int 0x80. This reads up to 10 bytes from stdin and stores them in our buffer. The number of bytes actually read is returned in eax.
Then we invoke the exit system call to exit the program, returning 0 as the exit status.
We reserve a 10 byte buffer in the .bss section to store the input. The .bss section is meant for uninitialized space, which is what resb reserves.
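Because the program uses 32-bit registers and int 0x80, assemble and link it as a 32-bit program:
nasm -f elf32 read.asm && ld -m elf_i386 -o read read.o && ./read
Type some input and press Enter. The read() call returns as soon as a line is available and stores at most 10 bytes of it in the buffer, then the program exits. If you type more than 10 characters, the rest stays in the terminal and is picked up by the shell after the program ends.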
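The read() example above ignores the value that comes back in eax. As a rough sketch of how that value could be used, the variant below passes the number of bytes read to exit() so you can see it from the shell. The file name readlen.asm, the label report, and the choice of 255 as the error status are assumptions made for this illustration.

```nasm
; readlen.asm - sketch: report how many bytes read() returned
section .bss
buffer: resb 10         ; 10 uninitialized bytes for the input

section .text
global _start
_start:
    mov eax, 3          ; read() system call number
    mov ebx, 0          ; stdin
    mov ecx, buffer     ; where to store the data
    mov edx, 10         ; read at most 10 bytes
    int 0x80            ; eax = bytes read, or a negative error code
    cmp eax, 0
    jge report          ; zero or more bytes: report the count
    mov eax, 255        ; on error, report 255 instead (arbitrary choice)
report:
    mov ebx, eax        ; exit status = value chosen above
    mov eax, 1          ; exit() system call number
    int 0x80
```

After building it the same way as read.asm, running echo hello | ./readlen; echo $? should report 6: five letters plus the newline that echo adds.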
Notes about the buffer label
Here is a step-by-step explanation of how the buffer label works:
- We reserve 10 bytes of space and give them a label:
buffer:
resb 10
This reserves 10 consecutive bytes of memory and labels the first of them with the name "buffer". Because resb reserves uninitialized space, a reservation like this belongs in the .bss section rather than in .data.
- We then use the buffer label in the code under _start to refer to that memory location:
_start:
mov ecx, buffer
The mov ecx, buffer instruction moves the starting address of the 10 byte buffer into the ECX register, so the kernel knows where to store the bytes it reads.
- The key things to note are:
resb 10 reserves the space, and the buffer label names its starting address.
An instruction can use the label even though the buffer is defined later in the source file. NASM reads the whole file before it resolves labels, and the linker assigns the final address.
mov ecx, buffer loads the address of the buffer, not its contents. To access the contents, NASM uses square brackets, as in mov al, [buffer].
So in summary, buffer is simply a name for the address of the 10 reserved bytes, and it can be used in any instruction that needs that address.
write()
The write() system call in assembly is used to write data to a file descriptor. It has the following format:
mov eax,4 ; System call number 4 is write()
mov ebx,fd ; File descriptor to write to (placeholder)
mov ecx,buffer ; Address of buffer containing data to write
mov edx,n ; Number of bytes to write (placeholder)
int 0x80 ; Call kernel
This does the following:
It loads the system call number 4 into eax, which corresponds to write().
It loads the file descriptor to write to into ebx. This could be 1 for stdout, 2 for stderr, or a file descriptor returned from a previous open() call.
It loads the address of the buffer containing the data to write into ecx.
It loads the number of bytes to write from that buffer into edx.
It executes the int 0x80 instruction to make the system call. This switches to kernel mode and executes the write() system call.
The kernel then writes the specified number of bytes (edx) from the specified buffer (ecx) to the file descriptor (ebx).
The return value is placed in eax, indicating the number of bytes actually written. This can be less than the requested number of bytes. A negative value indicates an error.
Control then returns to the user mode program.
So in summary, the write() system call in assembly allows a program to write data from a buffer to a file descriptor. The key registers used are eax, ebx, ecx and edx as described above.
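Since write() may accept fewer bytes than requested, a careful program calls it again for the remainder. The following is a minimal sketch of such a loop for 32-bit x86 Linux. The labels next, fail, and quit and the use of esi and edi as bookkeeping registers are my own choices, not something the kernel requires.

```nasm
; writeall.asm - sketch: keep calling write() until every byte is written
section .data
msg: db "partial writes are handled", 0xa
len: equ $ - msg

section .text
global _start
_start:
    mov esi, msg        ; esi = address of the next byte to write
    mov edi, len        ; edi = number of bytes still to write
next:
    mov eax, 4          ; write() system call number
    mov ebx, 1          ; stdout
    mov ecx, esi        ; start at the first unwritten byte
    mov edx, edi        ; ask for everything that is left
    int 0x80            ; eax = bytes written, or a negative error code
    cmp eax, 0
    jle fail            ; error, or no progress at all
    add esi, eax        ; skip the bytes that were written
    sub edi, eax        ; fewer bytes remain
    jnz next            ; repeat until nothing is left
    mov ebx, 0          ; success
    jmp quit
fail:
    mov ebx, 1          ; failure
quit:
    mov eax, 1          ; exit() system call number
    int 0x80
```

On a terminal the first call normally writes everything, so the loop body runs once. The loop matters more when the output is a pipe or a socket.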
Hello World
The next program prints "Hello, World!" to the console and exits. It is written in x86 Linux assembly language:
section .text
global _start
_start:
mov edx,len ; number of bytes to write
mov ecx,msg ; address of the string
mov ebx,1 ; stdout
mov eax,4 ; syscall number for write
int 0x80 ; call kernel
mov ebx,0 ; exit status 0
mov eax,1 ; syscall number for exit
int 0x80 ; call kernel
section .data
msg: db "Hello, World!",0xa ; string to output
len: equ $ - msg ; length of the string
Breaking it down:
We define a .text section containing the executable code
The _start label defines the entry point
We move the string length into edx
We move the address of the string into ecx
We move 1 into ebx to indicate stdout as the file descriptor
We move 4 into eax to indicate the write() system call number
We execute int 0x80 to make the kernel system call, writing the string to stdout
We then move 1 into eax to indicate the exit() system call number
Again, we execute int 0x80 to exit the program
We define a .data section containing the string "Hello, World!\n"
We use the equ directive to define the len label as the length of the string
Let's study the sections:
The .data section in this example holds the string we want to print, and the line after it defines the length of that string:
- The string we want to print - "Hello, World!\n"
This is defined as:
msg: db "Hello, World!",0xa
The db directive tells the assembler to emit a series of bytes. In this case, it emits the ASCII characters that make up the string, plus a newline character \n (0xa).
- The length of the string
This is defined as:
len: equ $ - msg
The equ directive defines a symbol (len) and equates it with an expression ($ - msg).
This expression calculates the difference between the current location ($) and the start of the msg string. Because the len line immediately follows the string, $ is the address just past its last byte, so the difference is the length of the string (14 bytes here). Unlike db, equ does not reserve any memory: len is a constant that exists only while the program is being assembled.
So in summary, this part of the program defines:
The actual string data we want to print
The length of that string, as an assembly-time constant
These are then referenced in the .text section code to actually perform the write() system call and print the string.
The key points are:
.data contains initialized data
.text contains executable code
We define symbols (msg and len) to refer to the string and length
We use directives (db and equ) to allocate space and define symbols.
Note: At this time, I do not expect you to understand it all. There are things not yet explained fully. However, you can see in this example the usage of write() using the console as a target.
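The position of the equ line matters, because $ always means "the current location". As a small sketch of this, the program below defines two strings and measures each one right after its own db line. The names greet, bye, greet_len, and bye_len are made up for this example.

```nasm
; lengths.asm - sketch: each length is measured right after its string
section .data
greet:     db "Hi", 0xa
greet_len: equ $ - greet    ; 3 bytes: 'H', 'i', newline
bye:       db "Bye", 0xa
bye_len:   equ $ - bye      ; 4 bytes: 'B', 'y', 'e', newline
; If greet_len were defined down here instead, $ - greet would be 7,
; because $ would already have moved past the second string.

section .text
global _start
_start:
    mov eax, 4              ; write() system call number
    mov ebx, 1              ; stdout
    mov ecx, greet
    mov edx, greet_len
    int 0x80
    mov eax, 4              ; eax was overwritten by the result, so set it again
    mov ebx, 1
    mov ecx, bye
    mov edx, bye_len
    int 0x80
    mov eax, 1              ; exit() system call number
    mov ebx, 0
    int 0x80
```

Notice that eax has to be loaded again before the second write(), since the first call replaced it with its return value.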
Files
In Linux assembly language, open() and close() are system calls used to open and close files.
The open() system call has the following format:
mov eax, 5 ; system call number 5 for open()
mov ebx, filename ; pointer to filename
mov ecx, flags ; open flags (O_RDONLY, O_WRONLY, O_CREAT)
mov edx, mode ; file permissions if creating
int 0x80 ; make kernel system call
This does the following:
It loads the system call number 5 into eax, which corresponds to open().
It loads the pointer to the filename into ebx. The filename must be a string that ends with a zero byte.
It loads the open flags into ecx. This determines if the file is opened for reading, writing, or creating if it doesn't exist.
It loads the file permissions into edx. This is only used if creating a new file.
It executes the int 0x80 instruction to make the system call. The kernel then opens the file and returns a file descriptor.
The returned file descriptor is placed in eax and can then be used for read(), write(), lseek(), and close() operations on that file. If the call fails, eax holds a negative error code instead.
The close() system call has a similar format:
mov eax, 6 ; system call number 6 for close()
mov ebx, fd ; file descriptor to close
int 0x80 ; make kernel system call
This simply closes the file referenced by the file descriptor in ebx.
So in summary, open() is used to open a file and get a file descriptor, which is then used for read/write operations. close() is used to close the file and release its resources when done.
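Putting the pieces together, here is a minimal sketch that opens a file, writes one line to it, and closes it again, for 32-bit x86 Linux. The file name out.txt, the message, and the labels are invented for this example. The flag values are the ones Linux uses on x86; I define them by hand here because NASM does not know the C header names.

```nasm
; writefile.asm - sketch: open(), write(), close() on one file
O_WRONLY equ 0x001      ; open for writing only
O_CREAT  equ 0x040      ; create the file if it does not exist
O_TRUNC  equ 0x200      ; empty the file if it already exists

section .data
filename: db "out.txt", 0        ; path, terminated by a zero byte
msg:      db "Hello, file!", 0xa
len:      equ $ - msg

section .text
global _start
_start:
    mov eax, 5          ; open() system call number
    mov ebx, filename
    mov ecx, O_WRONLY | O_CREAT | O_TRUNC
    mov edx, 0o644      ; permissions rw-r--r-- (octal), used when creating
    int 0x80
    cmp eax, 0
    jl  fail            ; negative result: the file could not be opened
    mov esi, eax        ; keep the file descriptor for later

    mov eax, 4          ; write() system call number
    mov ebx, esi        ; write to our file
    mov ecx, msg
    mov edx, len
    int 0x80

    mov eax, 6          ; close() system call number
    mov ebx, esi
    int 0x80

    mov ebx, 0          ; exit status 0
    jmp quit
fail:
    mov ebx, 1          ; exit status 1
quit:
    mov eax, 1          ; exit() system call number
    int 0x80
```

The descriptor is saved in esi because eax is overwritten by every system call, while the other registers keep their values across int 0x80. To keep the sketch short, the results of write() and close() are not checked.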
Disclaimer: This article was written with HashNode Rix AI. I am studying Assembly step by step. If you find errors or wish to know more, simply ask Rix yourself. There is still a lot to learn before I fully understand these things myself.