ASM IO Operations

This article explains input/output operations in Assembly.

Use-Cases

There are a few use cases where assembly language I/O is useful:

  1. Low-level system programming: When writing drivers, kernels, or other low-level system software, assembly language I/O gives you the most direct access to hardware. This maximizes performance and minimizes overhead.

  2. Real-time applications: For applications that require precise timing and low latency, assembly language I/O can give more deterministic performance than higher level languages. This is useful for real-time applications like robotics, industrial control, etc.

  3. Embedded systems: On small microcontrollers with very limited memory or weak compiler support, assembly language may be the only practical option. Where higher-level languages are not available, assembly I/O is necessary.

  4. Optimizing performance: Even for applications written in higher-level languages, performance-critical sections can be optimized using inline assembly. This includes I/O operations that need to be as fast as possible.

  5. Debugging: Assembly language I/O can be useful for debugging issues in higher-level code. You can insert inline assembly to log debug information directly, without the overhead of library calls.

  6. Learning: Assembly language I/O gives you a deeper understanding of how I/O operations are performed at a low level. This knowledge helps in learning operating system concepts, hardware interfacing, and more.

So in summary, the main use cases are for:

  • Low-level system software

  • Real-time applications

  • Embedded systems programming

  • Optimizing performance-critical sections

  • Debugging

  • Learning and understanding I/O at a low level

How is it done?

On an operating system such as Linux, I/O in Assembly language is done using system calls. System calls are functions provided by the operating system kernel that allow a program to use files and hardware devices without touching the hardware directly.

Some common IO system calls in Assembly are:

  • read(): Used to read data from an input device like a keyboard, file etc. The data is read and stored in a memory location specified by the program.

  • write(): Used to write data to an output device like screen, file etc. The data to be written is obtained from a memory location specified by the program.

  • open(): Opens a file and returns a file descriptor which is then used for read and write operations on that file.

  • close(): Closes an already opened file.

These system calls are invoked using software interrupt instructions in Assembly. For example, in x86 Assembly:

  • int 0x80 - Invokes a Linux system call (the 32-bit x86 interface)

  • int 0x21 - Invokes a DOS system call

The system call number and its parameters are passed in CPU registers before invoking the interrupt.


int 0x80

The int 0x80 instruction is used to make system calls in 32-bit Linux programs. A system call made this way goes through the following steps:

  1. The system call number is placed in the eax register. Different values in eax correspond to different system calls. (0x80 itself is the interrupt vector number; it is not a value that goes into eax.)

  2. The arguments for the system call are placed in the ebx, ecx, edx, esi, edi, and ebp registers. The number and meaning of the arguments depend on the specific system call.

  3. The int 0x80 instruction is executed. This triggers a software interrupt, which causes the CPU to switch to kernel mode.

  4. The Linux kernel examines the eax register to determine which system call is being made. It then looks at the other registers to get the system call arguments.

  5. The kernel executes the appropriate system call, performing actions like reading/writing files, allocating memory, creating processes, etc.

  6. The result of the system call (if any) is returned in eax. If the call fails, eax holds a negative error code.

  7. Control returns to the user mode program, and execution continues from the instruction following int 0x80. The program can then check eax to determine the result of the system call.

So in short, the int 0x80 instruction triggers a software interrupt, causing control to switch to the Linux kernel so it can execute the requested system call. The arguments are passed in registers, and the result is returned in a register.

The key things to remember are setting eax to the system call number, putting arguments in the other registers, and checking eax after the int 0x80 instruction to get the result of the system call.
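The steps above can be sketched as a small program that makes one system call and then inspects eax. This is a minimal sketch, assuming a 32-bit build (nasm -f elf32 followed by ld -m elf_i386); the labels msg, msg_len, failed, and quit are names chosen here for illustration.

```nasm
; check.asm - make one system call and test eax for an error (illustrative sketch)

section .text
    global _start

_start:
    mov eax, 4          ; system call number: write
    mov ebx, 1          ; first argument: file descriptor 1 (stdout)
    mov ecx, msg        ; second argument: address of the data
    mov edx, msg_len    ; third argument: number of bytes
    int 0x80            ; switch to the kernel

    test eax, eax       ; on failure the kernel returns a negative value
    js  failed          ; sign flag set means eax is negative

    xor ebx, ebx        ; success: exit status 0
    jmp quit

failed:
    mov ebx, 1          ; failure: exit status 1

quit:
    mov eax, 1          ; system call number: exit
    int 0x80

section .data
msg     db "checked write", 0xa
msg_len equ $ - msg     ; assembly-time length of msg
```

After running it, the shell command echo $? shows the exit status the program chose.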

EAX Value

Here is a table of some common eax values and their corresponding system calls for the int 0x80 instruction:

eax value   System call    Description
1           exit()         Terminate the calling process and return a status code to the parent.
3           read()         Read data from a file descriptor.
4           write()        Write data to a file descriptor.
5           open()         Open a file and return a file descriptor.
6           close()        Close a file descriptor.
19          lseek()        Set the file position for a file descriptor.
108         fstat()        Get file status.
120         clone()        Create a child process or thread.
146         writev()       Write data to a file descriptor from multiple buffers.
252         exit_group()   Terminate all threads in the process.

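System call numbers are fixed per CPU architecture and interface, not per Linux distribution. The int 0x80 interface uses the 32-bit x86 numbering; 64-bit code that uses the syscall instruction has a different numbering (for example, write is 1 and exit is 60 there).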

The key points to note are:

  • Different eax values correspond to different system calls

  • System calls perform low-level tasks like opening files, writing data, creating processes, etc.

  • Arguments are passed in ebx, ecx, edx, etc.

  • The return value is placed in eax.

Hope this table helps clarify the different eax values and system calls for the int 0x80 instruction. Each system call is different. To learn all the arguments you must study several use-cases.
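One way to avoid memorizing the numbers is to give them names with equ. The following is only a sketch: the names SYS_EXIT, SYS_READ, SYS_WRITE, SYS_OPEN, SYS_CLOSE, and STDOUT are hypothetical names picked for this example, and the values are the 32-bit int 0x80 numbers used in this article's read, write, open, close, and exit examples.

```nasm
; Hypothetical names for the 32-bit int 0x80 system call numbers
SYS_EXIT  equ 1
SYS_READ  equ 3
SYS_WRITE equ 4
SYS_OPEN  equ 5
SYS_CLOSE equ 6

STDOUT    equ 1         ; file descriptor of standard output

section .text
    global _start

_start:
    mov eax, SYS_WRITE  ; same as mov eax, 4
    mov ebx, STDOUT
    mov ecx, text
    mov edx, text_len
    int 0x80

    mov eax, SYS_EXIT   ; same as mov eax, 1
    xor ebx, ebx        ; exit status 0
    int 0x80

section .data
text     db "named constants", 0xa
text_len equ $ - text
```

Because equ only defines assembly-time constants, the generated machine code is the same as when the numbers are written directly.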


read()

Here is an example of using the read() system call in 32-bit x86 Assembly language for Linux:

section .text

global _start

_start:

; Read up to 10 bytes from stdin and store them in buffer

mov eax, 3 ; read system call number
mov ebx, 0 ; Read from stdin (file descriptor 0)
mov ecx, buffer ; Address of buffer where data will be stored
mov edx, 10 ; Read at most 10 bytes
int 0x80 ; Invoke read system call; eax = number of bytes read

; Exit the program

mov eax, 1 ; exit system call number
mov ebx, 0 ; Return 0
int 0x80 ; Invoke exit system call

section .bss

buffer:
    resb 10 ; Reserve 10 uninitialized bytes for buffer

Let's break this down:

  • We define the _start label which is the entry point for the program.

  • We move the system call number 3 (for read()) into eax.

  • We move the file descriptor 0 (for stdin) into ebx.

  • The address of our buffer is moved into ecx.

  • We want to read at most 10 bytes, so we move 10 into edx.

  • We invoke the read system call using int 0x80. This reads up to 10 bytes from stdin, stores them in our buffer, and returns the number of bytes actually read in eax.

  • Then we invoke the exit system call to exit the program, returning 0 as the exit status.

  • We reserve a 10 byte buffer in the .bss section to store the input. The .bss section is meant for uninitialized space reserved with resb.

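You can assemble, link, and run this Assembly program using:

nasm -f elf32 read.asm && ld -m elf_i386 -o read read.o && ./read

The code is 32-bit, so it is assembled as elf32 and linked with -m elf_i386. Enter some input and press Enter. The program exits after read() returns, having stored up to 10 bytes from stdin in the buffer.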
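To see what read() actually stored, the example can be extended to write the bytes back out. The following is a minimal sketch under the same 32-bit build assumptions; it reserves the buffer in .bss (the section intended for uninitialized space), and the label done is a name chosen here.

```nasm
; echo.asm - read up to 10 bytes from stdin and write them back to stdout (sketch)

section .bss
buffer  resb 10         ; 10 uninitialized bytes

section .text
    global _start

_start:
    mov eax, 3          ; read
    mov ebx, 0          ; stdin
    mov ecx, buffer     ; where to store the input
    mov edx, 10         ; read at most 10 bytes
    int 0x80            ; eax = bytes read, 0 at end of input, negative on error

    cmp eax, 0
    jle done            ; nothing was read, so there is nothing to echo

    mov edx, eax        ; write exactly as many bytes as were read
    mov eax, 4          ; write
    mov ebx, 1          ; stdout
    mov ecx, buffer
    int 0x80

done:
    mov eax, 1          ; exit
    xor ebx, ebx        ; exit status 0
    int 0x80
```

Using the value returned in eax as the length matters because read() may return fewer bytes than requested.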

Notes about the buffer label

A step-by-step explanation of the buffer example:

  1. We reserve a 10 byte buffer. Because resb reserves uninitialized space, it belongs in the .bss section rather than .data (if it appears in .data, NASM only warns and fills the space with zeros):
section .bss

buffer: 
   resb 10

This reserves 10 consecutive bytes of memory, and labels the start of that memory with the name "buffer".

  2. We then use the buffer label within the .text section to refer to that memory location:
section .text
_start:
mov ecx, buffer

The mov ecx, buffer instruction moves the starting address of the 10 byte buffer space into the ECX register, so we can access or modify the buffer's contents.

  3. The key things to note are:
  • resb 10 reserves the space, and the buffer label names the address where that space starts.

  • mov ecx, buffer loads that address, not the buffer's contents, into ecx.

  • The label may be defined after the code that uses it. The assembler resolves labels across the whole source file, so it does not matter that the data section comes after the _start code.

So in summary, resb 10 reserves the space and the buffer label gives it a name, so any instruction in the file can refer to the 10 byte buffer by that name.
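The difference between a label's address and the memory it names can be sketched as follows. This is an illustrative example under the same 32-bit build assumptions: square brackets access the memory at an address, while the bare label is the address itself.

```nasm
; label.asm - address of a label versus the bytes stored there (sketch)

section .bss
buffer  resb 10

section .text
    global _start

_start:
    mov ecx, buffer          ; ecx = address of the first byte of buffer
    mov byte [buffer], 'A'   ; store 'A' in the first byte, using the label
    mov byte [ecx + 1], 0xa  ; store a newline in the second byte, using ecx

    mov eax, 4               ; write
    mov ebx, 1               ; stdout
    mov edx, 2               ; two bytes: 'A' and the newline
    int 0x80                 ; ecx still holds the address of buffer

    mov eax, 1               ; exit
    xor ebx, ebx
    int 0x80
```

The program prints the letter A followed by a newline, which shows that both forms refer to the same memory.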


write()

The write() system call in assembly is used to write data to a file descriptor. It has the following format:

mov eax,4 ; System call number 4 is write()
mov ebx,fd ; File descriptor to write to
mov ecx,buffer ; Address of buffer containing data to write
mov edx,n ; Number of bytes to write
int 0x80 ; Call kernel

This does the following:

  1. It loads the system call number 4 into eax, which corresponds to write().

  2. It loads the file descriptor to write to into ebx. This could be 1 for stdout, 2 for stderr, or a file descriptor returned from a previous open() call.

  3. It loads the address of the buffer containing the data to write into ecx.

  4. It loads the number of bytes to write from that buffer into edx.

  5. It executes the int 0x80 instruction to make the system call. This switches to kernel mode and executes the write() system call.

  6. The kernel then writes the specified number of bytes (edx) from the specified buffer (ecx) to the file descriptor (ebx).

  7. The return value is placed in eax, indicating the number of bytes actually written. This can be less than the requested number of bytes.

  8. Control then returns to the user mode program.

So in summary, the write() system call in assembly allows a program to write data from a buffer to a file descriptor. The key registers used are eax, ebx, ecx and edx as described above.
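Since write() can return fewer bytes than requested, a careful program keeps calling it until everything has been written. The loop below is a sketch of that idea, assuming the same 32-bit build; the use of esi and edi as the position and remaining count, and the labels write_more, finished, failed, and quit, are choices made for this example.

```nasm
; writeall.asm - repeat write() until the whole buffer is written (sketch)

section .text
    global _start

_start:
    mov esi, msg        ; esi = address of the next byte to write
    mov edi, msg_len    ; edi = number of bytes still to write

write_more:
    test edi, edi
    jz  finished        ; nothing left to write

    mov eax, 4          ; write
    mov ebx, 1          ; stdout
    mov ecx, esi        ; start at the current position
    mov edx, edi        ; ask for everything that remains
    int 0x80            ; eax = bytes written, or negative on error

    test eax, eax
    jle failed          ; error, or no progress (avoid looping forever)

    add esi, eax        ; skip the bytes that were written
    sub edi, eax        ; fewer bytes remain
    jmp write_more

finished:
    xor ebx, ebx        ; exit status 0
    jmp quit

failed:
    mov ebx, 1          ; exit status 1

quit:
    mov eax, 1          ; exit
    int 0x80

section .data
msg     db "written in full", 0xa
msg_len equ $ - msg
```

For a short message on a terminal a single call normally suffices; the loop matters more for pipes, sockets, and large buffers.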


Hello World

The next program prints "Hello, World!" to the console and exits:

section .text
    global _start     

_start:                
    mov edx,len    ; number of bytes to write
    mov ecx,msg    ; address of the string
    mov ebx,1      ; stdout 
    mov eax,4      ; syscall number for write    
    int 0x80       ; call kernel

    mov ebx,0      ; exit status 0
    mov eax,1      ; syscall number for exit      
    int 0x80       ; call kernel

section .data
msg:
    db  "Hello, World!",0xa  ; string to output 
len: equ $ - msg             ; length of the string

Breaking it down:

  • We define a .text section containing the executable code

  • The _start label defines the entry point

  • We move the string length into edx

  • We move the address of the string into ecx

  • We move 1 into ebx to indicate stdout as the file descriptor

  • We move 4 into eax to indicate the write() system call number

  • We execute int 0x80 to make the kernel system call, writing the string to stdout

  • We then move 0 into ebx as the exit status and 1 into eax to indicate the exit() system call number

  • Again, we execute int 0x80 to exit the program

  • We define a .data section containing the string "Hello, World!\n"

  • We use the equ directive to define the len symbol as the length of the string. The label and equ must be on the same line.

Let's study the sections:

The .data section in this assembly language example contains two things:

  1. The string we want to print - "Hello, World!\n"

This is defined as:

msg: db "Hello, World!",0xa

The db directive tells the assembler to allocate space for a series of bytes. In this case, it allocates space for the ASCII characters that make up the string, plus a newline character \n (0xa).

  2. The length of the string

This is defined as:

len: equ $ - msg

The equ directive defines a symbol (len) and equates it with an expression ($ - msg). Unlike db, it does not allocate any memory; len is a constant that the assembler substitutes wherever it is used.

This expression calculates the difference between the current location ($) and the start of the msg string. Since the len definition immediately follows the msg string, the result is exactly the length of the string in bytes (14, including the newline).

So in summary, the .data section contains:

  • The actual string data we want to print

  • The definition of the len constant, which is the length of that string

These are then referenced in the .text section code to actually perform the write() system call and print the string.

The key points are:

  • .data contains initialized data

  • .text contains executable code

  • We define symbols (msg and len) to refer to the string and length

  • We use directives (db and equ) to allocate space and define symbols.
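The $ - label pattern works for any number of strings, as long as each equ line comes directly after the data it measures. Here is a small sketch under the same 32-bit build assumptions; the names first, first_len, second, and second_len are invented for this example.

```nasm
; lengths.asm - one equ length constant per string (sketch)

section .text
    global _start

_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, first
    mov edx, first_len      ; assembles as the immediate value 6
    int 0x80

    mov eax, 4
    mov ebx, 1
    mov ecx, second
    mov edx, second_len     ; assembles as the immediate value 7
    int 0x80

    mov eax, 1
    xor ebx, ebx
    int 0x80

section .data
first       db "first", 0xa
first_len   equ $ - first   ; 5 characters + newline = 6
second      db "second", 0xa
second_len  equ $ - second  ; 6 characters + newline = 7
```

If first_len were defined after second instead, $ - first would cover both strings and give 13.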

Note: At this time, I do not expect you to understand it all. There are things not yet explained fully. However, you can see in this example the usage of write() using the console as a target.


Files

In Linux assembly language, open() and close() are system calls used to open and close files.

The open() system call has the following format:

mov eax, 5 ; system call number 5 for open()
mov ebx, filename ; pointer to filename
mov ecx, flags ; open flags (O_RDONLY, O_WRONLY, O_CREAT)
mov edx, mode ; file permissions if creating
int 0x80 ; make kernel system call

This does the following:

  1. It loads the system call number 5 into eax, which corresponds to open().

  2. It loads the pointer to the filename (string) into ebx.

  3. It loads the open flags into ecx. This determines if the file is opened for reading, writing, or creating if it doesn't exist.

  4. It loads the file permissions into edx. This is only used if creating a new file.

  5. It executes the int 0x80 instruction to make the system call. The kernel then opens the file and returns a file descriptor.

  6. The returned file descriptor is placed in eax and can then be used for read(), write(), lseek(), and close() operations on that file. If the file cannot be opened, eax holds a negative error code instead.

The close() system call has a similar format:

mov eax, 6 ; system call number 6 for close()
mov ebx, fd ; file descriptor to close
int 0x80 ; make kernel system call

This simply closes the file referenced by the file descriptor in ebx.

So in summary, open() is used to open a file and get a file descriptor, which is then used for read/write operations. close() is used to close the file and release its resources when done.
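Putting open(), write(), and close() together might look like the sketch below. It assumes the same 32-bit build; the file name output.txt, the text, and the labels are invented for this example. The flag values are the 32-bit x86 Linux ones, given names here with equ, and the mode 644o is octal (read/write for the owner, read-only for others).

```nasm
; file.asm - create a file, write one line to it, and close it (sketch)

O_WRONLY equ 0x1            ; open for writing only
O_CREAT  equ 0x40           ; create the file if it does not exist
O_TRUNC  equ 0x200          ; empty the file if it already exists

section .text
    global _start

_start:
    mov eax, 5              ; open
    mov ebx, filename       ; pointer to a zero-terminated file name
    mov ecx, O_WRONLY | O_CREAT | O_TRUNC
    mov edx, 644o           ; permissions used if the file is created
    int 0x80
    test eax, eax
    js  failed              ; negative value means open failed
    mov esi, eax            ; keep the file descriptor in esi

    mov eax, 4              ; write
    mov ebx, esi            ; to the file just opened
    mov ecx, text
    mov edx, text_len
    int 0x80

    mov eax, 6              ; close
    mov ebx, esi
    int 0x80

    xor ebx, ebx            ; exit status 0
    jmp quit

failed:
    mov ebx, 1              ; exit status 1

quit:
    mov eax, 1              ; exit
    int 0x80

section .data
filename db "output.txt", 0 ; the kernel expects a terminating zero byte
text     db "written by open/write/close", 0xa
text_len equ $ - text
```

After running it, cat output.txt should show the line of text. The file descriptor is saved in esi because eax is overwritten by every later system call.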


Disclaimer: This article was written with HashNode Rix AI. I am studying Assembly step by step. If you find errors or wish to know more, simply ask Rix yourself. There is a lot to learn before I fully understand these things myself.