ASM Subprograms
Improve code reusability and avoid repetitive code in Assembly.
Subprograms or subroutines are a useful concept in assembly language programming. They allow breaking down a large program into smaller, manageable parts. Some main types of subprograms in assembly language are:
Procedures: These are similar to functions in high-level languages. They accept input parameters, perform some tasks and return a value. They are called using a CALL instruction and return using a RET instruction. Parameters can be passed using registers or memory.
Subroutines: These are simply code segments that are called using CALL and returned using RET. They do not accept parameters or return values.
Macros: These are code snippets that are defined once but can be used multiple times in the program. They are expanded inline during assembly. Macros can accept parameters to make them more flexible.
Interrupt service routines: These are special subprograms that are executed when an interrupt occurs. They handle the interrupt, perform the required tasks and return. They are not entered with CALL: the CPU invokes them through the interrupt mechanism (or an INT instruction), and they return with IRET rather than RET.
Using subprograms helps make assembly language code more modular, structured and reusable. It improves readability and maintainability and reduces errors. Parameters are used to provide subprograms with the data they need to perform their tasks.
Procedures
Here is the general syntax of a procedure in MASM-style assembly language, shown as a code block:
proc_name PROC
; Procedure body
mov ax, 0
ret ; Return to the caller
proc_name ENDP
The main differences from C language are:
There are no function prototypes. The procedure is defined directly.
There are no return types. The result is returned using a register or memory location.
Parameters are passed using registers or memory, not variable names.
There are no data types for parameters. They are just values passed in registers.
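There are no access modifiers like public or private. Within one source file, any procedure can be called from anywhere; visibility across files is controlled with directives such as PUBLIC and EXTERN.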
There are no named parameters. Parameters are simply passed in order using registers or memory.
For example:
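Add PROC
push bp ; Save the caller's base pointer
mov bp, sp ; bp now marks this procedure's stack frame
mov ax, 0
mov bx, [bp + 4] ; First parameter
mov cx, [bp + 6] ; Second parameter
add ax, bx
add ax, cx
pop bp ; Restore the caller's base pointer
ret
Add ENDP
Here we access the two parameters at offsets [bp + 4] and [bp + 6] from the base pointer register bp. The offsets skip the saved bp at [bp] and the return address at [bp + 2], which assumes a 16-bit near call. (Some assemblers reserve the word ADD, in which case the procedure needs a different name.)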
In C, it would be defined as:
int add(int a, int b) {
return a + b;
}
So in summary, procedures in assembly language are more low-level and primitive compared to functions in C. They provide a basic means of structuring code into subroutines.
Procedure Call
Here is an explanation of how to call a procedure, what the bp register is, and how to retrieve results in assembly language:
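Calling a procedure: We place the parameters where the procedure expects them and then execute the CALL instruction. The Add procedure reads its parameters from the stack, so the caller pushes them in reverse order first. For example:
mov ax, 20
push ax ; Second parameter
mov ax, 10
push ax ; First parameter
call Add ; Call the Add procedure
add sp, 4 ; Remove the two parameters from the stack
This calls the Add procedure we defined earlier with 10 and 20 as its two parameters. Because the caller pushed them, the caller also removes them after the call by adding 4 to sp.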
The bp register: bp stands for base pointer. By convention, a procedure saves the caller's bp and copies sp into bp on entry, so that bp holds a fixed reference address inside the current procedure's stack frame.
The stack frame contains:
Parameters passed to the procedure
Local variables of the procedure
Return address (to return to the caller)
Saved registers
So the bp register allows us to access parameters and local variables using offsets, as we saw in the Add procedure where we accessed the parameters using [bp + 4] and [bp + 6].
Retrieving results: Results are retrieved using registers or memory locations. In the Add procedure, we returned the result in the ax register.
So after calling the procedure, we can access the result like this:
call Add
; Do something with the result stored in ax
mov bx, ax
add bx, 10
; bx now contains the sum plus 10
The ret instruction at the end of a procedure will pop the return address from the stack, allowing the program flow to continue from the caller.
Stack Frame
When working with procedures in Assembly, the stack plays an important role:
The stack is used to pass parameters to procedures and, less commonly, to return values from procedures (results usually come back in registers).
The stack is used to store the local variables and temporary values for a procedure.
The stack is used to store the return address so the procedure knows where to return to after it finishes executing.
This is all done using the stack frame for a procedure:
The stack frame contains:
Parameters passed to the procedure:
Parameters are pushed onto the stack in reverse order before the CALL instruction.
The procedure accesses the parameters using offsets from the bp register.
Local variables of the procedure:
The procedure allocates space on the stack for its local variables.
It accesses the local variables using offsets from the bp register.
Return address:
The CALL instruction itself pushes the return address onto the stack before jumping to the procedure.
The ret instruction pops the return address, allowing the program flow to return to the caller.
Saved registers:
The procedure may push important registers onto the stack to save their values.
It pops these registers before returning to restore their original values.
So in summary, the stack and stack frame provide a way for procedures to:
Pass and access parameters
Allocate space for local variables
Save the return address
Save and restore register values
This allows procedures to execute independently with their own set of parameters, local variables, and register values. Once the procedure is done executing, the stack frame is deallocated and the stack pointer is restored.
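The layout of a stack frame can be made concrete with a small model. The following is a minimal sketch in C, not real machine code: it assumes a toy 16-bit machine with 64 bytes of stack memory, and the names mem, push16, pop16, load16, add_proc and call_add are all hypothetical. It shows why, after the usual prologue, the first parameter sits at bp + 4 and the second at bp + 6.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: 64 bytes of stack memory, 16-bit registers. */
enum { MEM_SIZE = 64 };
static uint8_t mem[MEM_SIZE];
static uint16_t sp, bp, ax;

/* PUSH: decrement sp by 2, then store the 16-bit value at [sp]. */
static void push16(uint16_t v) {
    sp -= 2;
    memcpy(&mem[sp], &v, 2);
}

/* POP: load the 16-bit value at [sp], then increment sp by 2. */
static uint16_t pop16(void) {
    uint16_t v;
    memcpy(&v, &mem[sp], 2);
    sp += 2;
    return v;
}

/* Read a 16-bit word at an address, like mov reg, [bp + n]. */
static uint16_t load16(uint16_t addr) {
    uint16_t v;
    memcpy(&v, &mem[addr], 2);
    return v;
}

/* Body of a two-parameter procedure, entered just after CALL. */
static void add_proc(void) {
    push16(bp);              /* prologue: save the caller's bp      */
    bp = sp;                 /* bp now marks this frame             */
    ax = load16(bp + 4);     /* first parameter                     */
    ax += load16(bp + 6);    /* second parameter                    */
    bp = pop16();            /* epilogue: restore the caller's bp   */
}

/* Caller side: push arguments in reverse order, CALL, then clean up. */
uint16_t call_add(uint16_t a, uint16_t b) {
    sp = MEM_SIZE;           /* empty stack: sp just past the memory */
    bp = 0;
    push16(b);               /* second argument is pushed first      */
    push16(a);
    push16(0x1234);          /* CALL pushes a return address (made-up value) */
    add_proc();
    (void)pop16();           /* RET pops the return address          */
    sp += 4;                 /* caller discards the two arguments    */
    return ax;
}

/* Lets a caller check that the stack pointer was restored. */
uint16_t final_sp(void) { return sp; }
```

Calling call_add(10, 20) walks through the same sequence as the assembly version: two pushes, the return address, the saved bp, and then the whole frame is unwound in reverse so that sp ends where it started.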
The Stack
The stack is a region in memory that is used for:
Passing parameters to functions/procedures
Allocating space for local variables
Storing return addresses
Saving/restoring register values
When a function/procedure is called:
The caller pushes the parameters onto the stack in reverse order
The CALL instruction pushes the return address onto the stack
The procedure's prologue saves the old bp and sets bp to mark the base of the new stack frame
The stack grows downward in memory, from higher addresses to lower addresses. This means that:
Parameters are accessed using positive offsets from the bp register
Local variables are accessed using negative offsets from the bp register
Physically, the stack is a region of memory (RAM) that is managed by the CPU. The stack pointer (sp) register points to the top of the stack, and is decremented to allocate space on the stack and incremented to deallocate/release space.
The stack is ordinary memory. It resides in main memory (RAM) and is cached by the CPU like any other data; because the top of the stack is accessed so often, it usually stays in the fastest cache levels.
So in summary:
The stack is a region of memory (RAM) managed by the CPU
Parameters are pushed onto the stack before a function call, and accessed using offsets from the bp register within the stack frame
The bp register points to the bottom of the current stack frame
The sp register points to the top of the stack and is used to allocate/deallocate space on the stack
Stack Features
The stack is a last-in, first-out data structure. Some common operations you can do with stacks in Assembly are:
- Push - Use PUSH instruction to push a value onto the stack. This decreases the stack pointer.
PUSH value
- Pop - Use POP instruction to pop a value off the stack. This increases the stack pointer.
POP variable
- Call - Use CALL instruction to call a subroutine. This pushes the return address onto the stack.
CALL subroutine
- Ret - Use RET instruction to return from a subroutine. This pops the return address from the stack.
RET
Some common use cases of stacks in Assembly are:
Function calls and returns: The stack is used to store the return address when a function is called, and that address is popped when the function returns.
Passing arguments: Arguments are pushed onto the stack before a function call and read by the callee through offsets from bp. Afterwards they are removed either by the caller (add sp, n) or by the callee (ret n), depending on the calling convention.
Recursive function calls: Each recursive call pushes a new stack frame, containing its return address and local variables.
Evaluating expressions: Operands are pushed onto the stack; each operator pops its operands and pushes the result.
Interrupt handling: When an interrupt occurs, the CPU pushes the flags and the return address (CS and IP) onto the stack. The IRET instruction pops them again.
So in summary, the PUSH, POP, CALL and RET instructions are commonly used to manipulate the stack in Assembly. And the stack is useful for function calls, passing arguments, recursion, expression evaluation and interrupt handling.
The stack operations are very similar conceptually to higher-level languages, but the actual instructions used are specific to Assembly - PUSH, POP, CALL and RET.
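The effect of the four instructions on the stack pointer and the instruction pointer can be modeled in a few lines. This is a minimal sketch in C under stated assumptions: a hypothetical word-addressed stack of 16 slots that grows downward, no bounds checking, and invented names (Cpu, cpu_push, cpu_pop, cpu_call, cpu_ret). It is a model of the behavior, not an emulator.

```c
#include <stdint.h>

/* Hypothetical model: a stack of 16 words that grows downward. */
enum { STACK_WORDS = 16 };

typedef struct {
    uint16_t stack[STACK_WORDS];
    uint16_t sp;   /* index of the top element; STACK_WORDS means empty */
    uint16_t ip;   /* address of the next instruction to execute        */
} Cpu;

void cpu_init(Cpu *c) {
    c->sp = STACK_WORDS;
    c->ip = 0;
}

/* PUSH value: sp moves down, then the value is stored. */
void cpu_push(Cpu *c, uint16_t value) {
    c->stack[--c->sp] = value;
}

/* POP: the value is read, then sp moves back up. */
uint16_t cpu_pop(Cpu *c) {
    return c->stack[c->sp++];
}

/* CALL target: push the address of the following instruction, then jump. */
void cpu_call(Cpu *c, uint16_t target, uint16_t next_instruction) {
    cpu_push(c, next_instruction);
    c->ip = target;
}

/* RET: pop the saved return address into ip. */
void cpu_ret(Cpu *c) {
    c->ip = cpu_pop(c);
}
```

In this model, a cpu_call followed by a cpu_ret leaves sp exactly where it was, which is why anything a subroutine pushes must be popped again before it returns.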
Clarification
CALL and RET do push and pop a value, but only one: the return address. They do not reserve space for parameters or local variables; that is done by the caller (with PUSH) and by the procedure itself (by subtracting from SP).
In detail:
CALL - The CALL instruction decrements the stack pointer, stores the address of the next instruction at the new top of the stack, and then jumps to the specified subroutine. A far call also pushes the code segment.
RET - The RET instruction reads the return address from the top of the stack, increments the stack pointer, and then jumps to that address. The form RET n additionally adds n to the stack pointer to discard parameters.
So in summary:
PUSH and POP instructions move arbitrary values onto the stack and off the stack.
CALL and RET implicitly push and pop the return address and nothing else. Stack space for parameters and local variables must be managed explicitly.
Parameters
Using registers to pass parameters avoids accessing the stack and memory in procedure calls, which makes the code more efficient.
In Assembly, there are two main ways to pass parameters to a procedure:
Push parameters onto the stack and let the callee read them through offsets from the base pointer. This is the "stack model" of parameter passing.
Pass parameters directly in registers. This is the "register model" of parameter passing.
The register model is more efficient because:
It avoids pushing/popping the stack, which takes more clock cycles.
It avoids accessing memory to read the stack, which is slower than accessing registers.
If the values are already in registers, you can pass those registers directly as parameters to a procedure call, rather than pushing the values onto the stack. The trade-off is that only a few registers are available, so procedures with many parameters still need the stack.
For example:
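mov eax, 5 ; First parameter in eax
mov ebx, 7 ; Second parameter in ebx
CALL sum ; Call sum procedure; eax holds 12 on return
; ... the caller must exit or jump before reaching the procedure body
sum PROC
ADD eax, ebx ; Add parameters in eax and ebx registers; result in eax
RET
sum ENDP
Here we pass the two parameters in eax and ebx directly, without using the stack, and the result comes back in eax.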
Pushing & Popping
ESP (the stack pointer; SP in 16-bit code) is the only general-purpose register that the CALL and RET instructions change automatically.
CALL decrements ESP when it pushes the return address, and RET increments it when it pops that address. EBP is often used as a frame pointer, but CALL and RET do not touch it: the procedure itself must save it on entry (push ebp, then mov ebp, esp) and restore it (pop ebp) before returning.
No other registers are maintained automatically either. If the values in registers like EAX, EBX, ECX, etc. need to be preserved across the function call, they must be explicitly pushed to the stack before CALL and popped after it returns, or saved and restored inside the procedure.
So in summary:
Only ESP (and the instruction pointer) is changed automatically by CALL and RET.
All other registers (EBP, EAX, EBX, ECX, etc.) must be manually pushed/popped to preserve their values across function calls.
Example:
Here is an example of pushing 2 registers, making a function call, and then restoring the registers:
; Preserve EAX and EBX registers
push eax
push ebx
; Make function call
call someFunction
; Restore registers
pop ebx
pop eax
Here are the steps this code performs:
- It pushes the EAX and EBX registers onto the stack to preserve their values:
push eax
push ebx
- It then calls someFunction:
call someFunction
- After the function call returns, it pops the registers back off the stack to restore their original values:
pop ebx
pop eax
This restores EBX first, then EAX. Because the stack is last-in, first-out, the registers are popped in the reverse of the order in which they were pushed.
So in summary, this code is:
Pushing 2 registers (EAX and EBX) to preserve their values
Making a function call
Popping the registers in reverse order to restore their original values
Multiple Results
There are a few ways you can return multiple results from a procedure in Assembly:
Return values in registers: The function can return up to 2-3 values in registers like EAX, EDX, and ECX. The caller then uses those register values after the function returns.
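Return values on the stack: The caller reserves space on the stack before the call, the function writes its results into that space, and the caller pops them after the function returns. The function cannot simply push results and then execute RET, because RET would take the last pushed value as the return address.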
Pass a pointer to a buffer: The function can be passed a pointer to a buffer, and it will store multiple results in that buffer. The caller then accesses the buffer after the function returns.
Example
An example of returning 2 values in registers:
someProcedure:
; .. code
mov eax, result1
mov edx, result2
ret
caller:
call someProcedure
; Use result1 from EAX
; Use result2 from EDX
If the caller does not need all the results, it can ignore the ones returned in registers or in a buffer; they simply stay there until later code overwrites them. The exception is results returned on the stack: the caller must remove them (pop them or adjust the stack pointer) even if it does not use them, otherwise the stack becomes unbalanced and a later RET will jump to the wrong address.
It is also good practice to copy any result you need out of its register before calling another procedure, because that call may overwrite it.
In summary:
Functions can return multiple results in registers, on the stack, or in a passed buffer
Results in registers or buffers can be ignored safely, but they may be overwritten by later code
Results left on the stack must always be removed by the caller
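For comparison, two of these approaches have direct counterparts in C. The sketch below is only an illustration with hypothetical names (DivResult, divide_struct, divide_out), and it assumes the divisor is not zero. Returning a small struct is the closest analogue to returning values in more than one register, although whether the compiler really uses registers depends on the platform's calling convention. Writing through pointers corresponds to the passed-buffer approach.

```c
/* Two results packed into one return value. */
typedef struct {
    int quotient;
    int remainder;
} DivResult;

/* Returns both results by value; assumes b is not zero. */
DivResult divide_struct(int a, int b) {
    DivResult r = { a / b, a % b };
    return r;
}

/* Writes both results through pointers supplied by the caller;
   assumes b is not zero and both pointers are valid. */
void divide_out(int a, int b, int *quotient, int *remainder) {
    *quotient = a / b;
    *remainder = a % b;
}
```

With divide_out, the caller decides where the results live, which mirrors an assembly caller passing the address of its own buffer.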
Buffer
A buffer is a block of memory used to temporarily store data while it is being moved from one place to another.
Mainstream x86 assemblers such as MASM have no directive that allocates a buffer at run time and hands back a pointer. Instead, you reserve the space yourself. In MASM-style syntax, a static buffer can be declared in the data segment like this:
buffer DB 100 DUP(?) ; Reserve a 100 byte uninitialized buffer
The label buffer stands for the address of the first byte. You can load that address into a register (for example with mov eax, OFFSET buffer) and then store data in the buffer through that pointer.
As for where the buffer is actually located in memory, it depends on how you reserve it:
Static buffers are declared in the program data segment (.data, or .bss for uninitialized data in some assemblers). They exist for the whole run of the program.
Temporary buffers can be carved out of the stack by subtracting their size from the stack pointer inside a procedure. They disappear when the procedure returns, and the stack is limited in size, so this suits small buffers.
Both kinds live in ordinary main memory. Either may end up in the CPU caches (L1, L2, etc.) if the memory is accessed frequently enough to be cached. This makes subsequent accesses to that data faster.
So in summary:
You reserve a buffer with a data directive such as DB n DUP(?) or by adjusting the stack pointer
You obtain its address with OFFSET (or LEA) and keep it in a register
Stack buffers are temporary and best kept small; static buffers live in the data segment
Frequently accessed buffers may end up in the CPU caches for faster access
Example:
Here is an example of how to return 2 results using a buffer in a standard Assembly procedure. The caller owns the buffer and passes its address to the procedure; a buffer allocated inside the procedure's own stack frame would be released on return, before the caller could read it.
; Define buffer size: two 32-bit results
BUFFER_SIZE equ 8
.data
results DB BUFFER_SIZE DUP(?) ; Caller-owned buffer
.code
Procedure proc
; Set up stack frame
push ebp
mov ebp, esp
push ebx ; Preserve ebx
mov ebx, [ebp+8] ; Parameter: address of the buffer
; Calculate two results
mov eax, 10 ; Result 1
mov edx, 20 ; Result 2
; Store results in buffer
mov [ebx], eax
mov [ebx+4], edx
; Restore registers and stack frame
pop ebx
pop ebp
ret
Procedure endp
main:
; Pass the buffer address and call procedure
push OFFSET results
call Procedure
add esp, 4 ; Remove the parameter
; Access results from buffer
mov eax, DWORD PTR [results] ; Result 1 in EAX
mov edx, DWORD PTR [results+4] ; Result 2 in EDX
This code does the following:
It defines an 8 byte buffer, enough for two 32-bit results
It defines a standard Assembly procedure using Procedure proc and Procedure endp
Inside the procedure, it sets up a stack frame and reads the buffer address from [ebp+8]
It calculates two results and stores them in EAX and EDX
It stores the results at offsets 0 and 4 within the buffer
It restores the saved registers and the stack frame before returning
In the main code, it pushes the buffer address, calls the procedure, and removes the parameter
It reads the two returned results from the buffer
The key parts are:
Defining a standard Assembly procedure using PROC and ENDP
Letting the caller own the buffer and pass its address as a parameter
Storing the multiple return values at specific offsets in the buffer
Accessing the buffer after the procedure returns to get the results
So this demonstrates returning multiple values from a procedure using a buffer in a standard Assembly procedure definition.
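The idea of a caller-owned buffer may be easier to see in C first. The following is a minimal sketch with a hypothetical function name (compute_results) and made-up result values; the capacity check is an addition that assembly code would have to perform by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical procedure: writes two results into a caller-owned buffer.
   Returns 0 on success, -1 if the buffer is missing or too small. */
int compute_results(int32_t *buffer, size_t capacity) {
    if (buffer == NULL || capacity < 2) {
        return -1;
    }
    buffer[0] = 10;  /* result 1 */
    buffer[1] = 20;  /* result 2 */
    return 0;
}
```

Because the buffer belongs to the caller, it is still valid after the function returns. A buffer declared as a local variable inside compute_results would not be, for the same reason that a buffer inside a procedure's stack frame is gone once the procedure executes RET.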
Subroutines
Subroutines are similar to procedures, and in practice the two terms are often used interchangeably. As used in this article, the main differences are:
Procedures are typically used for modularity and code reuse, while subroutines are mainly used to transfer control of execution.
Procedures typically have input parameters and return values, while subroutines do not necessarily have parameters or return values. They are simply used to jump to a block of code and execute it.
Procedures are typically defined using PROC and ENDP directives, while subroutines are simply defined with a label.
For example, a subroutine in x86 Assembly may be defined like this:
mySubroutine:
; Subroutine code goes here...
ret ; Return to caller
; Call the subroutine
call mySubroutine
As you can see, it's simply a label defining the start of the code block, followed by a RET instruction to return to the caller.
Subroutines are useful for:
Jumping to a block of code to execute it conditionally
Reducing repetitive code
Improving readability by giving the code block a name
However, since subroutines typically do not have input parameters or return values, they are more limited than procedures.
So in summary, while subroutines exist in x86 Assembly, they:
Are defined with a simple label
Typically do not have input parameters or return values
Are mainly used for conditional jumping and reducing repetitive code
Procedures are more robust and useful since they:
Are defined with PROC/ENDP directives
Typically have input parameters and return values
Enable modularity and code reuse
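The assembler does not skip over a subroutine's code. If execution reaches the label by falling through from the preceding instructions rather than through CALL, the body runs anyway, and its RET then pops whatever happens to be on top of the stack as a return address. The same is true of a block defined with PROC/ENDP, so subroutines should be placed where normal control flow cannot fall into them, for example after the program's exit code or behind a JMP.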
A subroutine is essentially a labeled block of code that can be called from other parts of the program using the CALL instruction. When the CALL instruction is encountered, program execution jumps to the label that defines the start of the subroutine.
The subroutine's code is then executed until a RET instruction is encountered, which returns execution back to the instruction after the CALL.
For example:
main:
call mySubroutine ; Call the subroutine
jmp done ; Skip over the subroutine body
mySubroutine:
; Subroutine code
ret ; Return to main
done:
; Program continues here
Here when the CALL mySubroutine instruction is reached in the main code, program execution jumps to the mySubroutine label and begins executing that code. The jmp done that follows the call keeps main from falling into the subroutine a second time after it returns.
So in summary, when a subroutine is encountered:
The CALL instruction causes a jump to the label that defines the subroutine
The subroutine's code is then executed
The RET instruction returns execution back to the instruction after the CALL
The subroutine is not skipped - it is executed when encountered in the code sequence. The CALL instruction transfers control of execution to the subroutine.
Macros
Macros are a very useful feature of Assembly language that allow you to define reusable code snippets. They work by doing text substitution before the code is assembled.
Some benefits of using Assembly macros are:
Code reuse: Macros allow you to define a code snippet once and reuse it multiple times in your program. This reduces repetition and makes the code more maintainable.
Smaller source: The macro body is written only once in the source file. The assembled code is not smaller, because the body is copied in at every use; a procedure is the better tool when the size of the machine code matters.
Improved readability: Macros give meaningful names to blocks of code, making the Assembly more readable.
Here is an example of a simple Assembly macro:
; In the data segment: DOS function 09h prints a string that ends with '$'
hello DB "Hello$"
world DB "World!$"
; Define the macro
PrintString MACRO string
MOV DX,OFFSET string
MOV AH,9
INT 21h
ENDM
; Use the macro
PrintString hello
PrintString world
; The above code will expand to:
MOV DX,OFFSET hello
MOV AH,9
INT 21h
MOV DX,OFFSET world
MOV AH,9
INT 21h
As you can see, the macro PrintString is defined once but used twice. During assembly, the macro is expanded inline, substituting the string argument each time.
Macros can also have multiple arguments:
Print MACRO func, char
MOV AH, func
MOV DL, char
INT 21h
ENDM
; Use the macro (DOS function 2 prints the character in DL)
Print 2, "A"
Print 2, "B"
This will expand to:
MOV AH, 2
MOV DL, "A"
INT 21h
MOV AH, 2
MOV DL, "B"
INT 21h
So in summary, macros allow you to:
Define reusable code snippets
Pass arguments to the macro body
Expand inline during assembly for code reuse
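The C preprocessor works on the same principle, so a C macro makes a convenient runnable comparison. This is a minimal sketch with hypothetical names (SWAP_INT, sort3): every use of SWAP_INT is replaced by its body before compilation, just as an assembler macro is expanded before assembly.

```c
/* Hypothetical macro: swaps two int variables; expanded inline at each use. */
#define SWAP_INT(a, b)  \
    do {                \
        int tmp_ = (a); \
        (a) = (b);      \
        (b) = tmp_;     \
    } while (0)

/* Sorts three values in ascending order. Each SWAP_INT below is replaced
   by the macro body, so the swap code appears three times in the output. */
void sort3(int *x, int *y, int *z) {
    if (*x > *y) SWAP_INT(*x, *y);
    if (*y > *z) SWAP_INT(*y, *z);
    if (*x > *y) SWAP_INT(*x, *y);
}
```

As with the assembly macros, the swap is written once in the source but its code is generated at every use, so the macro saves typing rather than program size.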
Interrupt service routines
Interrupt service routines (ISRs) are functions written in Assembly that handle hardware interrupts. They allow the CPU to respond to events generated by devices and peripherals.
When an interrupt is triggered by a device, the CPU suspends its current execution, saves the context, and jumps to the ISR corresponding to that interrupt. The ISR then handles the interrupt and returns control back to the code that was executed before the interrupt occurred.
Some key points about ISRs:
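They are usually written in Assembly language, at least at the entry and exit. An interrupt can arrive at any point, so the handler must preserve every register it touches and return with IRET, which ordinary high-level functions do not do without special compiler support.
They are assigned a unique interrupt number (vector), corresponding to the interrupt source. For example, interrupt 0 is the divide error and 1 is the debug exception. In protected mode, vectors 0 to 31 are reserved for CPU exceptions, so device interrupts are typically mapped to 32 and above.
The CPU itself pushes the flags register and the return address (CS and IP) when the interrupt occurs. The ISR saves the rest of the context it uses, such as the general-purpose and segment registers, by pushing them onto the stack.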
They perform the required task to handle the interrupt. For device interrupts, this may involve reading from I/O ports, updating flags, etc.
They restore the CPU context before returning. This involves popping the saved registers from the stack.
They end by executing an IRET instruction, which returns control to the interrupted code.
A simple ISR in Assembly may look like this:
ISR_Handle:
pusha ; Save registers
; ISR code goes here...
popa ; Restore registers
iret ; Return from interrupt
So in summary, interrupt service routines are essential Assembly functions that allow the CPU to respond to and handle hardware interrupts in a timely manner. They save and restore the CPU context before and after handling the interrupt.
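The lookup from interrupt number to handler can be pictured as a table of function pointers. The sketch below is a simplified model in C, not real interrupt code: the names (vector_table, set_vector, raise_interrupt, timer_isr) are hypothetical, the choice of vector 32 for a timer is only an example, and the saving and restoring of CPU state that a real ISR performs is left out.

```c
#include <stddef.h>

/* Hypothetical model of an interrupt vector table: one handler per number. */
enum { VECTOR_COUNT = 256 };
typedef void (*IsrHandler)(void);

static IsrHandler vector_table[VECTOR_COUNT];
static int tick_count;

/* Example handler: counts timer ticks. */
static void timer_isr(void) {
    tick_count++;
}

/* Install a handler for the given interrupt number. */
void set_vector(int number, IsrHandler handler) {
    if (number >= 0 && number < VECTOR_COUNT) {
        vector_table[number] = handler;
    }
}

/* Simulate an interrupt: look up the handler and run it.
   Returns 1 if a handler ran, 0 if none was installed. */
int raise_interrupt(int number) {
    if (number < 0 || number >= VECTOR_COUNT || vector_table[number] == NULL) {
        return 0;
    }
    vector_table[number]();
    return 1;
}

/* Installs the example timer handler on vector 32 (an arbitrary choice). */
void install_timer(void) {
    set_vector(32, timer_isr);
}

/* Number of timer interrupts handled so far. */
int ticks(void) {
    return tick_count;
}
```

On real hardware the CPU performs the table lookup itself and the handler ends with IRET; the model only shows how one number selects one routine.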
Disclaimer: I have done my best to squeeze information out of Rix. It made some mistakes; I tried to explain these mistakes to it, and it pretended to learn. Nevertheless, I think the article is informative, and I have learned a lot.