Ask Sawal

Discussion Forum
Notification Icon1
Write Answer Icon
Add Question Icon

What is rsp register?

6 Answer(s) Available
Answer # 1 #

If you missed the previous chapters, you should probably start there:

As we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? What are the registers used? In this chapter we will have a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.

Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let’s start!

Note: We will talk only about the user stack, as opposed to the kernel stack

In order to fully understand this article, you will need to know:

All scripts and programs have been tested on the following system:

Everything we cover will be true for this system/environment, but may be different on another system

Let’s first look at a very simple program that has one function that uses one variable (0-main.c):

Let’s compile this program and disassemble it using objdump:

The assembly code produced for our main function is the following:

Let’s focus on the first three lines for now:

The first lines of the function main refers to rbp and rsp; these are special purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame.

Let’s decompose step by step what is happening here. This is the state of the stack when we enter the function main before the first instruction is run:

We have just created a space in memory – on the stack – for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables.

The fourth line of assembly code of our main function is the following:

0x3cc is actually the value 972 in hexadecimal. This line corresponds to our C-code line:

mov DWORD PTR ,0x3cc is setting the memory at address rbp - 4 to 972. IS our local variable a. The computer doesn’t actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack.

This is the state of the stack and the registers after this operation:

If we look now at the end of the function, we will find this:

The instruction leave sets rsp to rbp, and then pops the top of the stack into rbp.

Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set to the previous value of rbp. This is how:

The state of the stack and the registers rbp and rsp are restored to the same state as when we entered our main function.

When the variables are automatically de-allocated from the stack, they are not completely “destroyed”. Their values are still in memory, and this space will potentially be used by other functions.

This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running.

Let’s consider the following C code (1-main.c):

As you can see, func2 does not set the values of its local vaiables a, b and c, yet if we compile and run this program it will print…

… the same variable values of func1! This is because of how the stack works. The two functions declared the same amount of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside are not cleared – only rsp is incremented. As a consequence, when we call func2 its stack frame sits at exactly the same place of the previous func1 stack frame, and the local variables of func2 have the same values of the local variables of func1 when we left func1.

Let’s examine the assembly code to prove it:

As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same.

And both functions end with the leave statement.

The variables a, b and c are referenced the same way in the two functions:

Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack.

So, this is the state of the stack and the registers rbp and rsp before we leave func1:

When we leave the function func1, we hit the instruction leave; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main:

So when we enter func2, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function func1.

You might have noticed that all our example functions end with the instruction ret. ret pops the return address from stack and jumps there. When functions are called the program uses the instruction call to push the return address before it jumps to the first instruction of the function called. This is how the program is able to call a function and then return from said function the calling function to execute its next instruction.

So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let’s revisit our 1-main.c code.

When the main function calls func1,

it pushes the memory address of the next instruction onto the stack, and then jumps to func1. As a consequence, before executing any instructions in func1, the top of the stack contains this address, so rsp points to this value.

After the stack frame of func1 is formed, the stack looks like this:

Given what we just learned, we can directly use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions.

To do so in C, we can use:

Here is the listing of the program 2-main.c:

From our previous discoveries, we know that our variables are referenced via rbp – 0xX:

So in order to get the values of those variables, we need to dereference rbp. For the variable a:

Looking at the above diagram, the current rbp directly points to the saved rbp, so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp.

The return address value is right before the saved previous rbp on the stack. rbp is 8 bytes long, so we simply need to add 8 to the current value of rbp to get the address where this return value is on the stack. This is how we do it:

We can see that:

The return address from func1 is 0x400697. Let’s double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of func1 in the main function.

And yes! \o/

Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make func1 return to somewhere else? Let’s add a new function, called bye to our program (3-main.c):

Let’s see at which address the code of this function starts:

Now let’s replace the return address on the stack from the func1 function with the address of the beginning of the function bye, 4005bd (4-main.c):

We have called the function bye, without calling it!

I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter.

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42. Haters, please send your comments to /dev/null.

Happy Hacking!

As always, no one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

This repo contains the source code (X-main.c files) for programs created in this tutorial.

Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them:

[4]
Edit
Query
Report
Jean-Paul Mescudi
Music Director
Answer # 2 #

Abstraction layers are great tools for building things, but they can sometimes get in the way of learning. My goal in this post is to convince you that in order to rigorously understand C, we must also understand the assembly that our C compiler generates. I’ll do this by showing you how to disassemble and read a simple program with GDB, and then we’ll use GDB and our knowledge of assembly to understand how static local variables work in C.

Note: All the code in this post was compiled on an x86_64 CPU running Mac OS X 10.8.1 using Clang 4.0 with optimizations disabled (-O0).

Let’s start by disassembling a program with GDB and learning how to read the output. Type the following program into a text file and save it as simple.c:

Now compile it with debugging symbols and no optimizations and then run GDB:1

Inside GDB, we’ll break on main and run until we get to the return statement. We put the number 2 after next to specify that we want to run next twice:

Now let’s use the disassemble command to show the assembly instructions for the current function. You can also pass a function name to disassemble to specify a different function to examine.

The disassemble command defaults to outputting instructions in AT&T syntax, which is the same syntax used by the GNU assembler.2 Instructions in AT&T syntax are of the format mnemonic source, destination. The mnemonic is a human readable name for the instruction. Source and destination are operands and can be immediate values, registers, memory addresses, or labels. Immediate values are constants, and are prefixed by a $. For instance, $0x5 represents the number 5 in hexadecimal. Register names are prefixed by a %.

It’s worth taking a quick detour to understand registers. Registers are data storage locations directly on the CPU. With some exceptions, the size, or width, of a CPU’s registers define its architecture. So if you have a 64-bit CPU, your registers will be 64 bits wide. The same is true of 32-bit CPUs (32-bit registers), 16-bit CPUs, and so on.3 Registers are very fast to access and are often the operands for arithmetic and logic operations.

The x86 family has a number of general and special purpose registers. General purpose registers can be used for any operation and their value has no particular meaning to the CPU. On the other hand, the CPU relies on special purpose registers for its own operation and the values stored in them have a specific meaning depending on the register. In our example above, %eax and %ecx are general purpose registers, while %rbp and %rsp are special purpose registers. %rbp is the base pointer, which points to the base of the current stack frame, and %rsp is the stack pointer, which points to the top of the current stack frame. %rbp always has a higher value than %rsp because the stack starts at a high memory address and grows downwards. If you are unfamiliar with the call stack, you can find a good introduction on Wikipedia.

One quirk of the x86 family is that it has maintained backwards compatibility all the way back to the 16-bit 8086 processor. As x86 moved from 16-bit to 32-bit to 64-bit, the registers were expanded and given new names so as to not break backwards compatibility with code that was written for older, narrower CPUs.

Take the general purpose register AX, which is 16 bits wide. The high byte can be accessed with the name AH, and the low byte with the name AL. When the 32-bit 80386 came out, the Extended AX register, or EAX, referred to the 32-bit register, while AX continued to refer to a 16-bit register that made up the lower half of EAX. Similarly, when the x86_64 architecture came out, the “R” prefix was used and EAX made up the lower half of the 64-bit RAX register. I’ve included a diagram below based on a Wikipedia article to help visualize the relationships I described:

This should be enough information to start reading our disassembled program.

The first two instructions are called the function prologue or preamble. First we push the old base pointer onto the stack to save it for later. Then we copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main‘s stack frame.

This instruction copies 0 into %eax. The x86 calling convention dictates that a function’s return value is stored in %eax, so the above instruction sets us up to return 0 at the end of our function.

Here we have something we haven’t encountered before: -0x4(%rbp). The parentheses let us know that this is a memory address. Here, %rbp is called the base register, and -0x4 is the displacement. This is equivalent to %rbp + -0x4. Because the stack grows downwards, subtracting 4 from the base of the current stack frame moves us into the current frame itself, where local variables are stored. This means that this instruction stores 0 at %rbp - 4. It took me a while to figure out what this line was for, but it seems that clang allocates a hidden local variable for an implicit return value from main.

You’ll also notice that the mnemonic has the suffix l. This signifies that the operands will be long (32 bits for integers). Other valid suffixes are byte, short, word, quad, and ten. If you see an instruction that does not have a suffix, the size of the operands are inferred from the size of the source or destination register. For instance, in the previous line, %eax is 32 bits wide, so the mov instruction is inferred to be movl.

Now we’re getting into the meat of our sample program! The first line of assembly is the first line of C in main and stores the number 5 in the next available local variable slot (%rbp - 0x8), 4 bytes down from our last local variable. That’s the location of a. We can use GDB to verify this:

Note that the memory addresses are the same. You’ll notice that GDB sets up variables for our registers, but like all variables in GDB, we prefix it with a $ rather than the % used in AT&T assembly.

We then move a into %ecx, one of our general purpose registers, add 6 to it and store the result in %rbp - 0xc. This is the second line of C in main. You’ve maybe figured out that %rbp - 0xc is b, which we can verify in GDB:

The rest of main is just cleanup, called the function epilogue:

We pop the old base pointer off the stack and store it back in %rbp and then retq jumps back to our return address, which is also stored in the stack frame.

So far we’ve used GDB to disassemble a short C program, gone over how to read AT&T assembly syntax, and covered registers and memory address operands. We’ve also used GDB to verify where our local variables are stored in relation to %rbp. Now we’re going to use our newly acquired skills to explain how static local variables work.

Static local variables are a very cool feature of C. In a nutshell, they are local variables that only get initialized once and persist their values across multiple calls to the function where they are defined. A simple use case for static local variables is a Python-style generator. Here’s one that generates all of the natural numbers up to INT_MAX:

When compiled and run, this program prints the first three natural numbers:

But how does this work? To understand static locals, we’re going to jump into GDB and look at the assembly. I’ve removed the address information that GDB adds to the disassembly so that everything fits on screen:

The first thing we need to do is figure out what instruction we’re on. We can do that by examining the instruction pointer or program counter. The instruction pointer is a register that stores the memory address of the next instruction. On x86_64, that register is %rip. We can access the instruction pointer using the $rip variable, or alternatively we can use the architecture independent $pc:

The instruction pointer always contains the address of the next instruction to be run, which means the third instruction hasn’t been run yet, but is about to be.

Because knowing the next instruction is useful, we’re going to make GDB show us the next instruction every time the program stops. In GDB 7.0 or later, you can just run set disassemble-next-line on, which shows all the instructions that make up the next line of source, but we’re using Mac OS X, which only ships with GDB 6.3, so we’ll have to resort to the display command. display is like x, except it evaluates its expression every time our program stops:

Now GDB is set up to always show us the next instruction before showing its prompt.

We’re already past the function prologue, which we covered earlier, so we’ll start right at the third instruction. This corresponds to the first source line that assigns 1 to a. Instead of next, which moves to the next source line, we’ll use nexti, which moves to the next assembly instruction. Afterwards we’ll examine %rbp - 0x4 to verify our hypothesis that a is stored at %rbp - 0x4.

They are the same, just as we expected. The next instruction is more interesting:

This is where we’d expect to find the line static int b = -1;, but it looks substantially different than anything we’ve seen before. For one thing, there’s no reference to the stack frame where we’d normally expect to find local variables. There’s not even a -0x1! Instead, we have an instruction that loads 0x100001018, located somewhere after the instruction pointer, into %eax. GDB gives us a helpful comment with the result of the memory operand calculation and a hint telling us that natural_generator.b is stored at this address. Let’s run this instruction and figure out what’s going on:

Even though the disassembly shows %eax as the destination, we print $rax, because GDB only sets up variables for full width registers.

In this situation, we need to remember that while variables have types that specify if they are signed or unsigned, registers don’t, so GDB is printing the value of %rax unsigned. Let’s try again, by casting %rax to a signed int:

It looks like we’ve found b. We can double check this by using the x command:

So not only is b stored at a low memory address outside of the stack, it’s also initialized to -1 before natural_generator is even called. In fact, even if you disassembled the entire program, you wouldn’t find any code that sets b to -1. This is because the value for b is hardcoded in a different section of the sample executable, and it’s loaded into memory along with all the machine code by the operating system’s loader when the process is launched.4

With this out of the way, things start to make more sense. After storing b in %eax, we move to the next line of source where we increment b. This corresponds to the next two instructions:

Here we add 1 to %eax and store the result back into memory. Let’s run these instructions and verify the result:

The next two instructions set us up to return a + b:

Here we load a into %eax and then add b. At this point, we’d expect %eax to be 1. Let’s verify:

%eax is used to store the return value from natural_generator, so we’re all set up for the epilogue which cleans up the stack and returns:

Now we understand how b is initialized, let’s see what happens when we run natural_generator again:

Because b is not stored on the stack with other local variables, it’s still zero when natural_generator is called again. No matter how many times our generator is called, b will always retain its previous value. This is because it’s stored outside the stack and initialized when the loader moves the program into memory, rather than by any of our machine code.

We began by going over how to read assembly and how to disassemble a program with GDB. Afterwards, we covered how static local variables work, which we could not have done without disassembling our executable.

We spent a lot of time alternating between reading the assembly instructions and verifying our hypotheses in GDB. It may seem repetitive, but there’s a very important reason for doing things this way: the best way to learn something abstract is to make it more concrete, and one of the best way to make something more concrete is to use tools that let you peel back layers of abstraction. The best way to to learn these tools is to force yourself to use them until they’re second nature.

[2]
Edit
Query
Report
Jackson Verma
WEATHER CLERK
Answer # 3 #

Abstraction layers are great tools for building things, but they can sometimes get in the way of learning. My goal in this post is to convince you that in order to rigorously understand C, we must also understand the assembly that our C compiler generates. I’ll do this by showing you how to disassemble and read a simple program with GDB, and then we’ll use GDB and our knowledge of assembly to understand how static local variables work in C.

Note: All the code in this post was compiled on an x86_64 CPU running Mac OS X 10.8.1 using Clang 4.0 with optimizations disabled (-O0).

Let’s start by disassembling a program with GDB and learning how to read the output. Type the following program into a text file and save it as simple.c:

Now compile it with debugging symbols and no optimizations and then run GDB:1

Inside GDB, we’ll break on main and run until we get to the return statement. We put the number 2 after next to specify that we want to run next twice:

Now let’s use the disassemble command to show the assembly instructions for the current function. You can also pass a function name to disassemble to specify a different function to examine.

The disassemble command defaults to outputting instructions in AT&T syntax, which is the same syntax used by the GNU assembler.2 Instructions in AT&T syntax are of the format mnemonic source, destination. The mnemonic is a human readable name for the instruction. Source and destination are operands and can be immediate values, registers, memory addresses, or labels. Immediate values are constants, and are prefixed by a $. For instance, $0x5 represents the number 5 in hexadecimal. Register names are prefixed by a %.

It’s worth taking a quick detour to understand registers. Registers are data storage locations directly on the CPU. With some exceptions, the size, or width, of a CPU’s registers define its architecture. So if you have a 64-bit CPU, your registers will be 64 bits wide. The same is true of 32-bit CPUs (32-bit registers), 16-bit CPUs, and so on.3 Registers are very fast to access and are often the operands for arithmetic and logic operations.

The x86 family has a number of general and special purpose registers. General purpose registers can be used for any operation and their value has no particular meaning to the CPU. On the other hand, the CPU relies on special purpose registers for its own operation and the values stored in them have a specific meaning depending on the register. In our example above, %eax and %ecx are general purpose registers, while %rbp and %rsp are special purpose registers. %rbp is the base pointer, which points to the base of the current stack frame, and %rsp is the stack pointer, which points to the top of the current stack frame. %rbp always has a higher value than %rsp because the stack starts at a high memory address and grows downwards. If you are unfamiliar with the call stack, you can find a good introduction on Wikipedia.

One quirk of the x86 family is that it has maintained backwards compatibility all the way back to the 16-bit 8086 processor. As x86 moved from 16-bit to 32-bit to 64-bit, the registers were expanded and given new names so as to not break backwards compatibility with code that was written for older, narrower CPUs.

Take the general purpose register AX, which is 16 bits wide. The high byte can be accessed with the name AH, and the low byte with the name AL. When the 32-bit 80386 came out, the Extended AX register, or EAX, referred to the 32-bit register, while AX continued to refer to a 16-bit register that made up the lower half of EAX. Similarly, when the x86_64 architecture came out, the “R” prefix was used and EAX made up the lower half of the 64-bit RAX register. I’ve included a diagram below based on a Wikipedia article to help visualize the relationships I described:

This should be enough information to start reading our disassembled program.

The first two instructions are called the function prologue or preamble. First we push the old base pointer onto the stack to save it for later. Then we copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main‘s stack frame.

This instruction copies 0 into %eax. The x86 calling convention dictates that a function’s return value is stored in %eax, so the above instruction sets us up to return 0 at the end of our function.

Here we have something we haven’t encountered before: -0x4(%rbp). The parentheses let us know that this is a memory address. Here, %rbp is called the base register, and -0x4 is the displacement. This is equivalent to %rbp + -0x4. Because the stack grows downwards, subtracting 4 from the base of the current stack frame moves us into the current frame itself, where local variables are stored. This means that this instruction stores 0 at %rbp - 4. It took me a while to figure out what this line was for, but it seems that clang allocates a hidden local variable for an implicit return value from main.

You’ll also notice that the mnemonic has the suffix l. This signifies that the operands will be long (32 bits for integers). Other valid suffixes are byte, short, word, quad, and ten. If you see an instruction that does not have a suffix, the size of the operands are inferred from the size of the source or destination register. For instance, in the previous line, %eax is 32 bits wide, so the mov instruction is inferred to be movl.

Now we’re getting into the meat of our sample program! The first line of assembly is the first line of C in main and stores the number 5 in the next available local variable slot (%rbp - 0x8), 4 bytes down from our last local variable. That’s the location of a. We can use GDB to verify this:

Note that the memory addresses are the same. You’ll notice that GDB sets up variables for our registers, but like all variables in GDB, we prefix it with a $ rather than the % used in AT&T assembly.

We then move a into %ecx, one of our general purpose registers, add 6 to it and store the result in %rbp - 0xc. This is the second line of C in main. You’ve maybe figured out that %rbp - 0xc is b, which we can verify in GDB:

The rest of main is just cleanup, called the function epilogue:

We pop the old base pointer off the stack and store it back in %rbp and then retq jumps back to our return address, which is also stored in the stack frame.

So far we’ve used GDB to disassemble a short C program, gone over how to read AT&T assembly syntax, and covered registers and memory address operands. We’ve also used GDB to verify where our local variables are stored in relation to %rbp. Now we’re going to use our newly acquired skills to explain how static local variables work.

Static local variables are a very cool feature of C. In a nutshell, they are local variables that only get initialized once and persist their values across multiple calls to the function where they are defined. A simple use case for static local variables is a Python-style generator. Here’s one that generates all of the natural numbers up to INT_MAX:

When compiled and run, this program prints the first three natural numbers:

But how does this work? To understand static locals, we’re going to jump into GDB and look at the assembly. I’ve removed the address information that GDB adds to the disassembly so that everything fits on screen:

The first thing we need to do is figure out what instruction we’re on. We can do that by examining the instruction pointer or program counter. The instruction pointer is a register that stores the memory address of the next instruction. On x86_64, that register is %rip. We can access the instruction pointer using the $rip variable, or alternatively we can use the architecture independent $pc:

The instruction pointer always contains the address of the next instruction to be run, which means the third instruction hasn’t been run yet, but is about to be.

Because knowing the next instruction is useful, we’re going to make GDB show us the next instruction every time the program stops. In GDB 7.0 or later, you can just run set disassemble-next-line on, which shows all the instructions that make up the next line of source, but we’re using Mac OS X, which only ships with GDB 6.3, so we’ll have to resort to the display command. display is like x, except it evaluates its expression every time our program stops:

Now GDB is set up to always show us the next instruction before showing its prompt.

We’re already past the function prologue, which we covered earlier, so we’ll start right at the third instruction. This corresponds to the first source line that assigns 1 to a. Instead of next, which moves to the next source line, we’ll use nexti, which moves to the next assembly instruction. Afterwards we’ll examine %rbp - 0x4 to verify our hypothesis that a is stored at %rbp - 0x4.

They are the same, just as we expected. The next instruction is more interesting:

This is where we’d expect to find the line static int b = -1;, but it looks substantially different than anything we’ve seen before. For one thing, there’s no reference to the stack frame where we’d normally expect to find local variables. There’s not even a -0x1! Instead, we have an instruction that loads 0x100001018, located somewhere after the instruction pointer, into %eax. GDB gives us a helpful comment with the result of the memory operand calculation and a hint telling us that natural_generator.b is stored at this address. Let’s run this instruction and figure out what’s going on:

Even though the disassembly shows %eax as the destination, we print $rax, because GDB only sets up variables for full width registers.

In this situation, we need to remember that while variables have types that specify if they are signed or unsigned, registers don’t, so GDB is printing the value of %rax unsigned. Let’s try again, by casting %rax to a signed int:

It looks like we’ve found b. We can double check this by using the x command:

So not only is b stored at a low memory address outside of the stack, it’s also initialized to -1 before natural_generator is even called. In fact, even if you disassembled the entire program, you wouldn’t find any code that sets b to -1. This is because the value for b is hardcoded in a different section of the sample executable, and it’s loaded into memory along with all the machine code by the operating system’s loader when the process is launched.4

With this out of the way, things start to make more sense. After storing b in %eax, we move to the next line of source where we increment b. This corresponds to the next two instructions:

Here we add 1 to %eax and store the result back into memory. Let’s run these instructions and verify the result:

The next two instructions set us up to return a + b:

Here we load a into %eax and then add b. At this point, we’d expect %eax to be 1. Let’s verify:

%eax is used to store the return value from natural_generator, so we’re all set up for the epilogue which cleans up the stack and returns:

Now we understand how b is initialized, let’s see what happens when we run natural_generator again:

Because b is not stored on the stack with other local variables, it’s still zero when natural_generator is called again. No matter how many times our generator is called, b will always retain its previous value. This is because it’s stored outside the stack and initialized when the loader moves the program into memory, rather than by any of our machine code.

[2]
Edit
Query
Report
Nandu Sarah
STEWARD STEWARDESS SECOND
Answer # 4 #

The %rsp register is called the stack pointer. It always points to the "top" of the stack, which is at the lowest (leftmost) address current used in the stack segment. At the start of the function, any memory to the left of where %rsp points is therefore unused; any memory to the right of where it points is used.

[2]
Edit
Query
Report
Bella Feitshans
Fine Artist
Answer # 5 #

In computer architecture, it's common for CPUs to internally handle integer registers / instructions separately from FP/SIMD registers / instructions. e.g. Intel Sandybridge-family CPUs have separate physical register files for renaming GP integer vs. FP/vector registers. These are simply called the integer vs. FP register files. (Where FP is short-hand for everything that a kernel doesn't need to save/restore to use the GP registers while leaving user-space's FPU/SIMD state untouched.) Each entry in the FP register file is 256 bits wide (to hold an AVX ymm vector), but integer register file entries only have to be 64 bits wide1.

But when we say "integer register", we normally mean specifically a general-purpose register.

Note 1: Actually, a typical design is for integer PRF entries have room for a FLAGS result and/or a GP register, so maybe 70 bits. Since integer instructions also write FLAGS, it makes sense to keep them together instead of allocating from a separate table of tiny registers. (The register allocation table would then just have 2 extra entries, one for CF and one for the rest of the FLAGS, the SPAZO group, to record which PRF entry each part comes from.) On CPUs that rename segment registers (Skylake does not), I guess those would go in an integer PRF entry.

As far as the integer part of the architectural state of a user-space task that a kernel would save/restore on interrupts and system calls, that would include its RFLAGS and RIP. (And usually just not touch FP state.)

"General purpose" in this usage means "data or address", as opposed to an ISA like m68k where you had d0..7 data regs and a0..7 address regs, all 16 of which are integer regs. Regardless of how the register is normally used, general-purpose is about how it can be used.

Every register has some special-ness for some instructions, except some of the completely new registers added with x86-64: R8-R15. These don't disqualify them as General Purpose The (low 16 of the) original 8 date back to 8086, and there were implicit uses of each of them even in the original 8086.

For RSP, it's special for push/pop/call/ret, so most code never uses it for anything else. (And in kernel mode, used asynchronously for interrupts, so you really can't stash it somewhere to get an extra GP register the way you can in user-space code: Is ESP as general-purpose as EAX?)

But in controlled conditional (like no signal handlers) you don't have to use RSP for a stack pointer. e.g. you can use it to read an array in a loop with pop, like in this code-golf answer. (I actually used esp in 32-bit code, but same difference: pop is faster than lodsd on Skylake, while both are 1 byte.)

See also x86 Assembly - Why is bx preserved in calling conventions? for a partial list.

I'm mostly limiting this to user-space instructions, especially ones a modern compiler might actually emit from C or C++ code. I'm not trying to be exhaustive for regs that have a lot of implicit uses.

Addressing-mode encoding special cases:

(See also rbp not allowed as SIB base? which is just about addressing modes, where I copied this part of this answer.)

rbp/r13 can't be a base register with no displacement: that encoding instead means: (in ModRM) rel32 (RIP-relative), or (in SIB) disp32 with no base register. (r13 uses the same 3 bits in ModRM/SIB, so this choice simplifies decoding by not making the instruction-length decoder look at the REX.B bit to get the 4th base-register bit). assembles to . assembles to (avoiding the problem by swapping base/index when that's an option).

rsp/r12 as a base register always needs a SIB byte. (The ModR/M encoding of base=RSP is escape code to signal a SIB byte, and again, more of the decoder would have to care about the REX prefix if r12 was handled differently).

rsp can't be an index register. This makes it possible to encode , which is more useful than . (Intel could have designed the ModRM/SIB encodings for 32-bit addressing modes (new in 386) so SIB-with-no-index was only possible with base=ESP. That would make possible and only exclude . But that's not useful, so they simplified the hardware by making index=ESP the code for no index regardless of the base. This allows two redundant ways to encode any base or base+disp addressing mode: with or without a SIB.)

[1]
Edit
Query
Report
Tahar animation:
Chief Revenue Officer
Answer # 6 #

A CS107 joint staff effort (Erik, Julie, Nate)

x86-64 (also known as just x64 and/or AMD64) is the 64-bit version of the x86/IA32 instruction set. Below is our overview of its features that are relevant to CS107. There is more extensive coverage on these topics in Chapter 3 of the B&O textbook. See also our x86-64 sheet for a compact one-page reference.

The table below lists the commonly used registers (sixteen general-purpose plus two special). Each register is 64 bits wide; the lower 32-, 16- and 8-bit portions are selectable by a pseudo-register name. Some registers are designated for a certain purpose, such as %rsp being used as the stack pointer or %rax for the return value from a function. Other registers are all-purpose, but have a conventional use depending on whether caller-saved or callee-saved. If the function binky calls winky, we refer to binky as the caller and winky as the callee. For example, the registers used for the first 6 arguments and return value are all caller-saved. The callee can freely use those registers, overwriting existing values without taking any precautions. If %rax holds a value the caller wants to retain, the caller must copy the value to a "safe" location before making a call. The caller-saved registers are ideal for scratch/temporary use by the callee. In contrast, if the callee intends to use a callee-saved register, it must first preserve its value and restore it before exiting the call. The callee-saved registers are used for local state of the caller that needs to preserved across further function calls.

True to its CISC nature, x86-64 supports a variety of addressing modes. An addressing mode is an expression that calculates an address in memory to be read/written to. These expressions are used as the source or destination for a mov instruction and other instructions that access memory. The code below demonstrates how to write the immediate value 0 to various memory locations in an example of each of the available addressing modes:

A note about instruction suffixes: many instructions have a suffix (b, w, l, or q) which indicates the bitwidth of the operation (1, 2, 4, or 8 bytes, respectively). The suffix is often elided when the bitwidth can be determined from the operands. For example, if the destination register is %eax, it must be 4 bytes, if %ax it must be 2 bytes, and %al would be 1 byte. A few instructions such as movs and movz have two suffixes: the first is for the source operand, the second for the destination. For example, movzbl moves a 1-byte source value to a 4-byte destination.

Mov and lea

By far most frequent instruction you'll encounter is mov in one of its its multi-faceted variants. Mov copies a value from source to destination. The source can be an immediate value, a register, or a memory location (expressed using one of the addressing mode expressions from above). The destination is either a register or a memory location. At most one of source or destination can be memory. The mov suffix (b, w, l, or q) indicates how many bytes are being copied (1, 2, 4, or 8 respectively). For the lea (load effective address) instruction, the source operand is a memory location (using an addressing mode from above) and it copies the calculated source address to destination. Note that lea does not dereference the source address, it simply calculates its location. This means lea is nothing more than an arithmetic operation and commonly used to calculate the value of simple linear combinations that have nothing to do with memory locations!

The mov instruction copies the same number of bytes from one location to another. In situations where the move is copying a smaller bitwidth to a larger, the movs and movz variants are used to specify how to fill the additional bytes, either sign-extend or zero-fill.

A special case to note is that a mov to write a 32-bit value into a register also zeroes the upper 32 bits of the register by default, i.e does an implicit zero-extend to bitwidth q. This explains use of instructions such as mov %ebx, %ebx that look odd/redundant, but are, in fact, being used to zero-extend from 32 to 64. Given this default behavior, there is no need for an explicit movzlq instruction. To instead sign-extend from 32-bit to 64-bit, there is an movslq instruction.

The cltq instruction is a specialized movs that operates on %rax. This no-operand instruction does sign-extension in-place on %rax; source bitwidth is l, destination bitwidth is q.

Arithmetic and bitwise operations

The binary operations are generally expressed in a two-operand form where the second operand is both a source to the operation and the destination. The source can be an immediate constant, register, or memory location. The destination must be either register or memory. At most one of source or destination can be memory. The unary operations have one operand which is both source and destination, which can be either register or memory. Many of the arithmetic instructions are used for both signed and unsigned types, i.e. there is not a signed add and unsigned add, the same instruction is used for both. Where needed, the condition codes set by the operation can be used to detect the different kinds of overflow.

Branching instructions

The special %flags register stores a set of boolean flags called the condition codes. Most arithmetic operations update those codes. A conditional jump reads the condition codes to determine whether to take the branch or not. The condition codes include ZF (zero flag), SF (sign flag), OF (overflow flag, signed), and CF (carry flag, unsigned). For example, if the result was zero, the ZF is set, if a operation overflowed (into sign bit), OF is set.

The general pattern for all branches is to execute a cmp or test operation to set the flags followed by a jump instruction variant that reads the flags to determine whether to take the branch or continue on. The operands to a cmp or test are immediate, register, or memory location (with at most one memory operand). There are 32 variants of conditional jump, several of which are synonyms as noted below.

Function call stack

The %rsp register is used as the "stack pointer"; push and pop are used to add/remove values from the stack. The push instruction takes one operand: an immediate, a register, or a memory location. Push decrements %rsp and copies the operand to be tompost on the stack. The pop instruction takes one operand, the destination register. Pop copies the topmost value to destination and increments %rsp. It is also valid to directly adjust %rsp to add/remove an entire array or a collection of variables with a single operation. Note the stack grows downward (toward lower addresses).

Call/return are using to transfer control between functions. The callq instruction takes one operand, the address of the function being called. It pushes the return address (current value of %rip) onto the stack and then jumps to that address. The retq instruction pops the return address from the stack into the destination %rip, thus resuming at the saved return address.

To set up for a call, the caller puts the first six arguments into registers %rdi, %rsi, %rdx, %rcx, %r8, and%r9 (any additional arguments are pushed onto the stack) and then executes the call instruction.

When callee finishes, it writes the return value (if any) to %rax, cleans up the stack, and use retq instruction to return control to the caller.

The target for a branch or call instruction is most typically an absolute address that was determined at compile-time. However there are cases where the target is not known until runtime, such as a switch statement compiled into a jump table or when invoking a function pointer. For these, the target address is computed and stored in a register and the branch/call variant is used je *%rax or callq *%rax to read the target address from the specified register.

The debugger has many features that allow you to trace and debug code at the assembly level. You can print the value in a register by prefixing its name with $ or use the command info reg to dump the values of all registers:

The disassemble command will print the disassembly for a function by name. The x command supports an i format which interprets the contents of a memory address as an encoded instruction.

[0]
Edit
Query
Report
Ritesh Abrez
BOXING INSPECTOR