These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.
In a previous blog I talked about stack frames and presented what I consider a "Traditional" stack frame layout.
The stack frame entry code I presented looked like:
stp fp, lr, [sp,#-16]!
mov fp, sp
sub sp, sp, #160
The link register (AKA x30) and the frame pointer (AKA x29) are pushed onto the stack, the modified stack pointer is stored in the frame pointer, and then space is made on the stack for any local and temporary variables.
And the postamble looked like this:
mov sp, fp
ldp fp, lr, [sp], #16
ret
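The pointer arithmetic in this entry and exit sequence can be modelled in a few lines of C++. This is only a sketch: the `Frame` struct and the `traditional_enter` / `traditional_leave` helpers are hypothetical names introduced for illustration, addresses are treated as plain integers, and the 160-byte local area is taken from the listing above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two registers involved; not real register access.
struct Frame {
    std::uint64_t sp;  // stack pointer
    std::uint64_t fp;  // frame pointer (x29)
};

// Models: stp fp, lr, [sp,#-16]! ; mov fp, sp ; sub sp, sp, #160
Frame traditional_enter(std::uint64_t entry_sp, std::uint64_t locals) {
    std::uint64_t sp = entry_sp - 16;  // pre-indexed store of the fp/lr pair
    std::uint64_t fp = sp;             // fp now points at the saved pair
    sp -= locals;                      // room for locals and temporaries
    return Frame{sp, fp};
}

// Models: mov sp, fp ; ldp fp, lr, [sp], #16
std::uint64_t traditional_leave(const Frame& f) {
    return f.fp + 16;  // sp is recovered from fp alone; the frame size is not needed
}
```

The point of the model is that the exit path depends only on `fp`, so hand-written code does not have to remember how much local space it allocated.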
I noticed that the stack frame pre- and post-amble generated by GCC, Clang and MSVC didn't look like this.
I therefore wrote the following short program and inspected the generated assembly using Compiler Explorer.
#include <string>
#include <iostream>

std::string merge( std::string a, std::string b, std::string c )
{
    std::string d = a + b;
    std::string e = a + d + b;
    return d + e;
}

int main()
{
    merge( "a", "b", "c" );
}
The motivation here is to create a function that requires more data than can fit in the processor's registers and hence has to allocate stack space.
The stack pre-amble generated by armv8-a Clang (Available at: https://godbolt.org/z/3hYzxYG1r) looked as follows:
merge(std::__cxx11::basic_string<char, std::char_traits<char>, ...
sub sp, sp, #176
stp x29, x30, [sp, #160] // 16-byte Folded Spill
add x29, sp, #160
Here the stack is grown first (towards lower memory) and then the stp reaches back to the top of the allocated region to store the frame pointer (fp/x29) and link register (x30/lr). The address at which the frame pointer and link register were stored is then computed and placed in the new frame pointer.
The resulting stack ends up similar to my "traditional" layout but computed in a different way. It looks like this:
| |
+---------------------+
| lr |
+---------------------+
| original fp | <- fp
+---------------------+
| |
| |
| ...space... |
| |
| | <- sp
+---------------------+
The Clang stack post-amble in this case is:
ldp x29, x30, [sp, #160] // 16-byte Folded Reload
add sp, sp, #176
ret
The code reaches back to retrieve the frame pointer and link register and then computes what the stack pointer would have been before the function was entered. Note that it doesn't use the frame pointer to do this.
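As a rough check that Clang's sequence lands on the same layout as the "traditional" one, the two can be modelled side by side in C++. This is a sketch under stated assumptions: `Regs`, `clang_enter`, `clang_leave` and `traditional_layout` are hypothetical helper names, addresses are plain integers, and the constants 160 and 176 come from the listings above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register model; addresses are plain integers.
struct Regs {
    std::uint64_t sp;  // stack pointer
    std::uint64_t fp;  // frame pointer (x29)
};

// Models Clang: sub sp, sp, #176 ; stp x29, x30, [sp, #160] ; add x29, sp, #160
Regs clang_enter(std::uint64_t entry_sp) {
    std::uint64_t sp = entry_sp - 176;  // whole frame allocated in one step
    std::uint64_t fp = sp + 160;        // address where the stp stored x29/x30
    return Regs{sp, fp};
}

// Models Clang: ldp x29, x30, [sp, #160] ; add sp, sp, #176
std::uint64_t clang_leave(const Regs& r) {
    return r.sp + 176;  // relies on the known frame size, not on fp
}

// Models the traditional sequence with 160 bytes of locals:
// stp fp, lr, [sp,#-16]! ; mov fp, sp ; sub sp, sp, #160
Regs traditional_layout(std::uint64_t entry_sp) {
    std::uint64_t fp = entry_sp - 16;
    return Regs{fp - 160, fp};
}
```

For any entry stack pointer, both models give the same `sp` and `fp`; only the order of the arithmetic differs.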
With GCC the following pre-amble is used (available at: https://godbolt.org/z/PMKWzP93Y):
merge(std::__cxx11::basic_string<char, std::char_traits<char>,...:
stp x29, x30, [sp, -160]!
mov x29, sp
Here the frame pointer and link register end up stored at the bottom of the allocated stack space. That seems unusual to me!
| |
+---------------------+
| |
| |
| ...space... |
| |
| |
+---------------------+
| lr |
+---------------------+
| original fp | <- fp, sp
+---------------------+
The post-amble is below. Note again that the frame pointer is not used; it relies on the compiler keeping track of how many bytes it has allocated on the stack (easy for a compiler to do but not so easy for a person).
ldp x29, x30, [sp], 160
ret
The GCC approach does avoid requiring an additional sub sp, sp, ? instruction, so it makes sense in that respect, provided you can easily keep track of how much space you've allocated on the stack.
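The GCC layout can be modelled in the same spirit. In this sketch, `GccFrame`, `gcc_enter` and `gcc_leave` are hypothetical names used only for illustration, addresses are plain integers, and the 160-byte frame size is the one from the GCC listing. The `space_begin` / `space_end` fields simply mark the part of the frame above the saved pair, labelled "space" in the diagram.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the GCC frame; addresses are plain integers.
struct GccFrame {
    std::uint64_t sp;           // stack pointer
    std::uint64_t fp;           // frame pointer (x29)
    std::uint64_t space_begin;  // lowest address above the saved fp/lr pair
    std::uint64_t space_end;    // one past the highest address in the frame
};

// Models GCC: stp x29, x30, [sp, -160]! ; mov x29, sp
GccFrame gcc_enter(std::uint64_t entry_sp) {
    std::uint64_t sp = entry_sp - 160;  // allocate and store the pair in one instruction
    return GccFrame{sp, sp, sp + 16, entry_sp};
}

// Models GCC: ldp x29, x30, [sp], 160
std::uint64_t gcc_leave(const GccFrame& f) {
    return f.sp + 160;  // post-indexed load frees the whole frame at once
}
```

In this model the saved pair occupies the lowest 16 bytes of the frame, so the rest of the frame sits at positive offsets from both `sp` and `fp`.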
MS Visual Studio does the following. It seems to use security cookies to protect the stack from (presumably) ROP attacks. It allocates space and pushes the frame pointer and link register onto the stack and then reaches back to store x19. It stores the revised stack pointer in the frame pointer and then allocates stack space for the local data.
MSVC (https://godbolt.org/z/4q3EoTKhj)
... merge(std::basic_string<char,std::char_traits<char>...
stp fp,lr,[sp,#-0x20]!
str x19,[sp,#0x10]
mov fp,sp
bl __security_push_cookie
sub sp,sp,#0xA0
mov x19,sp
The stack ends up looking like this:
| |
+---------------------+
| (unused) |
+---------------------+
| x19 |
+---------------------+
| lr |
+---------------------+
| original fp | <- fp
+---------------------+
| |
| |
| ...space... |
| |
| | <- sp
+---------------------+
If we modify the MSVC code to remove the impact of the x19-based security cookie then the code looks like the traditional pre-amble:
... modified merge(std::basic_string<char,std::char_traits<char>...
stp fp,lr,[sp,#-0x10]!
mov fp,sp
sub sp,sp,#0xA0
The MSVC post-amble is:
ldr x0,[x19,#8]
add sp,sp,#0xA0
bl __security_pop_cookie
ldr x19,[sp,#0x10]
ldp fp,lr,[sp],#0x20
ret
Again, the security cookie changes things, but as with the other compilers, it is relying on keeping track of how much the stack pointer has been changed rather than relying on the frame pointer.
In summary, it's interesting how the different compilers solve the same problem. Personally, I would use my "traditional" stack frame for hand-crafted code and use the frame pointer in the post-amble unless the slight added efficiency of the GCC approach was demonstrably beneficial.