A better explanation of the 32-bit x86 calling convention
---------------------------------------------------------

In gcc's cdecl, the stack looks like this right after a function call
(higher addresses at the top; BP still points somewhere in the caller's
frame):

    BP? -->  .
             .
             arg n
             .
             .
             arg 1
             arg 0
    SP  -->  return address

The standard function prologue saves EBP and any other registers it
needs. It then reserves stack space for its local variables and for the
arguments it will pass to the functions it calls ("subargs" below). If
no other registers need saving, the stack looks like this:

             .
             .
             arg n
             .
             .
             arg 0
             return address
    BP  -->  saved BP
             var 0
             var 1
             .
             .
             var n
             subarg n
             .
             .
    SP  -->  subarg 0

Functions returning 32-bit results (int, long, pointer) do so in EAX, so
that register is obviously not preserved across a function call. (short)
and (char) results are returned in AX and AL. (long long) is returned in
EDX:EAX, with the high 32 bits in EDX. ECX is also used as a scratch
register, so a caller must assume that EAX, ECX and EDX are all
clobbered by any call.

On the other hand, EBX, ESI and EDI are supposed to be preserved across
function calls: a function that uses them must save and restore them.
If any are needed, the compiler chooses them in that order and pushes
them in reverse order during the prologue.

At the end of the function (or as soon as the stack frame is no longer
needed, under optimization), the compiler resets the stack by moving EBP
into ESP and popping the saved EBP back into EBP; the single "leave"
instruction does both. All the "ret" instruction then does is pop the
return address into EIP.

As long as the function never moves ESP after its prologue, the
distance between ESP and EBP is a compile-time constant, so the epilogue
could add that constant to ESP instead of using a move instruction. That
would even save one clock cycle on a 486; however, its implications on a
superscalar chip are uncertain. Keeping EBP as a frame pointer does make
it easier to "walk up" the stack in a debugger, because each saved EBP
points to the caller's saved EBP.

Decoupling ESP from EBP is also what makes the "alloca" function
possible. All the compiler needs to do is decrease ESP by the requested
size, then return a pointer to the bottom of the newly reserved,
still-untouched space (which stays above ESP if another function will be
called, since the outgoing arguments must go below it). A similar
operation can allocate a local array whose size is known only at run
time (a C99 variable-length array); gcc supports this construct, but
actually generates more code for it, even with '-O2' optimization.
Float and double arguments are passed on the stack (a double takes two
32-bit stack slots). A float or double return value is returned in
ST(0), the top of the x87 floating-point register stack.

To recap with a real example, here is the stack in the little program
"fgd.c" after the call to "fgets" but before the call to "printf"
(annotations under each word; addresses increase downward):

    0xbf89f420: 0x00000000  0x442d0cc0  0xbf89f498  0x0804845b
                ........    ........    (old BP)    (ret addr)
    0xbf89f430: 0xbf89f440  0x00000050  0x0804a008  0x00000000
                arg0=buf    arg1=80     arg2=f      (padding)
    0xbf89f440: [buf:  "#inc"  "lude"  " ",LF,NUL]
    0xbf89f450: 0x000a3e68  0x444065f4  0xbf89f468  0x0804838e
                zzssssss    ........    ........    ........
                .
                .  [48 more bytes of unknown data]
                .
    0xbf89f490: 0x00000000  0x44407ff4  0xbf89f4e8  0x442e8eb0
                (padding)   (old BX)    (old BP)    (ret addr)
    0xbf89f4a0: 0x00000001  0xbf89f514  0xbf89f51c  0x00000001
                (argc)      (argv)      (env)       ????????

While this particular example was compiled with gcc and executed under
gdb, the calling convention I have described is the standard C calling
convention for i386 and later chips, not something peculiar to gcc. It
is based on the calling convention of the 8086; the primary difference
is that the stack word size is 32 bits, and some operations are faster
when done on 8-byte or even 16-byte boundaries, hence the (optional)
padding.

-- David Lee Lambert 12Oct'06