I've been fascinated by (and interested in) assembly language for a number of years. I fondly remember my "Peter Norton Assembly" book and wish I hadn't sent it to a better home several years ago. But, hey, it was a major moving of house and I had to have as few items as possible. The sacrifices we make…
Anyway, I had some spare time this past weekend and decided to see whether I could get a simple 64-bit assembly language program working under Win64. My goal was to write some text to a console window and exit without destroying my entire system. I managed to succeed in that endeavor and had a bit of fun along the way. My wife thinks I have a bizarre definition of fun. If you continue to read the rest of this post, keep in mind I said I was "interested" in assembly. By no means is this intended to be a tutorial or guide to writing efficient and professional assembly! As the comments in the code indicate, there was no real attempt at optimization or error handling!
A couple of links that I ran across and found helpful (Links accurate as of today. Who knows what the future may bring?):
- Matt Pietrek's X64 Primer column from the May 2006 MSDN Magazine found here.
- Raymond Chen's Blog (The Old New Thing) posting entitled "The History of Calling Conventions, Part 5: AMD64" found here.
To make the code just a tiny bit more interesting, I decided to execute the "cpuid" instruction to get the Vendor ID from the processor and write that out along with some static text. Documentation for "cpuid" is available in PDF format on both the Intel and AMD websites. One thing I found interesting is that the three 4-byte chunks that make up the Vendor ID come back in the registers in an "out of order" fashion: EBX holds the first chunk, EDX the second, and ECX the third. Look at the code below that copies the chunks out of the registers and you'll see what I mean (I hope).
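To make the register order concrete, here is a small Python sketch (not part of the assembly program) that rebuilds the vendor string from three register values. The register values are hard-coded stand-ins for what a processor reporting 'GenuineIntel' returns, and the helper name `vendor_id` is made up for illustration.

```python
import struct


def vendor_id(ebx: int, edx: int, ecx: int) -> str:
    """Rebuild the 12-character vendor string from three register values.

    Each register holds four ASCII bytes in little-endian order, and the
    chunks are concatenated as EBX, EDX, ECX (not EBX, ECX, EDX).
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


# Assumed register contents, hard-coded for the demo.
EBX, EDX, ECX = 0x756E6547, 0x49656E69, 0x6C65746E

print(vendor_id(EBX, EDX, ECX))  # GenuineIntel
print(vendor_id(EBX, ECX, EDX))  # "alphabetical" order gives GenuntelineI
```

The second call shows what you get if you assume the registers are used in EBX, ECX, EDX order, which is the mistake the "out of order" remark is warning about.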
My First-Ever Windows 64-bit Assembly Language Program With Comments and Everything
NOTE: I have edited the code below several times over the years in response to comments.
; a (very) simple x64 masm (ml64.exe) demo that executes the cpuid instruction
; and writes the results to a console window.
; this is a simple experiment and almost certainly not very good assembly code
; for one thing there is no attempt at true optimization (like removing duplicate
; sections of code) or error handling.
; args are passed in registers as follows:
; RCX: 1st integer argument
; RDX: 2nd integer argument
; R8: 3rd integer argument
; R9: 4th integer argument
; the stack is used for args beyond 4 (after space reserved for first 4)
; this reserved space on the stack for first 4 arguments is
; the "register home space" or "register parameter stack area"
; unlike win32 stdcall (where the callee cleans up), the caller is
; responsible for cleaning the stack
; win api functions that are called below.
; calling through the __imp_ import pointers is a small optimization:
; a call by name such as GetStdHandle goes to a generated stub
; that then jumps to the proper location. calling through the
; import pointer avoids that extra jump from the stub location.
externdef __imp_GetStdHandle : qword
externdef __imp_ExitProcess : qword
externdef __imp_WriteFile : qword
externdef __imp_lstrlen : qword
; the variables used in the main code below such as the text to write
; typically carriage return (13) and line feed (10) CRLF would not be
; embedded in the string values but this is a simple test / demo
.data
hFile qword 0
msglen dword 0
BytesWritten dword 0
vndrid byte 13, 10, "The processor Vendor ID is 'xxxxxxxxxxxx'.", 13, 10, 0
; these are read-only
.const
supp byte 13, 10, 'CPUID instruction supported!', 13, 10, 0
nosupp byte 13, 10, 'CPUID instruction NOT supported!', 13, 10, 0
; and we're off...
.code
main proc frame
; registers that should be preserved (non-volatile)
; rbp, rbx, rdi, rsi, rsp, r12, r13, r14, r15
; prologue (there are other options)
; frame pointer in rbp is used because the code to check
; if cpuid is available changes rsp (via pushes and pops).
; stack modification outside of prologue requires frame pointer.
; the frame pointer is not directly used in this code.
; refer to "x64 exception handling" in "x64 ABI conventions".
push rbx ; rbx is a non-volatile register used below; stack is now 16-byte aligned
.pushreg rbx ; unwind info
push rbp ; pushing rbp mis-aligns stack again
.pushreg rbp ; unwind info
sub rsp, 28h ; adjust stack for register home space and parameter(s)
; WriteFile takes 5 parameters and thus uses stack for 5th
; also align on a 16 byte boundary
.allocstack 28h ; unwind info stack allocation
mov rbp, rsp ; save current rsp into rbp as frame pointer
.setframe rbp, 0 ; frame pointer (offset 0 - points to base not *into* frame
; which would be more typical). see above notes about usage.
.endprolog ; unwind info mark end of prologue
; get the handle for the console window to use when writing text
; this is used by WriteFile win api function below
mov ecx, -11 ; put -11 (STD_OUTPUT_HANDLE) into ecx (dword)
call __imp_GetStdHandle
mov hFile, rax ; move returned handle to hFile
; check that the cpuid instruction is supported
; this is very likely supported, but here for completeness
; taken from the AMD64 Architecture Programmer's Manual
; Volume 3: General-Purpose and System Instructions
pushfq ; save FLAGS
pop rax ; store FLAGS in RAX
mov rbx, rax ; save in RBX for test
xor rax, 200000h ; toggle bit #21 (i.e. 22nd bit)
push rax ; put result on stack
popfq ; save changed RAX to FLAGS
pushfq ; push FLAGS onto stack
pop rax ; store FLAGS in RAX
cmp rax, rbx ; determine if bit #21 changed
je no_cpuid ; cpuid not supported (bit did not change)
; The Windows 64-bit paradigm includes allocating the largest amount
; of stack space needed and then reusing that space. As such the
; stack pointer (rsp) does not change typically due to the reduced
; usage of pushes (and matched pops). There are no operations from
; this point that change rsp so it is used for providing offsets.
; If rsp did change (as it would in the above section of code) then
; using rbp (or other register as a frame pointer) would be better.
; And in such a case the register used as a frame pointer would point
; into the frame rather than at the base (as noted above).
; cpuid is supported if we did not jump...
; so print the message that it is supported and then
; jump to location where it is actually performed
lea rcx, supp ; load address of supp text into rcx
call __imp_lstrlen ; call win api function to get length of text
mov msglen, eax ; move length of string from eax to msglen
mov qword ptr [rsp + 32], 0 ; LPOVERLAPPED lpOverlapped
lea r9, BytesWritten ; LPDWORD lpNumberOfBytesWritten
mov r8d, msglen ; DWORD nNumberOfBytesToWrite
lea rdx, supp ; LPCVOID lpBuffer
mov rcx, hFile ; HANDLE hFile
call __imp_WriteFile ; call win api to write text to console
jmp do_cpuid ; jump over the no_cpuid code path
no_cpuid:
lea rcx, nosupp ; load address of nosupp text into rcx
call __imp_lstrlen ; call win api function to get length of text
mov msglen, eax ; move length of string from eax to msglen
mov qword ptr [rsp + 32], 0 ; LPOVERLAPPED lpOverlapped
lea r9, BytesWritten ; LPDWORD lpNumberOfBytesWritten
mov r8d, msglen ; DWORD nNumberOfBytesToWrite
lea rdx, nosupp ; LPCVOID lpBuffer
mov rcx, hFile ; HANDLE hFile
call __imp_WriteFile ; call win api to write text to console
jmp exit ; the no support message was written, so exit
do_cpuid:
lea rcx, vndrid ; load address of vndrid text into rcx
call __imp_lstrlen ; call win api function to get length of text
mov msglen, eax ; move length of string from eax to msglen
xor eax, eax ; 0 indicates we want the vendor id string
cpuid ; invoke the cpuid instruction to get the string
; the results are now in ebx, edx, and ecx
; replace the 'x' placeholders with the vendor id
lea r9, vndrid
mov [r9+30], ebx ; the first 4 bytes of the vendor id
mov [r9+34], edx ; the next 4 bytes of the vendor id
mov [r9+38], ecx ; the final 4 bytes of the vendor id
mov qword ptr [rsp + 32], 0 ; LPOVERLAPPED lpOverlapped
lea r9, BytesWritten ; LPDWORD lpNumberOfBytesWritten
mov r8d, msglen ; DWORD nNumberOfBytesToWrite
lea rdx, vndrid ; LPCVOID lpBuffer
mov rcx, hFile ; HANDLE hFile
call __imp_WriteFile ; call win api to write text to console
exit:
xor ecx, ecx ; return value of 0 passed as UINT parameter to ExitProcess
call __imp_ExitProcess ; exit in Windows-friendly manner rather than typical epilogue
; such as would be present outside of this entry point procedure
; nothing after here would execute so no epilogue etc.
main endp
end
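The stack arithmetic in the prologue is easy to get wrong, so here is a rough Python sketch that replays it. It assumes the usual x64 entry condition (rsp is 16-byte aligned just before the call, so it is 8 bytes off on entry). The starting address and the `Stack` helper are invented for illustration, and only the value of rsp is tracked.

```python
class Stack:
    """Tracks only the value of rsp; no memory is modelled."""

    def __init__(self, rsp_before_call: int):
        assert rsp_before_call % 16 == 0, "caller must align rsp before call"
        # The call instruction pushes an 8-byte return address.
        self.rsp = rsp_before_call - 8

    def push(self) -> None:
        """Model 'push reg'."""
        self.rsp -= 8

    def sub(self, nbytes: int) -> None:
        """Model 'sub rsp, nbytes'."""
        self.rsp -= nbytes

    @property
    def aligned(self) -> bool:
        return self.rsp % 16 == 0


def replay_prologue(rsp_before_call: int = 0x1000) -> list:
    """Return (step, aligned?) pairs for the listing's prologue."""
    s = Stack(rsp_before_call)
    steps = [("entry", s.aligned)]
    s.push()
    steps.append(("push rbx", s.aligned))
    s.push()
    steps.append(("push rbp", s.aligned))
    s.sub(0x28)
    steps.append(("sub rsp, 28h", s.aligned))
    return steps


for step, ok in replay_prologue():
    print(f"{step:<14} aligned={ok}")

# 28h = 40 bytes: 32 bytes of register home space plus 8 bytes for
# WriteFile's fifth argument, which the listing stores at [rsp + 32].
HOME_SPACE = 4 * 8
print(0x28 - HOME_SPACE)  # 8
```

The alignment flips with every 8-byte push, which is why the listing's comments note the stack as aligned after the first push and misaligned again after the second.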
The command I used to assemble and link the above is (apologies if wrapping makes it difficult to read):
ml64 test64.asm /link /subsystem:console /defaultlib:"C:\Program Files\Microsoft SDKs\Windows\v6.0A\Lib\x64\kernel32.lib" /entry:main
And for the moment of truth…
C:\My Projects\Test\ASM\test>test64
CPUID instruction supported!
The processor Vendor ID is 'GenuineIntel'.
Hurray! Unless you sell AMD chips, in which case you might think 'AuthenticAMD' would look better…
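As a sanity check on the hard-coded offsets 30, 34, and 38 in the listing, this Python sketch rebuilds the `vndrid` template as raw bytes and patches it the same way. The `patch_vendor` helper is hypothetical, and the three chunks are typed in by hand rather than read from a processor.

```python
# The template from the listing's .data section, as raw bytes
# (13, 10 is CRLF and the trailing 0 is the terminator lstrlen needs).
VNDRID = b"\r\n" + b"The processor Vendor ID is 'xxxxxxxxxxxx'." + b"\r\n\x00"


def patch_vendor(template: bytes, chunks: list) -> bytes:
    """Overwrite the 'x' placeholders with three 4-byte chunks."""
    start = template.index(b"x")  # offset of the first placeholder
    buf = bytearray(template)
    for i, chunk in enumerate(chunks):
        offset = start + 4 * i
        buf[offset:offset + 4] = chunk
    return bytes(buf)


print(VNDRID.index(b"x"))  # 30, matching [r9+30] in the listing

patched = patch_vendor(VNDRID, [b"Genu", b"ineI", b"ntel"])
# Strip the leading CRLF and the trailing CRLF plus terminator for display.
print(patched[2:-3].decode("ascii"))
```

If the wording of the template ever changes, the first placeholder moves and all three offsets in the assembly have to change with it.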
If you see something profoundly wrong in the above, please let me know. It's not like I am going to be churning out assembly code all of a sudden, but I would still be interested in knowing…