Monday, March 30, 2009

jmp'ing around Win64 with ml64.exe and Assembly Language

I've been fascinated by (and interested in) assembly language for a number of years. I fondly remember my "Peter Norton Assembly" book and wish I hadn't sent it to a better home several years ago. But, hey, it was a major moving of house and I had to have as few items as possible. The sacrifices we make…

Anyway, I had some spare time this past weekend and decided to see if I could get a simple 64-bit assembly language program working under Win64. My goal was to write some text to a console window and exit without destroying my entire system. I managed to succeed in that endeavor and had a bit of fun along the way. My wife thinks I have a bizarre definition of fun. If you continue to read the rest of this post, keep in mind I said I was "interested" in assembly. By no means is this intended to be a tutorial or guide to writing efficient and professional assembly! As the comments in the code indicate, there was no true attempt at optimization or error handling!

A couple of links that I ran across and found helpful (Links accurate as of today. Who knows what the future may bring?):

- Matt Pietrek's X64 Primer column from the May 2006 MSDN Magazine found here.

- Raymond Chen's Blog (The Old New Thing) posting entitled "The History of Calling Conventions, Part 5: AMD64" found here.

In order to make the code just a tiny bit more interesting, I decided to use the "cpuid" instruction to get the Vendor ID from the processor and write that out along with some static text. Information on the "cpuid" instruction is available in PDF format on both the Intel and AMD websites. One thing that I found interesting is that the three 4-byte chunks that make up the Vendor ID are returned in the registers in an "out of order" fashion: EBX first, then EDX, then ECX. Look at the code below that gets the chunks from the registers and you'll see what I mean (I hope).
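
To make that ordering concrete, here is a minimal sketch in Python (not assembly, purely so the bytes are easy to see). The function name vendor_id is made up for illustration, and the register values are the ones commonly documented for the Intel vendor string. Treat it as an illustration of the EBX, EDX, ECX ordering rather than anything the program below actually runs.

```python
import struct


def vendor_id(ebx: int, edx: int, ecx: int) -> str:
    """Build the 12-character vendor string from CPUID leaf 0 register values."""
    # Each register holds four ASCII bytes, lowest byte first (little-endian).
    # The string order is EBX, EDX, ECX, not the alphabetical EBX, ECX, EDX.
    return struct.pack("<3I", ebx, edx, ecx).decode("ascii")


# Register values an Intel processor reports for leaf 0 (EAX = 0).
print(vendor_id(0x756E6547, 0x49656E69, 0x6C65746E))  # GenuineIntel
```

Swapping the last two arguments gives "GenuntelineI", which is a quick way to convince yourself the order really matters.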

My First-Ever Windows 64-bit Assembly Language Program With Comments and Everything

NOTE: Multiple edits made per comment(s) etc. over time

; a (very) simple x64 masm (ml64.exe) demo that executes the cpuid instruction
; and writes the results to a console window.

; this is a simple experiment and almost certainly not very good assembly code
; for one thing there is no attempt at true optimization (like removing duplicate
; sections of code) or error handling.

; args are passed in registers as follows:
; RCX: 1st integer argument
; RDX: 2nd integer argument
; R8: 3rd integer argument
; R9: 4th integer argument
; the stack is used for args beyond 4 (after space reserved for first 4)
; this reserved space on the stack for first 4 arguments is
; the "register home space" or "register parameter stack area"

; unlike win32 stdcall (used by most win32 api functions), the caller is
; responsible for cleaning the stack rather than the callee

; win api functions that are called below.
; referencing the __imp_ import pointers is a small optimization:
; if a function is called by name (such as "call GetStdHandle")
; the call goes to a generated stub that then jumps to the
; proper location. calling through the pointer avoids that
; extra jump from the stub location.
externdef __imp_GetStdHandle : qword
externdef __imp_ExitProcess  : qword
externdef __imp_WriteFile    : qword
externdef __imp_lstrlen      : qword

; the variables used in the main code below such as the text to write
; typically carriage return (13) and line feed (10) CRLF would not be
; embedded in the string values but this is a simple test / demo
.data
hFile        qword 0
msglen       dword 0
BytesWritten dword 0
vndrid       byte  13, 10, "The processor Vendor ID is 'xxxxxxxxxxxx'.", 13, 10, 0

; these are read-only
.const
supp         byte  13, 10, 'CPUID instruction supported!', 13, 10, 0
nosupp       byte  13, 10, 'CPUID instruction NOT supported!', 13, 10, 0

; and we're off...

.code
main proc frame
  ; registers that should be preserved (non-volatile)
  ; rbp, rbx, rdi, rsi, rsp, r12, r13, r14, r15

  ; prologue (there are other options)
  ; frame pointer in rbp is used because the code to check
  ; if cpuid is available changes rsp (via pushes and pops).
  ; stack modification outside of prologue requires frame pointer.
  ; the frame pointer is not directly used in this code.
  ; refer to "x64 exception handling" in "x64 ABI conventions".
  push rbx                       ; rbx is a non-volatile register used below, stack aligned
  .pushreg rbx                   ; unwind info
  push rbp                       ; pushing rbp mis-aligns stack again
  .pushreg rbp                   ; unwind info
  sub rsp, 28h                   ; adjust stack for register home space and parameter(s)
                                 ; WriteFile takes 5 parameters and thus uses stack for 5th
                                 ; also align on a 16 byte boundary
  .allocstack 28h                ; unwind info stack allocation
  mov rbp, rsp                   ; save current rsp into rbp as frame pointer
  .setframe rbp, 0               ; frame pointer (offset 0 - points to base not *into* frame
                                 ; which would be more typical). see above notes about usage.
  .endprolog                     ; unwind info mark end of prologue

  ; get the handle for the console window to use when writing text
  ; this is used by WriteFile win api function below
  mov ecx, -11                   ; put -11 (STD_OUTPUT_HANDLE) into ecx (dword)
  call __imp_GetStdHandle
  mov hFile, rax                 ; move returned handle to hFile

  ; check that the cpuid instruction is supported
  ; this is very likely supported, but here for completeness
  ; taken from the AMD64 Architecture Programmer's Manual
  ; Volume 3: General-Purpose and System Instructions
  pushfq                         ; save FLAGS
  pop rax                        ; store FLAGS in RAX
  mov rbx, rax                   ; save in RBX for test
  xor rax, 200000h               ; toggle bit #21 (i.e. 22nd bit)
  push rax                       ; put result on stack
  popfq                          ; load changed value from stack into FLAGS
  pushfq                         ; push FLAGS onto stack
  pop rax                        ; store FLAGS in RAX
  cmp rax, rbx                   ; determine if bit #21 changed
  je no_cpuid                    ; cpuid not supported (bit did not change)

  ; The Windows 64-bit paradigm includes allocating the largest amount
  ; of stack space needed and then reusing that space. As such the
  ; stack pointer (rsp) does not change typically due to the reduced
  ; usage of pushes (and matched pops). There are no operations from
  ; this point that change rsp so it is used for providing offsets.
  ; If rsp did change (as it would in the above section of code) then
  ; using rbp (or other register as a frame pointer) would be better.
  ; And in such a case the register used as a frame pointer would point
  ; into the frame rather than at the base (as noted above).

  ; cpuid is supported if we did not jump...
  ; so print the message that it is supported and then
  ; jump to location where it is actually performed
  lea rcx, supp                  ; load address of supp text into rcx
  call __imp_lstrlen             ; call win api function to get length of text
  mov msglen, eax                ; move length of string from eax to msglen

  mov qword ptr [rsp + 32], 0    ; LPOVERLAPPED lpOverlapped
  lea r9, BytesWritten           ; LPDWORD lpNumberOfBytesWritten
  mov r8d, msglen                ; DWORD nNumberOfBytesToWrite
  lea rdx, supp                  ; LPCVOID lpBuffer
  mov rcx, hFile                 ; HANDLE hFile
  call __imp_WriteFile           ; call win api to write text to console

  jmp do_cpuid                   ; jump over the no_cpuid code path

no_cpuid:
  lea rcx, nosupp                ; load address of nosupp text into rcx
  call __imp_lstrlen             ; call win api function to get length of text
  mov msglen, eax                ; move length of string from eax to msglen

  mov qword ptr [rsp + 32], 0    ; LPOVERLAPPED lpOverlapped
  lea r9, BytesWritten           ; LPDWORD lpNumberOfBytesWritten
  mov r8d, msglen                ; DWORD nNumberOfBytesToWrite
  lea rdx, nosupp                ; LPCVOID lpBuffer
  mov rcx, hFile                 ; HANDLE hFile
  call __imp_WriteFile           ; call win api to write text to console

  jmp exit                       ; the no support message was written, so exit

do_cpuid:
  lea rcx, vndrid                ; load address of vndrid text into rcx
  call __imp_lstrlen             ; call win api function to get length of text
  mov msglen, eax                ; move length of string from eax to msglen

  xor eax, eax                   ; 0 indicates we want the vendor id string
  cpuid                          ; invoke the cpuid instruction to get the string
                                 ; the results are now in ebx, edx, and ecx

  ; replace the 'x' placeholders with the vendor id
  lea r9, vndrid

  mov [r9+30], ebx               ; the first 4 bytes of the vendor id
  mov [r9+34], edx               ; the next 4 bytes of the vendor id
  mov [r9+38], ecx               ; the final 4 bytes of the vendor id

  mov qword ptr [rsp + 32], 0    ; LPOVERLAPPED lpOverlapped
  lea r9, BytesWritten           ; LPDWORD lpNumberOfBytesWritten
  mov r8d, msglen                ; DWORD nNumberOfBytesToWrite
  lea rdx, vndrid                ; LPCVOID lpBuffer
  mov rcx, hFile                 ; HANDLE hFile
  call __imp_WriteFile           ; call win api to write text to console

exit:
  xor ecx, ecx                   ; return value of 0 passed as UINT parameter to ExitProcess
  call __imp_ExitProcess         ; exit in Windows-friendly manner rather than typical epilogue
                                 ; such as would be present outside of this entry point procedure
                                 ; nothing after here would execute so no epilogue etc.
main endp

end
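
The alignment comments in the prologue can be checked with a little arithmetic. The sketch below is Python, not assembly, and it assumes rsp is 16-byte aligned just before the call that enters main (which is what the x64 calling convention requires). The constant and function names are made up for illustration.

```python
# Bytes subtracted from rsp, in order, on the way into main.
STEPS = [
    ("call main (return address)", 8),
    ("push rbx", 8),
    ("push rbp", 8),
    ("sub rsp, 28h", 0x28),
]

HOME_SPACE = 4 * 8  # register home space for RCX, RDX, R8 and R9
FIFTH_ARG = 8       # stack slot for WriteFile's lpOverlapped


def misalignment_after(steps):
    """Return rsp mod 16 after each step, assuming rsp starts 16-byte aligned."""
    total = 0
    result = []
    for _, size in steps:
        total += size
        result.append(-total % 16)
    return result


for (name, _), rem in zip(STEPS, misalignment_after(STEPS)):
    print(f"{name:28} rsp mod 16 = {rem}")

# 28h covers the home space plus one stack argument, which lands at [rsp + 32].
print(0x28 == HOME_SPACE + FIFTH_ARG, HOME_SPACE)
```

The remainders come out as 8, 0, 8, 0, which matches the "stack aligned" and "mis-aligns stack again" comments, and the 32 is the offset used for lpOverlapped in the WriteFile calls.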

The command I used to assemble and link the above is (apologies if wrapping makes it difficult to read):

ml64 test64.asm /link /subsystem:console /defaultlib:"C:\Program Files\Microsoft SDKs\Windows\v6.0A\Lib\x64\kernel32.lib" /entry:main

And for the moment of truth…

C:\My Projects\Test\ASM\test>test64

CPUID instruction supported!

The processor Vendor ID is 'GenuineIntel'.

Hurray! Unless you sell AMD chips, in which case you might think 'AuthenticAMD' would look better…
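
For what it's worth, the offset of 30 used when overwriting the 'x' placeholders can be double-checked with a few lines of Python. This is only a sketch and not part of the program: TEMPLATE simply mirrors the vndrid bytes from the listing, and patch_vendor is a made-up helper name.

```python
# Mirrors vndrid: CR, LF, the text, CR, LF and the terminating 0.
TEMPLATE = b"\r\nThe processor Vendor ID is 'xxxxxxxxxxxx'.\r\n\x00"
PLACEHOLDER = b"x" * 12


def patch_vendor(buffer: bytes, vendor: bytes) -> bytes:
    """Overwrite the twelve 'x' placeholders with a 12-byte vendor id."""
    if len(vendor) != len(PLACEHOLDER):
        raise ValueError("vendor id must be exactly 12 bytes")
    offset = buffer.index(PLACEHOLDER)
    return buffer[:offset] + vendor + buffer[offset + len(PLACEHOLDER):]


print(TEMPLATE.index(PLACEHOLDER))  # 30, the base offset used with r9
print(patch_vendor(TEMPLATE, b"AuthenticAMD").rstrip(b"\x00").decode("ascii"))
```

The three stores in the listing write at 30, 34 and 38, so together they cover exactly those twelve bytes.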

If you see something profoundly wrong in the above, please let me know. It's not like I am going to be churning out assembly code all of a sudden, but I would still be interested in knowing…