Monday, March 30, 2009

jmp'ing around Win64 with ml64.exe and Assembly Language

I've been fascinated by (and interested in) assembly language for a number of years. I fondly remember my "Peter Norton Assembly" book and wish I hadn't sent it to a better home several years ago. But, hey, it was a major moving of house and I had to have as few items as possible. The sacrifices we make…

Anyway, I had some spare time this past weekend and decided to see if I could get a simple 64-bit assembly language program working under Win64. My goal was to write some text to a console window and exit without destroying my entire system. I managed to succeed in that endeavor and had a bit of fun along the way. My wife thinks I have a bizarre definition of fun. If you continue to read the rest of this post, keep in mind I said I was "interested" in assembly. By no means is this intended to be a tutorial or guide to writing efficient and professional assembly! As the comments in the code indicate, there was no true attempt at optimization, error handling, etc.

A couple of links that I ran across and found helpful (Links accurate as of today. Who knows what the future may bring?):

- Matt Pietrek's X64 Primer column from the May 2006 MSDN Magazine found here.

- Raymond Chen's Blog (The Old New Thing) posting entitled "The History of Calling Conventions, Part 5: AMD64" found here.

In order to make the code just a tiny bit more interesting, I decided to execute the "cpuid" instruction to get the Vendor ID from the processor and write that out along with some static text. Information on the "cpuid" instruction is available in PDF format on both the Intel and AMD websites. One thing that I found interesting is that the three 4-byte chunks that make up the Vendor ID returned by "cpuid" are stored in the registers in an "out of order" fashion: the text reads in the order EBX, EDX, ECX rather than EBX, ECX, EDX. Look at the code below that gets the chunks from the registers and you'll see what I mean (I hope).
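The register ordering described above can be sketched in C. This is only a minimal illustration and not part of the original program: the names `put_le32` and `vendor_from_regs` are made up for the example, and the register values in the note after the code are the little-endian encodings of the text "GenuineIntel".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit register value as 4 characters, lowest byte first.
   This mirrors what "mov [mem], ebx" does on a little-endian x86. */
void put_le32(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((v >> (8 * i)) & 0xFFu);
    }
}

/* Build the 12-character vendor ID from the three registers filled in
   by cpuid with eax = 0. The text order is EBX, then EDX, then ECX.
   out must have room for 13 bytes (12 characters plus the NUL). */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char *out)
{
    assert(out != NULL);
    put_le32(out + 0, ebx);  /* first 4 characters */
    put_le32(out + 4, edx);  /* next 4 characters  */
    put_le32(out + 8, ecx);  /* final 4 characters */
    out[12] = '\0';
}
```

Calling `vendor_from_regs(0x756E6547, 0x49656E69, 0x6C65746E, buf)` fills `buf` with "GenuineIntel". Swapping the last two arguments gives "GenuntelineI" instead, which is why the register order matters.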

My First-Ever Windows 64-bit Assembly Language Program With Comments and Everything

NOTE: Multiple edits have been made over time in response to comments.

; a (very) simple x64 masm (ml64.exe) demo that calls the cpuid function
; and writes the results to a console window.

; this is a simple experiment and almost certainly not very good assembly code
; for one thing there is no attempt at true optimization (like removing duplicate
; sections of code) or error handling.

; args are passed in registers as follows:
; RCX: 1st integer argument
; RDX: 2nd integer argument
; R8: 3rd integer argument
; R9: 4th integer argument
; the stack is used for args beyond 4 (after space reserved for first 4)
; this reserved space on the stack for first 4 arguments is
; the "register home space" or "register parameter stack area"

; unlike the win32 stdcall convention (used by the win32 api) the caller
; is responsible for cleaning the stack rather than the callee

; win api functions that are called below.
; referencing the __imp_ import pointers is a small optimization:
; if a function is called by name (such as GetStdHandle) a stub
; is generated that then jumps to the proper location. calling
; through the import pointer avoids that extra jump from the stub.
externdef __imp_GetStdHandle : qword
externdef __imp_ExitProcess  : qword
externdef __imp_WriteFile    : qword
externdef __imp_lstrlen      : qword

; the variables used in the main code below such as the text to write
; typically carriage return (13) and line feed (10) CRLF would not be
; embedded in the string values but this is a simple test / demo
.data
hFile        qword 0
msglen       dword 0
BytesWritten dword 0
vndrid       byte  13, 10, "The processor Vendor ID is 'xxxxxxxxxxxx'.", 13, 10, 0

; these are read-only
.const
supp         byte  13, 10, 'CPUID instruction supported!', 13, 10, 0
nosupp       byte  13, 10, 'CPUID instruction NOT supported!', 13, 10, 0

; and we're off...

.code
main proc frame
  ; registers that should be preserved (non-volatile)
  ; rbp, rbx, rdi, rsi, rsp, r12, r13, r14, r15

  ; prologue (there are other options)
  ; frame pointer in rbp is used because the code to check
  ; if cpuid is available changes rsp (via pushes and pops).
  ; stack modification outside of prologue requires frame pointer.
  ; the frame pointer is not directly used in this code.
  ; refer to "x64 exception handling" in "x64 ABI conventions".
  push rbx                       ; rbx is a non-volatile register used below, stack aligned
  .pushreg rbx                   ; unwind info
  push rbp                       ; pushing rbp mis-aligns stack again
  .pushreg rbp                   ; unwind info
  sub rsp, 28h                   ; adjust stack for register home space and parameter(s)
                                 ; WriteFile takes 5 parameters and thus uses stack for 5th
                                 ; also align on a 16 byte boundary
  .allocstack 28h                ; unwind info stack allocation
  mov rbp, rsp                   ; save current rsp into rbp as frame pointer
  .setframe rbp, 0               ; frame pointer (offset 0 - points to base not *into* frame
                                 ; which would be more typical). see above notes about usage.
  .endprolog                     ; unwind info mark end of prologue

  ; get the handle for the console window to use when writing text
  ; this is used by WriteFile win api function below
  mov ecx, -11                   ; put -11 (STD_OUTPUT_HANDLE) into ecx (dword)
  call __imp_GetStdHandle
  mov hFile, rax                 ; move returned handle to hFile

  ; check that the cpuid instruction is supported
  ; this is very likely supported, but here for completeness
  ; taken from the AMD64 Architecture Programmer's Manual
  ; Volume 3: General-Purpose and System Instructions
  pushfq                         ; save FLAGS
  pop rax                        ; store FLAGS in RAX
  mov rbx, rax                   ; save in RBX for test
  xor rax, 200000h               ; toggle bit #21 (i.e. 22nd bit)
  push rax                       ; put result on stack
  popfq                          ; save changed RAX to FLAGS
  pushfq                         ; push FLAGS onto stack
  pop rax                        ; store FLAGS in RAX
  cmp rax, rbx                   ; determine if bit #21 changed
  je no_cpuid                    ; cpuid not supported (bit did not change)

  ; The Windows 64-bit paradigm includes allocating the largest amount
  ; of stack space needed and then reusing that space. As such the
  ; stack pointer (rsp) does not change typically due to the reduced
  ; usage of pushes (and matched pops). There are no operations from
  ; this point that change rsp so it is used for providing offsets.
  ; If rsp did change (as it would in the above section of code) then
  ; using rbp (or other register as a frame pointer) would be better.
  ; And in such a case the register used as a frame pointer would point
  ; into the frame rather than at the base (as noted above).

  ; cpuid is supported if we did not jump...
  ; so print the message that it is supported and then
  ; jump to location where it is actually performed
  lea rcx, supp                  ; load address of supp text into rcx
  call __imp_lstrlen             ; call win api function to get length of text
  mov msglen, eax                ; move length of string from eax to msglen

  mov qword ptr [rsp + 32], 0    ; LPOVERLAPPED lpOverlapped
  lea r9, BytesWritten           ; LPDWORD lpNumberOfBytesWritten
  mov r8d, msglen                ; DWORD nNumberOfBytesToWrite
  lea rdx, supp                  ; LPCVOID lpBuffer
  mov rcx, hFile                 ; HANDLE hFile
  call __imp_WriteFile           ; call win api to write text to console

  jmp do_cpuid                   ; jump over the no_cpuid code path

no_cpuid:
  lea rcx, nosupp                ; load address of nosupp text into rcx
  call __imp_lstrlen             ; call win api function to get length of text
  mov msglen, eax                ; move length of string from eax to msglen

  mov qword ptr [rsp + 32], 0    ; LPOVERLAPPED lpOverlapped
  lea r9, BytesWritten           ; LPDWORD lpNumberOfBytesWritten
  mov r8d, msglen                ; DWORD nNumberOfBytesToWrite
  lea rdx, nosupp                ; LPCVOID lpBuffer
  mov rcx, hFile                 ; HANDLE hFile
  call __imp_WriteFile           ; call win api to write text to console

  jmp exit                       ; the no support message was written, so exit

do_cpuid:
  lea rcx, vndrid                ; load address of vndrid text into rcx
  call __imp_lstrlen             ; call win api function to get length of text
  mov msglen, eax                ; move length of string from eax to msglen

  xor eax, eax                   ; 0 indicates we want the vendor id string
  cpuid                          ; invoke the cpuid instruction to get the string
                                 ; the results are now in ebx, edx, and ecx

  ; replace the 'x' placeholders with the vendor id
  lea r9, vndrid

  mov [r9+30], ebx               ; the first 4 bytes of the vendor id
  mov [r9+34], edx               ; the next 4 bytes of the vendor id
  mov [r9+38], ecx               ; the final 4 bytes of the vendor id

  mov qword ptr [rsp + 32], 0    ; LPOVERLAPPED lpOverlapped
  lea r9, BytesWritten           ; LPDWORD lpNumberOfBytesWritten
  mov r8d, msglen                ; DWORD nNumberOfBytesToWrite
  lea rdx, vndrid                ; LPCVOID lpBuffer
  mov rcx, hFile                 ; HANDLE hFile
  call __imp_WriteFile           ; call win api to write text to console

exit:
  xor ecx, ecx                   ; return value of 0 passed as UINT parameter to ExitProcess
  call __imp_ExitProcess         ; exit in Windows-friendly manner rather than typical epilogue
                                 ; such as would be present outside of this entry point procedure
                                 ; nothing after here would execute so no epilogue etc.
main endp

end

The command I used to assemble and link the above is (apologies if wrapping makes it difficult to read):

ml64 test64.asm /link /subsystem:console /defaultlib:"C:\Program Files\Microsoft SDKs\Windows\v6.0A\Lib\x64\kernel32.lib" /entry:main

And for the moment of truth…

C:\My Projects\Test\ASM\test>test64

CPUID instruction supported!

The processor Vendor ID is 'GenuineIntel'.

Hurray! Unless you sell AMD chips, in which case you might think 'AuthenticAMD' would look better…

If you see something profoundly wrong in the above, please let me know. It's not like I am going to be churning out assembly code all of a sudden, but I would still be interested in knowing…
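The stack bookkeeping described in the prologue comments can be double-checked with a little arithmetic. The sketch below is in C rather than assembly and is only a model: `rsp_mod16_after` and `stack_arg_offset` are hypothetical helper names. The only assumptions are the ones stated in the listing, namely that rsp is 8 bytes off a 16-byte boundary on entry (the return address was just pushed) and that the first 32 bytes above rsp at a call are the register home space.

```c
#include <assert.h>

#define HOME_SPACE 32u  /* 4 register parameters * 8 bytes each */

/* rsp modulo 16 after rsp has been moved down by the given number
   of bytes (a push moves it down by 8, "sub rsp, 28h" by 40). */
unsigned rsp_mod16_after(unsigned rsp_mod16, unsigned bytes)
{
    return (rsp_mod16 + 16u - (bytes % 16u)) % 16u;
}

/* Offset from rsp, at the point of the call, of the 1-based
   argument n when n is 5 or more (the first 4 go in registers). */
unsigned stack_arg_offset(unsigned n)
{
    assert(n >= 5u);
    return HOME_SPACE + 8u * (n - 5u);
}
```

Starting from 8, the three prologue steps (push rbx, push rbp, sub rsp, 28h) give 0, 8, and 0, so the stack is 16-byte aligned at each call. `stack_arg_offset(5)` is 32, matching the `[rsp + 32]` used for the lpOverlapped argument of WriteFile.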

Tuesday, March 24, 2009

ODP.NET Tip: Bind Variables, the BindByName Property, and PL/SQL Function Return Values

A question was posed as a follow-up to a previous post here:

http://oradim.blogspot.com/2007/08/odpnet-tip-bind-variables-and.html

The poster wanted to know: if you have set BindByName to "true" for an OracleCommand object, and the command text for that object calls a PL/SQL function (which of course has a return value, but one with no name), what name should you give to the parameter object that will receive the return value? It's a good question, since BindByName set to "true" implies that you've given names to your bind variables!

The short answer is: It doesn't matter as it will be ignored anyway.

As in The Highlander movies, with return values, there can be only one. You can set ParameterDirection to "ReturnValue" for multiple parameters, but it is an error to do so. Try it and see what errors are raised.

Here's a quick little demo showing that it does not matter what name you give to the parameter that you declare as the return value.

First create a simple PL/SQL function (so that there is a return value!) like this:

create or replace function subtract_numbers(p1 in number, p2 in number) return number as
begin
  return p1 - p2;
end;
/

I've elected to do a simple bit of subtraction and this is for a reason. Since in subtraction the order of the parameters matters, it is easy to show that the parameters are being handled in the correct order. As you'll see in the sample code below, I use the values "48" and "2" so it would be obvious if the ordering were reversed.

Now, here's a simple bit of C# to illustrate the point (the normal caveats about no error handling, etc. apply):

using System;
using System.Data;
using Oracle.DataAccess.Client;
using Oracle.DataAccess.Types;

class Program
{
  static void Main(string[] args)
  {
    // adjust connect string to suit your environment
    string constr = "user id=hr;" +
                    "password=hr;" +
                    "data source=orademo;" +
                    "pooling=false;" +
                    "enlist=false";

    // create and open connection
    OracleConnection con = new OracleConnection(constr);
    con.Open();

    // create command object
    OracleCommand cmd = con.CreateCommand();
    cmd.CommandText = "subtract_numbers";
    cmd.CommandType = CommandType.StoredProcedure;

    // set bindbyname to true so parameters can be added
    // in any order and to illustrate what happens with a
    // return value
    cmd.BindByName = true;

    // the first input parameter to the pl/sql function
    OracleParameter p1 = new OracleParameter();
    p1.Direction = ParameterDirection.Input;
    p1.OracleDbType = OracleDbType.Int64;
    p1.ParameterName = "p1";
    p1.Value = 48;

    // the second input parameter to the pl/sql function
    OracleParameter p2 = new OracleParameter();
    p2.Direction = ParameterDirection.Input;
    p2.OracleDbType = OracleDbType.Int64;
    p2.ParameterName = "p2";
    p2.Value = 2;

    // the return parameter from the pl/sql function
    OracleParameter p3 = new OracleParameter();
    p3.Direction = ParameterDirection.ReturnValue;
    p3.OracleDbType = OracleDbType.Int64;
    p3.ParameterName = "ignored";

    // add the parameters in a "wrong" order since
    // bindbyname is true -- this is key
    cmd.Parameters.Add(p1);  // input #1
    cmd.Parameters.Add(p3);  // return value
    cmd.Parameters.Add(p2);  // input #2

    // execute the stored pl/sql code
    cmd.ExecuteNonQuery();

    // write the result to the console window
    Console.WriteLine("The difference of {0} and {1} is {2}",
      p1.Value, p2.Value, p3.Value);

    // prevent console window from closing when run from VS
    Console.WriteLine();
    Console.Write("ENTER to continue...");
    Console.ReadLine();

    // clean up
    p3.Dispose();
    p2.Dispose();
    p1.Dispose();
    cmd.Dispose();
    con.Dispose();
  }
}

Notice how the parameters are added to the parameter collection "out of order". This is OK since BindByName is set to "true". You can comment out the line that sets BindByName and see an exception raised if you want. Anyway, when I execute this code, I get the expected results as follows:

The difference of 48 and 2 is 46

ENTER to continue...

So, even though the return value parameter was added to the collection in the second position everything works as desired. Yes, but I gave it a name of "ignored" so how do we know that name is not really being used? Well there are two easy ways to find out. One, just don't give the parameter a name at all and everything should continue to execute just fine. It's easy to ignore a name that's not there! Or, two, if you are an inquisitive type, enable SQL tracing (search the web for 10046 trace -- you'll find a million hits) and see what ODP.NET actually sent to the database. Here's a snippet of a trace file from the above code (with the parameter name of "ignored" in place):

PARSING IN CURSOR #2 len=54 dep=0 uid=82 oct=47 lid=82 tim=10601980885 hv=1771638930 ad='7ff39775518' sqlid='gx0kvypntk44k'
Begin :ret := subtract_numbers(p1=>:v0, p2=>:v2); End;
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=10601980885
BINDS #2:
Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=00 csi=00 siz=72 off=0
  kxsbbbfp=1bb2dd60  bln=22  avl=00  flg=05
Bind#1
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=1bb2dd78  bln=22  avl=02  flg=01
  value=48
Bind#2
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=48
  kxsbbbfp=1bb2dd90  bln=22  avl=02  flg=01
  value=2

See how ODP.NET generated a little anonymous block and substituted "ret" for the return value parameter? You should see the same thing even if there is no name given to the return parameter. You can also clearly see how the input parameter names are associated with the input values.

Saturday, March 14, 2009

HotSos Symposium 2009 Materials

The paper and all sample code from my HotSos Symposium 2009 presentation entitled "Instrumented Code Is Better Code" are available as a single .zip file. To download the .zip file, click here.