31 March 2009

ByRef parameters are Pointers

Yes, that is right. A ByRef parameter is effectively a Pointer [To]. Let's see what this means. A pointer variable of some data type behaves exactly like a variable of that type. A variable pdbl declared As Pointer To Double behaves exactly like a Double. Not only is it used in the same way as a normal Double, but anything GFA-BASIC 32 allows you to do with a Double is also allowed with pdbl (a pointer to double). Once the pointer has been initialized (given a memory address), the compiler interprets pdbl as a variable of type Double. For the programmer, pdbl is the same as a normal Double variable. The compiler also accepts pdbl as a Double every time it processes it, but it generates different code to operate on it. The problem is that the compiler doesn't know where pdbl's data is located. Memory for variables declared using Global, Local, Static, and of course Dim, is reserved by the program itself, and the compiler knows its location. All code generated by the compiler can operate on this memory location directly. The assembler code generated to access 'normal' variables is therefore quite different from the code generated for pointers. For pointers, GFA-BASIC 32 generates C-like pointer code: it stores an address (the data location) in a register and uses this value to access the data indirectly. When the GFA-BASIC 32 compiler parses the code, it marks all variables without a known address before generating code, and when it creates the assembler instructions for them it generates pointer code, assuming the actual address will be known at runtime. There are only two kinds of variables whose memory address isn't known at compile time: Pointer To and ByRef. Investigation showed that both kinds are marked the same way when they are encountered during compilation. A ByRef procedure argument is in essence a Pointer To variable.
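In C terms, the difference between a ByVal and a ByRef parameter can be sketched as below. This is only an analogy, not the code GFA-BASIC 32 actually emits, and the function names are hypothetical. The ByVal callee gets its own copy, while the ByRef callee only gets an address and must reach the data indirectly through it.

```c
/* ByVal analogy: the callee receives its own copy of the value. */
void add_one_byval(double x)
{
    x += 1.0;       /* changes only the local copy */
    (void)x;        /* the caller never sees this change */
}

/* ByRef analogy: the callee receives only an address, so every access
   goes through the pointer (*x), like GFA-BASIC 32's pointer code. */
void add_one_byref(double *x)
{
    *x += 1.0;      /* changes the caller's variable */
}

/* Hypothetical demo: only the ByRef call has a visible effect. */
double byval_byref_demo(void)
{
    double d = 2.5;
    add_one_byval(d);   /* d stays 2.5 */
    add_one_byref(&d);  /* the caller supplies the address of d */
    return d;           /* 3.5 */
}
```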
Initializing a pointer

The pointer pdbl can be initialized at runtime only. Initially it points to address 0 (null). To be of practical use, the pointer variable must be assigned a memory address, which the compiler then uses to operate on. Normally, the memory address of a GFA-BASIC 32 variable is obtained through the VarPtr function (or its shortcut V:). A pointer, however, lacks a 'VarPtr' address of its own. (One idea would have been for GFA-BASIC to support a 'VarPtr(var) = addr' command to set the memory address. However, for variables that use a descriptor (String, arrays) there would also have to be an 'ArrPtr() = addr' command to set the memory address for such a pointer.) To initialize a pointer, a new keyword was introduced: Pointer(pvar) =. (Note: this still requires the programmer to know what memory address to set with Pointer(). A Pointer To String requires a pointer to the descriptor of another string!) I discussed this command multiple times and won't go into it here. The question is how this applies to ByRef arguments. A ByRef variable is a Pointer To variable, and there is only one way to set the VarPtr address of a variable: the Pointer() = command. This command is implicitly invoked when a variable is passed to a procedure taking a ByRef argument. To access the passed argument the compiler generates code appropriate for the data type, but the memory address of the actual data isn't known at compile time. The ByRef variable is a local variable on the stack without a VarPtr address. The compiler then generates pointer-operation code to process this ByRef variable, exactly the same code as generated for a Pointer To variable. This is in contrast with a ByVal variable. Here the compiler generates code to operate on the variable's memory directly, because its memory address is known in advance. (A local variable is located relative to the stack entry point (esp) of a procedure. The stack pointer is copied to ebx, which the compiler uses to locate the local variables. A local variable is recognizable in assembler instructions like mov eax, 112[ebx].)
So, for the compiler the ByVal and ByRef keywords have two meanings: they tell the compiler how to pass the argument to the subroutine, and they determine the type of code to generate. This is VERY important to realize when you are dealing with pointers in general. In low-level programming you must know exactly what you are receiving in your subroutine. Ask yourself whether you need pointer code or normal code to be generated for that variable. Remember the ListView custom sort routine I presented earlier? Windows passes the lParam member of the LV_ITEM structure to the compare function, and this lParam holds a pointer to a ListItem COM object. Do you remember the function's prototype? Here it is.
Function CompareDates(ByVal lngParam1 As ListItem, _
  ByVal lngParam2 As ListItem, ByVal iCol As Long) As Long
Why does it need to be declared ByVal rather than ByRef? Because it isn't all that simple (understatement). Some data types force GFA-BASIC 32 to generate pointer code by default; these include user-defined types (Type) and Ocx/COM objects. GFA-BASIC 32 generates pointer code for the ListItem COM object by default. If the parameter were declared ByRef, GFA-BASIC would treat the value passed by Windows as a pointer to a ListItem pointer and dereference it once too often. Hence the exception errors! OK, enough for now, but this isn't the end!
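The double dereference can be made concrete with a small C sketch. Everything in it is an assumption for illustration: the ListItem struct is a hypothetical stand-in for the GFA-BASIC 32 object, and intptr_t plays the role of the Windows LPARAM type. The ByVal declaration corresponds to a single cast of lParam; the ByRef declaration corresponds to treating lParam as a pointer to a pointer.

```c
#include <stdint.h>

/* Hypothetical stand-in for the ListItem object whose address
   GFA-BASIC 32 stores in LV_ITEM.lParam. */
typedef struct ListItem {
    const char *text;
    long date;            /* yyyymmdd, for illustration only */
} ListItem;

/* Stand-in for the LVM_SORTITEMS callback; intptr_t plays the role of LPARAM.
   Each lParam already IS the address of a ListItem, so one cast suffices:
   this corresponds to "ByVal lngParam1 As ListItem". */
int compare_items(intptr_t lParam1, intptr_t lParam2, intptr_t lParamSort)
{
    const ListItem *a = (const ListItem *)lParam1;
    const ListItem *b = (const ListItem *)lParam2;
    (void)lParamSort;     /* the column number is not needed here */
    return (a->date > b->date) - (a->date < b->date);
}

/* A ByRef declaration would correspond to treating lParam as a pointer
   to a pointer:
       const ListItem *wrong = *(const ListItem *const *)lParam1;
   That reads the first bytes of the ListItem as if they were an address
   and follows it, the kind of access that ends in an exception. */
```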

Pointers are pointers indeed!

I discussed the Pointer [To] data type in an article published at the GFA-BASIC 32 site: http://gfabasic32.googlepages.com/thepointertodatatype. It is actually quite accurate ;-). However, I needed to re-investigate the Pointer data type, because I got some unexpected 'Unexpected Exceptions'. I had to extend the investigation by studying disassemblies and the Gfa_Var object. I also had to set aside everything I knew and start from scratch all over again. In the article I focused on comparing a Pointer To with a normal data type. I tried to prove that a Pointer data type isn't so different from a normal variable. Although the article is correct, I missed an important aspect of pointers. I simply didn't realize that GFA-BASIC 32 handles a Pointer data type differently, and I didn't realize it because I didn't inspect the code generated by the compiler. Actually, the first lines of the article signal the problem: "Pointer is a data type to declare variables as pointers.", and then the syntax is given as:
Dim p As [Register] Pointer [To] type
p: pointer variable
type: any data type (Int, String, Double, user-defined-type)
This declaration is complete but not accurate enough, because a pointer to an Ocx object is allowed too. In fact, you may declare a pointer to anything that GFA-BASIC 32 considers a data type. And this might prove to be very important, because GFA-BASIC 32 considers something a data type when it is an element of the internal TypeInfo hash table of data types. GFA doesn't care how these data types are added to this hash table! For instance, when you load a $Library, the data types from the lg32 file are appended to this hash table, and the same is true when GFA-BASIC 32 starts and initializes all primary data types (Int, Bool, String, Hash, Variant, etc.): they are simply added to the hash table. (Since I'm trying to implement the Class data type, this information is essential. GFA-BASIC doesn't care about a type as long as it is in the 'TypeInfo' hash table. We'll see where this will go.)
OK, back to the problem. The declaration explicitly states that a pointer can be located in a register; see the optional Register keyword. This actually indicates a different treatment of a pointer variable! A variable of type Double (8 bytes) cannot be located in a register, because a register is only 32 bits (4 bytes) wide. How GFA stores (local) variables relative to the ebx register and reserves edi and esi for Register variables is beyond the scope of this post. Maybe in a later post; it isn't relevant to the topic at hand. What is important is that GFA indeed handles a pointer type differently: it handles a Pointer variable as a 32-bit pointer, otherwise you couldn't store a pointer (to any type) in a 32-bit register, could you? Wow. How did I miss this? Well, I'm only human too, you know. All I did was investigate the information GFA-BASIC 32 returned for a (pointer) variable using GFA functions like TypeName, VarPtr, and ArrPtr. I only wanted to know how the returned information for a pointer variable differed from that of a normal variable. As I pointed out in the article, the only difference seems to be the lack of a memory location for the variable's data. Of course this is true, but GFA-BASIC also generates different code to access the data of a pointer variable. Since I didn't inspect the disassembly (DisAsm Ocx object), I didn't see it. As it turned out, GFA-BASIC uses the C pointer style to access a memory location through a pointer; the GFA-BASIC pointer type is compatible with the C pointer. Since I will try to keep this post not too technical, I will keep the assembly code to a minimum. What is important to realize is that a pointer can be kept in a 32-bit register, which isn't possible for variables wider than 32 bits. Unfortunately, the compiler won't honor the statement:
Local p1 As Register Pointer To Double      ' double *p1;
The syntax check accepts it, but the compiler doesn't put p1 in a register. A pity. On the other hand, the speed increase would be minimal; it is the difference between using a register directly:
mov dpt [esi], value       ; set the first long pointed to by esi
and first copying the pointer from its (stack) position to eax:
mov eax, dpt 116[ebx]      ; load the pointer
mov dpt [eax], value       ; set the first long to value
GFA also optimizes the code by keeping the pointer in eax when the pointer is accessed multiple times in a row (see for yourself). Now we are where I want to be. The compiler generates assembler code to access the data of a variable declared with Pointer [To] through a pointer, like C. For instance, the second long that eax points to is accessed using
mov dpt 4[eax], value      ; set the second long
And this is completely different from the code used to access a non-pointer variable. To conclude, here is a summary of how a pointer in GFA compares to C:
' Declare and initialize a double
Dim dbl As Double = 3.1         ' double dbl = 3.1;

' Declare an uninitialized pointer to double
Dim p As Pointer To Double      ' double *p;

' Initialize the pointer
Pointer(p) = V:dbl                ' p = &dbl;

' Assign a value to dbl through p:
p = 7.2                           ' *p = 7.2;
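For comparison, here is the same sequence written out in C as a small, self-contained function. It is a sketch of the C side of the mapping only; the function name is hypothetical. Note that the pointer itself is just an address (4 bytes on 32-bit Windows), whatever type it points to, which is why it fits in a register while a Double does not.

```c
#include <assert.h>
#include <stddef.h>

/* C counterpart of the GFA-BASIC 32 lines above (a sketch, not generated code). */
double pointer_to_double_demo(void)
{
    double dbl = 3.1;     /* Dim dbl As Double = 3.1                  */
    double *p = NULL;     /* Dim p As Pointer To Double: starts at 0  */
    p = &dbl;             /* Pointer(p) = V:dbl                       */
    assert(p != NULL);    /* dereferencing a null pointer would fault */
    *p = 7.2;             /* p = 7.2 (GFA-BASIC 32 hides the '*')     */
    return dbl;           /* dbl was changed through p                */
}
```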
In the next post I discuss the ByRef and Pointer commands.

30 March 2009

Sort a ListView Ocx

To set the scene you might want to look at the Microsoft knowledge base article about sorting a ListView control in (Visual) BASIC: http://support.microsoft.com/kb/170884. GFA-BASIC 32 supports only one method to sort a column, where VB uses three (!) properties to get a ListView control sorted. The .Sort method is defined as:
ListView.Sort column%, compare%
The .Sort method of the control only sorts the items alphabetically, either ascending or descending. The column% argument specifies the column to sort, but in contrast with the ColumnHeaders collection it starts counting at 0 rather than at 1. I have no idea why, but the column% parameter is passed to the LVM_SORTITEMS message directly. Well, so be it. The compare% argument is a bit more complex, because it is used as a bit flag. When the lowest bit of the high-order word is set ($10000), the column is sorted descending (Z-A), otherwise ascending (A-Z). The low word of the compare% argument specifies the compare mode GFA-BASIC 32 uses when comparing one item to another. The values are the same as for Mode Compare. For example, to sort the second column in descending order using uppercase comparison (Mode Compare = -2), you would use:
ListView.Sort 1, $10000 Or (-2)
A bit complicated. Maybe the Sort method should have used the Mode Compare setting rather than letting it be specified in an argument. Anyway, this way a column can be sorted using lowercase or uppercase comparison independently of the global setting.
Custom Sort
A column doesn't always consist of text entries that have to be sorted alphabetically. A column may contain dates or numbers (integers or floats). The .Sort method isn't prepared for sorting anything other than text. The knowledge base article linked above discusses how to create a custom sort on dates for a VB ListView control. Although GFA-BASIC 32 tries to be VB compatible, the ListView Ocx control differs in many aspects.
(The previous post discusses how the ListItems collection differs from VB.) When GFA-BASIC 32 adds an item to the ListView Ocx, it stores a pointer to the ListItem object in the lParam member of the LV_ITEM structure. VB stores the ListView's item index. This difference matters for the compare function you have to write for the custom sort. To initiate a custom sort you must send the LVM_SORTITEMS message to the ListView. Most often you will do this after a click on the column header. In GFA-BASIC 32 you might see the following code, ported from the code presented in the MS knowledge base article:
' GFA32 argument ByRef, not by value!
Sub ListView1_ColumnClick(ColumnHeader As ColumnHeader)
  Dim lngItem As Long
  ' Note:
  ' ColumnHeaders collection is 1-based
  ' ListItem.SubItems(iCol) is 0-based
  ' ListView.Sort iCol is 0-based

  Dim iSubItem As Long = ColumnHeader.Index - 1 ' Index is 1-based, iSubItem is 0-based

  'Handle User click on column header
  If ColumnHeader.Text = "Name" Then  'User clicked on Name header
    'ListView1.Sorted = True        'Use default sorting to sort the
    'ListView1.SortKey = 0          'items in the list
    ListView1.Sort iSubItem, 0          ' 0-based
  Else
    'ListView1.Sorted = False       'User clicked on the Date header
    'Use our sort routine to sort by date
    SendMessage ListView1.hWnd, LVM_SORTITEMS, iSubItem, ProcAddr(CompareDates)
  End If

  'Refresh the ListView before writing the data
  ListView1.Refresh

  ' MS/VB
  'Loop through the items in the List to print them out in sorted order.
  'NOTE: You are looping through the ListView control because when
  'sorting by date the ListItems collection won't be sorted.

  For lngItem = 1 To ListView1.ListItems.Count
    'ListView_GetListItem lngItem, ListView1.hWnd, strName, dDate
    'Print ListView1.ListItems(lngItem).AllText
  Next

  ' GFA32 - The ListItems collection is sorted as well.
  Dim li As ListItem
  For Each li In ListView1.ListItems
    Print li.AllText
  Next

End Sub
To show the differences with VB, the code is heavily commented. The key is the following line:
SendMessage ListView1.hWnd, LVM_SORTITEMS, iSubItem, ProcAddr(CompareDates)
The .hWnd property returns the window handle of the ListView Ocx. The wParam argument is passed unchanged to the compare function as its third argument; here it carries the 0-based column number. The lParam argument holds the address of the function that compares two ListView items. The CompareDates function is declared as:
Function CompareDates(ByVal lngParam1 As ListItem, _
  ByVal lngParam2 As ListItem, ByVal iCol As Long) As Long
The first two arguments receive the lParam members of the two LV_ITEM structures being compared, which in GFA-BASIC 32 are pointers to ListItem objects. The compare function is a callback function: it is called from inside Windows and never from a GFA-BASIC 32 statement, so GFA-BASIC 32 cannot check the function parameters against the data types passed. At runtime the lParam members are simply copied to the first two arguments of the compare function. Because we know the value points to a ListItem object, we cast the 32-bit value to a ListItem by declaring the parameter As ListItem. The GFA-BASIC 32 compare function is much simpler: convert the text of the ListItems to the appropriate data type and compare:
Function CompareDates(ByVal lngParam1 As ListItem, _
  ByVal lngParam2 As ListItem, ByVal iCol As Long) As Long

  Dim dDate1 As Date, dDate2 As Date

  ' MS/VB way:
  'Obtain the item names and dates corresponding to the
  'input parameters
  'ListView_GetItemData lngParam1, hWnd, strName1, dDate1
  'ListView_GetItemData lngParam2, hWnd, strName2, dDate2

  ' GFA32 way. The LV_ITEM.lParam is passed to the compare
  ' function. The lParam contains the address of the ListItem
  ' object, which we intelligently casted to a ListItem object!
  dDate1 = ValDate(lngParam1.SubItems(iCol))
  dDate2 = ValDate(lngParam2.SubItems(iCol))

  'Compare the dates. An LVM_SORTITEMS compare function must return
  '  a negative value ==> Less Than
  '  0                ==> Equal
  '  a positive value ==> Greater Than

  If dDate1 < dDate2 Then
    CompareDates = -1
  ElseIf dDate1 = dDate2 Then
    CompareDates = 0
  Else
    CompareDates = 1
  End If

End Function
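The return-value contract of a ListView compare function (negative, zero, positive) is the same one the C standard library's qsort uses, so it can be tried out in isolation. The sketch below uses qsort as a stand-in for the ListView's internal sort; the Item struct and its yyyymmdd date field are hypothetical and replace ValDate on the SubItems text.

```c
#include <stdlib.h>

/* Hypothetical row: a date stored as yyyymmdd stands in for ValDate(...). */
typedef struct Item { const char *name; long date; } Item;

/* Comparator contract (qsort and LVM_SORTITEMS alike):
   < 0 first sorts before second, 0 equal, > 0 first sorts after second. */
int compare_dates(const void *pa, const void *pb)
{
    const Item *a = *(Item *const *)pa;   /* elements are Item pointers */
    const Item *b = *(Item *const *)pb;
    if (a->date < b->date) return -1;
    if (a->date > b->date) return 1;
    return 0;
}

/* Reorder an array of row pointers, as the ListView reorders its rows. */
void sort_by_date(Item **rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], compare_dates);
}
```

Returning only 0, 1 or 2 would break this contract: "less than" would then be reported as "equal".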

28 March 2009

The ListItems Collection (1)

Recent posts discussed the underlying implementation of the Collection Ocx object. I pointed out that COM (read: Microsoft) dictates the properties and methods, and the number and type of parameters used to invoke them. In short, a collection is a COM object with a limited set of predefined members, where the member type is a Variant. In the same style, VB/COM introduced collections as a means to access the items of a control. In early Windows only the standard ListBox and ComboBox controls provided a user interface to access an item from a list of strings. For these controls COM/VB doesn't provide a collection to set and get a given item. Both COM controls include the following properties and methods to manage the entries in the control: .AddItem, .RemoveItem, .List(index), and .ListCount. Now that you know everything about the Collection object, you will immediately see the resemblance with the Collection object's properties and methods: .Add, .Remove, .Item(index), and .Count. When it came to creating COM wrappers for the common controls introduced with Windows 95, like ListView, TreeView, ToolBar, StatusBar, and ImageList, the VB creators decided to abstract the items of a control by introducing collections. The elements of these controls were now to be accessed like the mother of all collections, the Collection object. Whenever a control doesn't have a fixed number of elements, its elements are managed by a collection. For a ListView control this meant the introduction of multiple collections to manage the different aspects of the control. For instance, when the ListView is shown in report (detail) view it contains a header control, which required a ColumnHeaders collection. Then there were the items of the ListView itself; they were assembled in a ListItems collection. Then there were checked and selected items; they were collected in a CheckedItems and a SelectedItems collection, respectively.
Fortunately, the ListItems, CheckedItems and SelectedItems collections are all collections of the same (data) type: the ListItem object. This brings us to the subject of this post. The ListItems collection is a collection of ListItem objects. Like any collection it supports, as it should, the well-known Collection properties and methods to manage the entries. The ListItems collection is a way to access the ListView items, which are more than simple strings. A ListView item must specify text, an icon, a selected icon, a checked state, a selected state, etc. The Windows API describes a ListView item in an LV_ITEM structure (Type).
Type LV_ITEM
  mask As Long
  iItem As Long
  iSubItem As Long
  State As Long
  stateMask As Long
  pszText As Long
  cchTextMax As Long
  iImage As Long
  lParam As Long
  iIndent As Long
End Type
This API type is connected to the ListItem COM object. Anything you do using a ListItems collection (Add, Remove, Count, Item) means operating on ListItem objects. For instance, the .Item method returns a ListItem object, and the .Count property returns the current number of ListItem objects. For now and for the upcoming post it is important to realize a few things.
  • A ListItem object is directly connected to an LV_ITEM structure. A pointer to the ListItem Ocx object, as implemented by GFA-BASIC 32, is stored in the lParam member of the LV_ITEM structure.
  • When you add an item to the ListView, a ListItem COM object is created and a pointer to it is stored in the LV_ITEM structure. Then the LV_ITEM is inserted into the ListView control. The ListItem is NOT added to some internal array of ListItems, because GFA-BASIC doesn't maintain a hash table for the ListItems collection.
  • A ListItems collection object only provides a way to access these items. A ListItems collection is used to count the number of items in a ListView and to obtain and remove a single ListItem object. When you invoke the .Remove method of the ListItems collection, the ListView's item is removed and the ListItem COM object stored in its lParam is destroyed.
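The three points above can be modelled in a few lines of C. This is a sketch under stated assumptions, not GFA-BASIC 32's implementation: LvItemStub is a trimmed-down, hypothetical version of LV_ITEM, ListItemObj is a hypothetical stand-in for the ListItem object, and intptr_t plays the role of LPARAM. The point is that the object's address lives in lParam and nowhere else.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed-down LV_ITEM: only the members used here. lParam is an integer
   wide enough to hold an address, like LPARAM in the Windows headers. */
typedef struct LvItemStub {
    int iItem;
    intptr_t lParam;
} LvItemStub;

/* Hypothetical per-row object standing in for GFA-BASIC 32's ListItem. */
typedef struct ListItemObj {
    int id;
} ListItemObj;

/* Add: create the object and store its address in lParam.
   No separate array of objects is kept anywhere. */
LvItemStub add_item(int index, int id)
{
    ListItemObj *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    obj->id = id;
    LvItemStub item = { index, (intptr_t)obj };
    return item;
}

/* Item: recover the object directly from the row's lParam. */
ListItemObj *item_object(const LvItemStub *item)
{
    return (ListItemObj *)item->lParam;
}

/* Remove: destroy the object that lParam refers to. */
void remove_item(LvItemStub *item)
{
    free((void *)item->lParam);
    item->lParam = 0;
}
```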
The ListItems collection complies with the interface of a Collection Ocx, but it isn't (at all) implemented as the Collection Ocx I described in previous posts. There is no underlying hash table where the ListItem objects are stored. The ListItems properties and methods act on the ListView directly. When ListItems.Count is invoked, the COM object sends the LVM_GETITEMCOUNT message to the ListView control. In the same way, the .Item(index) method of the ListItems collection sends LVM_GETITEM and obtains the ListItem object from the lParam member of the LV_ITEM structure. When you use the For Each construct to iterate over the ListView's items, GFA-BASIC sends an LVM_GETITEM for each item currently present in the ListView. As such, the ListItems collection always represents the current state (and thus sort order) of the ListView. This is an important difference from VB, where the ListItems collection is stored separately (in a hash table) from the ListView's items. VB must take extra steps to keep the ListItems collection in sync with the actual state of the ListView. A strange design flaw, because after a sort operation VB's ListItems collection is completely out of sync! The next post discusses a custom sort routine for a ListView.
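Why a collection that stores nothing itself stays in sync can be shown with a small C model. All names here are hypothetical: Control stands in for the ListView's own storage, and the live_count/live_item functions play the role of ListItems.Count and .Item, querying the control on every call instead of keeping a copy. A value copied out before the sort behaves like VB's separately stored collection.

```c
#include <stdlib.h>

/* Hypothetical "control": the only place the rows are stored. */
typedef struct Control { int rows[8]; int count; } Control;

/* The collection keeps no copy of its own; it only refers to the control. */
typedef struct LiveItems { const Control *ctrl; } LiveItems;

/* Count and Item ask the control every time, like sending
   LVM_GETITEMCOUNT and LVM_GETITEM. Item is 1-based, like ListItems. */
int live_count(LiveItems c) { return c.ctrl->count; }
int live_item(LiveItems c, int index1) { return c.ctrl->rows[index1 - 1]; }

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorting the control's own storage: the live view sees it immediately. */
void control_sort(Control *ctrl)
{
    qsort(ctrl->rows, (size_t)ctrl->count, sizeof ctrl->rows[0], cmp_int);
}
```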

18 March 2009

The Form_Load event

Remember the previous post about the _KeyUp event resulting from the F5 keypress used to run the program? To get rid of the pending WM_KEYUP from F5 you could put a DoEvents statement as the first line of your program when it uses the LoadForm command. But how does this interact with the form's _Load event? First, take a look at the following code sample.
DoEvents                        // discard WM_KEYUP
LoadForm frm1                   // invokes frm1_Load
Do : Sleep : Until Me Is Nothing
Sub frm1_Load
EndSub
The _Load event is not invoked from within the message loop. Instead, the sub frm1_Load is called as the last step of the LoadForm command. The _Load event isn't like other event subs that result from some notification message from Windows. COM, and thus GFA-BASIC 32, defines a whole set of standard events (mouse, key, and focus events) for windowed Ocx controls. Windowless Ocx controls (ImageList and CommDlg), on the other hand, cannot receive input messages and don't support these events.
Where do the event calls come from?
The standard Ocx event subs are invoked from within a subclassed window procedure. When an Ocx is created it is immediately subclassed and prepared to invoke the event subs. Windows controls typically send certain window messages to their parent window. Some of these messages, such as WM_COMMAND and WM_NOTIFY, notify the parent of an action by the user. Others, such as WM_CTLCOLOR, are used to obtain information from the parent window. Rather than handling these messages itself, the parent (or container in COM terminology) intercepts certain window messages and sends them back to the control. The control, in its subclassed window procedure, can then process these reflected messages by taking actions appropriate for an ActiveX control (that is, firing an event) or by setting a color. Fired events mostly originate from a real Windows message, but not always. The Form's _Load event is one of the exceptions: it is an event invoked from within the LoadForm command.
There are no WM_PAINT, WM_ACTIVATE, or WM_NCACTIVATE messages pending to be fired as an event; the message queue is empty. Filling in the form with data should be done in the _Load event.

16 March 2009

IDE: not perfect? F5 and the _KeyUp event

We all know the editor isn't perfect, but this problem took me by surprise. I only discovered it last week, when I was creating a simple test program to study the way GFA-BASIC 32 handles the COM events for the Ocx controls. It turns out that the WM_KEYUP message from the F5 keypress that starts the program isn't discarded before the program is executed.
A description
I discovered this when I put a TextBox on a Form using the form editor and then returned to the text editor to load the form using the LoadForm command. I inserted all TextBox event subs to study the way they are executed from inside the GfaWin23.OCX DLL. I first ran the program without the debugger, using Trace commands to make sure the events were executed. Surprisingly, the first event I received was a _KeyUp event for the TextBox, but I hadn't hit any key! I removed the LoadForm command and replaced it with statements to open a Form and a TextBox by hand, so I created the form without a form resource. Now the _KeyUp event didn't fire. How come? I tried to start the LoadForm variant with a mouse click on the Run button in the toolbar; now the _KeyUp event didn't fire either. The conclusion was obvious: the WM_KEYUP message of the F5 accelerator key was never discarded.
The solution
When you use commands (OpenW, Form, Dialog) to open a window or form, GFA-BASIC 32 invokes an internal DoEvents to display the window immediately after the command is executed. The first internal DoEvents also reads the pending WM_KEYUP from the message queue, and you will never be bothered with it. In the case of the LoadForm command, all forms and controls are created without ever calling the internal DoEvents function. The first time a message is read from the queue is by the Sleep command inside your message loop. Hence the _KeyUp event. Consequently, the solution is simple.
To get rid of the pending WM_KEYUP from F5, put a DoEvents statement as the first line of your program when it uses the LoadForm command.
DoEvents
LoadForm frm1
Do
  Sleep
Until Me Is Nothing
Where does the event come from?
For those who are as curious as I am: the WM_KEYUP is a result of the way Windows handles the TranslateAccelerator() API function. This function sends, not posts, a WM_COMMAND to the window procedure when an accelerator key is used. But beware: TranslateAccelerator() doesn't wait for a complete WM_KEYDOWN/WM_KEYUP pair before sending the WM_COMMAND; instead, it sends the WM_COMMAND after the first WM_KEYDOWN. The IDE then responds by compiling and running the program, all from within the sent WM_COMMAND message. The IDE never returns to its main message loop before your program ends. When the GFA-BASIC source is started, the WM_KEYUP is still waiting in the message queue and is retrieved by the first GetMessage() or PeekMessage() API call, which will be in your main message loop. Hence the _KeyUp event.
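The sequence described above can be replayed with a toy message queue in C. This is a hypothetical model, not Windows code: the queue, the App struct and the MSG_KEYUP value are all made up for illustration. A key-up left over from F5 sits in the queue; if DoEvents drains it before LoadForm creates the TextBox, nothing happens, otherwise the first Sleep delivers it to the TextBox.

```c
/* Hypothetical model of a thread's message queue; not Windows code. */
enum { MSG_KEYUP = 1 };

typedef struct Queue { int msgs[16]; int head, tail; } Queue;

typedef struct App {
    int textbox_exists;   /* becomes 1 once LoadForm has created the controls */
    int keyup_events;     /* number of _KeyUp events delivered to the TextBox */
} App;

static void post(Queue *q, int msg) { q->msgs[q->tail++] = msg; }

/* Retrieve and dispatch every pending message (roughly what DoEvents does). */
static void pump(Queue *q, App *app)
{
    while (q->head < q->tail) {
        int msg = q->msgs[q->head++];
        if (msg == MSG_KEYUP && app->textbox_exists)
            app->keyup_events++;   /* fires TextBox_KeyUp */
        /* without a control to receive it, the key-up is simply consumed */
    }
}

/* Returns the number of unexpected _KeyUp events. */
int run_program(int doevents_first)
{
    Queue q = {0};
    App app = {0};
    post(&q, MSG_KEYUP);           /* left over from pressing F5 in the IDE */
    if (doevents_first)
        pump(&q, &app);            /* DoEvents as the first statement */
    app.textbox_exists = 1;        /* LoadForm frm1: no internal DoEvents */
    pump(&q, &app);                /* first Sleep in the message loop */
    return app.keyup_events;
}
```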

IDE: not perfect? Cut and Paste bug

Before I continue my series on Ocx Collections, I should mention two (un)known issues with the IDE. First, let me tell you about the Cut and Paste bug. The editor may crash after some copy/cut and paste operations. The crash comes suddenly and the reason is often mysterious. It may be a repeated Find operation, but it may also be as simple as hitting Enter to insert a new line. I have not been able to reproduce the problem in a way that lets me investigate it, so if you know exactly how to reproduce the bug, please let me know.

09 March 2009

The Collection Ocx (part 3)

In part three of this series I will continue with the performance of a Collection Ocx. I will also point out why you might prefer a Hash over a Collection, not only for its performance, but also for its additional features.
A Collection is a Hash
Remember, the Collection Ocx data type is a simple COM wrapper around the Hash Variant data type. The Collection COM object is a very thin wrapper: adding items to a Collection Ocx is as fast as adding them to a Hash Variant. This is not true for iterating with the For Each construct (or For i = 1 To Count) to access one item at a time using a number (not a string) as the index. Given a Collection c, iterating using For i = 1 ... takes 50% more time than on a Hash Variant.
Dim i%, e As Variant, t# = Timer
For i% = 1 To c.Count
  e = c(i)
Next
t# = Timer - t#
Print "For i = 1 To c.Count on Collection:"; t#
Given a Hash Variant a:
Dim a As Hash Variant
' add a lot of items ...
and then:
t# = Timer
For i = 1 To a[%]
  e = a[% i]
Next
t# = Timer - t#
Print "For i = 1 To a[%] on Hash:"; t#
The same is true when iterating with For Each: accessing an item by index takes about 50% longer on a Collection than on a Hash. Why? To return an item from the Collection, the .Item method is invoked with a Variant as its parameter. The index variable i is first converted to a Variant and then passed to the Collection.Item method, where it is decoded to see whether the argument is a string or a number... Hence the performance difference. COM makes things slow because it uses Variants.
Using strings as the index
The Collection data type is most useful when used as a dictionary, where data is connected to a word (name, key, or whatever). The Collection is used to store and locate data by a string, the key. What happens to the performance when a key string is used to access an item? Remember that the .Item method takes a Variant as its Index argument.
This means that each index, be it a number or a string, has to be converted to a Variant before the .Item method is called. When you use a string, it is converted to the COM Automation data type BSTR and stored in the Variant parameter. This conversion is performed on the caller's side, at the code line that holds the string. Now, if GFA-BASIC 32 used the normal COM procedures to convert a string to a Variant, it would call the OLE Automation Variant APIs. However, these APIs (in OLEAUT32.DLL) are slow! So GFA-BASIC 32 implements its own Variant conversion routines (these routines aren't actually functions, but inlined code), which are many times faster. In general: for each GFA-BASIC 32 Ocx COM object, GFA-BASIC inserts fast Variant-conversion code when that Ocx object uses a Variant property or method argument, resulting in a 30%-40% speed increase. Now stay with me. The Collection Ocx object receives the index as a Variant argument of the .Item method, which contains either a number or a string (BSTR). When the COM object receives a string as a Variant, it needs to convert it back to ANSI to perform the underlying call to the Hash table; see the introduction of this post. On the Ocx side, in GfaWin23.Ocx, GFA-BASIC could have used the OLE APIs to convert the Variant back to an ANSI string. But again, each Ocx method taking a Variant uses the fast inlined conversion code, so the Variant (BSTR) conversion to an ANSI string is as fast as it can be. (I call this a typical Frank Ostrowski optimization, which he didn't implement before GFA-BASIC 32 version 1.05.) GFA-BASIC 32 complies with the COM rules, but provides fast optimizations when possible. Then there was another optimization possible with literal string keys on any collection...
Hash is always better
But first, for the record: these string conversions don't apply to the Hash data type. The GFA-BASIC 32 Hash commands and operators take simple ANSI strings for the key arguments and don't need any conversion.
That is why the Hash Variant is so much faster. By the way, the Collection COM object is specified by MS to provide only five properties/methods (Add, Count, Item, Remove, and the hidden _NewEnum for For Each). MS didn't honour requests to include a sort method, save and load methods, or key-to-index and index-to-key conversions. GFA-BASIC 32 complies with the COM standard, but through the Hash data type it does provide all these options. VB programmers don't have these features!

An additional optimization with !
Back to the other optimization. In VB(A) a collection item is often accessed using the ! operator. GFA-BASIC 32 supports the ! operator as well (for all collections). It works exactly the same as specifying a literal string as the argument:

    v = c("billy")
    v = c!billy

The ! is always used with a literal string, here to access the item named "billy" in the collection c. When the compiler encounters a literal string as the key argument, a special GFA-BASIC 32 function is used to obtain the data from the Collection Ocx without the BSTR conversion. Rather than calling the .Item method of the Collection object directly, the .Item call is redirected to a special GFA-BASIC 32 routine that bypasses the BSTR conversion and directly invokes the 'get-item-by-key' function of the underlying hash table. Skipping the BSTR conversion speeds up item access by about 30%. This speed optimization applies only to the GFA-BASIC 32 Collection object. However, the ! syntax also produces less machine code: the compiler generates about 20 bytes less code per call, which improves execution speed as well. Here all collections benefit from the ! operator, not just the Collection Ocx.
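As a rough illustration of the equivalence between c!billy and c("billy"), here is a small Python sketch of a text-level rewrite. It is not how the GFA-BASIC 32 compiler works (the compiler handles tokens, not text), the function names are hypothetical, and the key rule (a letter first, then letters, digits or underscores) is an assumption based on the examples in these posts. The sketch also ignores string literals and comments.

```python
import re

# Assumed rule: the key after ! starts with a letter and continues with
# letters, digits or underscores (identifier-like, as in c!billy).
BANG = re.compile(r"\b([A-Za-z_]\w*)!([A-Za-z][A-Za-z0-9_]*)\b")
KEY = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def expand_bang(line: str) -> str:
    """Rewrite coll!key into the equivalent coll("key") call."""
    return BANG.sub(lambda m: f'{m.group(1)}("{m.group(2)}")', line)


def is_bang_key(key: str) -> bool:
    """True if a literal key could be written with the ! syntax."""
    return KEY.fullmatch(key) is not None
```

A numeric literal with the Single suffix, such as 1!, is left alone because no key follows the ! directly.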
Performance comparison taken from the ReadMe_105.html (OLE optimization compared to Hash Variant):

    v = c("a")   65 cycles / 64 bytes  (before version 1.05)
    v = c("a")   43 cycles / 40 bytes
    v = c!a      43 cycles / 40 bytes
    v = h["a"]   30 cycles / 26 bytes

Conclusion
1. The Collection Ocx is a fast COM object based on a hash table.
2. GFA-BASIC 32 optimizes item access for all GFA-BASIC Ocx collection objects.
3. With literal strings as keys, the ! operator generates less code.
4. The Hash data type is faster and offers more features.

Stay tuned: next time I discuss a special sort of collection, the ListItems collection.

The Collection Ocx (part 2)

In the previous post I introduced the Collection Ocx object and discussed its implementation. The Collection COM object is nothing more than a COM wrapper around the Hash object. To be more precise, the Collection Ocx maintains a Hash Variant data object. The following two statements both result in the creation of a Hash data structure:

    Dim col As New Collection
    Dim hs As Hash Variant

The Hash commands (and its many [] operators) provide access to the underlying hash table. The hash table is the data structure GFA-BASIC 32 uses in almost every case where some kind of collection is needed. GFA-BASIC 32 uses the hash table at the base of Ocx collections like Panels, ListImages, ColumnHeaders, etc. But it also uses the hash table intensively at the heart of the compiler. All data types, variables, constants, anything that has to be collected during compiling, is stored in a hash table. This speaks for the outstanding performance of the Hash data structure, because the compiler is one of the fastest around.

Not all Ocx collections use the hash table, though; the ListItems Ocx is a COM wrapper around the ListView control's LVM_GETITEM message. The ListItem object, the element of a ListItems collection, is stored in the lParam member of the LVITEM structure used with the LVM_GETITEM and LVM_SETITEM messages (see your SDK/MSDN). This brings me to the heart of this post: all so-called Ocx/COM collections use the same interface. They use the same properties and methods to access a collection element; these are the properties and methods I discussed in the previous post. For most Ocx collection objects these properties and methods translate directly into Hash commands. As said, some Ocx collections implement the methods (and properties) Add, Item, Remove, Clear, and Count in some other way.
For instance, the ToolBar's Buttons collection is rather buggy, because it isn't a wrapper around a hash table but around a set of functions that maintain the buttons in another way. The Buttons collection is not built on the proven mechanism of the hash table. Because most GFA-BASIC 32 Ocx collections are based on the Hash data structure, accessing an item is very fast. An item is either accessed using the default Item() method or obtained with the iteration command For Each...Next. The Item method of a collection always takes a Variant argument, which contains
  • a numeric value specifying the index
  • or a string specifying the key used to store the item
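Since all these collections share one interface while the storage behind it differs, the following Python sketch models two implementations behind the same Add/Item/Count methods. One is backed by a hash table (Python's insertion-ordered dict standing in for GFA-BASIC's Hash). The other, like ListItems, keeps its items in rows owned by a control, each row carrying the item in an lParam slot. Item() first decodes its "Variant" argument: a string is a key, a number is a 1-based index. All class and method names are hypothetical. The real Ocx collections are COM objects inside GfaWin23.Ocx, and how ListItems actually resolves a key is not described here, so the linear scan is only a placeholder.

```python
from typing import Any, Protocol


class OcxCollection(Protocol):
    """The shared interface (hypothetical Python spelling of Add/Item/Count)."""
    def add(self, item: Any, key: str) -> None: ...
    def item(self, index: Any) -> Any: ...
    def count(self) -> int: ...


def _position(index, size):
    """Turn a 1-based collection index into a 0-based position."""
    i = int(index)
    if not 1 <= i <= size:
        raise IndexError(index)
    return i - 1


class HashBacked:
    """Like most Ocx collections: every method maps onto a hash table."""
    def __init__(self):
        self._data = {}  # a dict keeps insertion order, so indexes are stable

    def add(self, item, key):
        self._data[key] = item

    def item(self, index):
        if isinstance(index, str):  # the "Variant" holds a string: a key
            return self._data[index]
        keys = list(self._data)     # a number: an index (list rebuilt for brevity)
        return self._data[keys[_position(index, len(keys))]]

    def count(self):
        return len(self._data)


class ControlBacked:
    """Like ListItems: the rows belong to a control; each row has an lParam."""
    def __init__(self):
        self._rows = []  # stands in for the ListView's own item storage

    def add(self, item, key):
        self._rows.append({"key": key, "lParam": item})

    def item(self, index):
        if isinstance(index, str):
            for row in self._rows:  # placeholder lookup: a plain scan
                if row["key"] == index:
                    return row["lParam"]
            raise KeyError(index)
        return self._rows[_position(index, len(self._rows))]["lParam"]

    def count(self):
        return len(self._rows)


def first_item(coll: OcxCollection) -> Any:
    """Caller code is identical, whichever implementation sits behind it."""
    return coll.item(1)
```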
Collection elements are accessible by their index value; the lower bound of a collection object's index is always 1 and can't be set otherwise in code. In VB, however, performance is poor when items are accessed by index. GFA-BASIC 32, in contrast, has a very fast implementation of accessing a collection object by index, thanks to its high-performance Hash structure! (And this isn't the only optimization GFA-BASIC 32 features with collections.) Often the default method .Item() is invoked with a literal string key rather than a string variable. For instance, look at these statements that access an element by a literal string key:

    v = c("a")
    v = c.Item("a")

If the string literal starts with an alphabetic character (a-z, A-Z) and the remaining characters are letters, digits and/or underscores (_), GFA-BASIC 32 uses a non-OLE-compatible function to access the item. Rather than converting "a" to a COM string, GFA-BASIC 32 passes a pointer to the ANSI version of the key string. This might need some more explanation. Every call to a COM function, whether a property or a method, that takes a string (either in a Variant or as a string parameter) uses the special COM data type BSTR. To meet this COM requirement, GFA-BASIC converts each(!) string you assign to a COM property or use in a method call to a BSTR, a wide UNICODE string. GFA-BASIC 32 consistently performs this ANSI to UNICODE conversion, because the COM rules require it. GFA-BASIC 32 must perform these conversions when you use the CreateObject() function to obtain an external COM automation object, which requires COM compatibility, that is, UNICODE strings. Internally, however, GFA-BASIC 32 objects store these strings as ANSI. Now you might notice the strange fact that a string passed to an Ocx object is converted twice: from ANSI to COM-compatible UNICODE, and back from UNICODE to ANSI.
This IS EXACTLY what happens. For instance, when you pass a (large) string to a TextBox Ocx, the string is first converted to UNICODE for the COM call; then, before it can be passed to the underlying Edit control with SetWindowTextA() (the ANSI version of the SetWindowText() API), it has to be converted back to ANSI. Fortunately, GFA-BASIC uses its own highly optimized ANSI-to-UNICODE and UNICODE-to-ANSI functions. The ! operator provides an optimization for literal strings used as keys in the methods that access an item of a GFA-BASIC 32 collection (Collection, ListImages, ListItems, Buttons, etc.): when you use the ! operator to specify the key, the literal string is passed in ANSI format rather than in UNICODE. This alternative syntax results in 'c!a' for the example mentioned, and in general: Collection!StringIndex. So

    v = c!a   <=>   v = c("a")

Next time I discuss the gain in performance.
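The double conversion described above can be made concrete with a short Python sketch. It builds the memory image of a BSTR (a 4-byte length prefix holding the byte count, the UTF-16LE characters, and a 2-byte null terminator) from an ANSI string, and converts it back. Code page 1252 as the ANSI code page is an assumption, and the function names are hypothetical; GFA-BASIC 32 uses its own optimized routines, not these.

```python
import struct

ANSI = "cp1252"  # assumption: a Western-European ANSI code page


def ansi_to_bstr(ansi: bytes) -> bytes:
    """ANSI -> BSTR memory image (the conversion done for the COM call)."""
    wide = ansi.decode(ANSI).encode("utf-16-le")
    # 4-byte length prefix (in bytes, terminator excluded), data, 2-byte null
    return struct.pack("<I", len(wide)) + wide + b"\x00\x00"


def bstr_to_ansi(image: bytes) -> bytes:
    """BSTR memory image -> ANSI (the conversion done inside the Ocx)."""
    (nbytes,) = struct.unpack_from("<I", image, 0)
    wide = image[4:4 + nbytes]
    return wide.decode("utf-16-le").encode(ANSI)
```

Every string that travels this route pays for both functions, which is exactly the cost the ANSI path for literal keys avoids.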

06 March 2009

The Collection Ocx (part 1)

My first serious entry in this blog concerns the intrinsic Ocx object Collection. It is supposed to be the mother of all collections used throughout the Ocx implementations in GFA-BASIC 32. These collections include Panels, Buttons, ListItems, ListImages, etc. I will come back to these collections many times in the near future. However, as we will see, the Collection object itself isn't the base of these other collections; it is the Hash data type that lies at the basis of each collection. Like VB(A), GFA-BASIC 32 features the generic Collection COM object type, which is simply a container for data. Although it is typically used with other objects (Panels, Buttons, etc.), it can in fact hold data of any type and stores the data in a Variant. Everything that can be stored in a Variant can be stored in a collection, including other collection objects (Variants can hold COM objects). The Collection object is a kind of object-oriented version (Ocx) of the array. The Collection object is internally implemented as a hashed 'linked list', and as such provides good performance for access by an item's key, by its index (yes, GFA-BASIC 32 supports this), or through enumeration. In particular, I would emphasize the ability to access the hash elements by index. This is an extension of a 'normal' implementation of a hashed 'linked list'. In fact, the GFA-BASIC 32 Hash data type is not implemented as a linked list, although the term 'hashed linked list' suggests it is. The GFA-BASIC 32 Hash data type has been optimized extensively because it is used as storage for almost everything that GFA-BASIC does. More on this later. Suffice it to say that the compiler, which is very fast, uses many hash tables to store variable and type information. Let's look into the Collection type for now. You create a Collection as you would any other object: using Dim and the New keyword.

    Dim MyColl As New Collection

The New keyword creates a new object of the specified type.
It actually allocates memory for the COM object and sets its reference count to 1. The variable points to the allocated memory, which it wouldn't if the New clause were omitted. Remember, control Ocx objects are created using the Ocx or OcxOcx keywords. These keywords implicitly execute a Global Dim ... New command. (Each variable introduced with Ocx and OcxOcx is global, and the control is actually created.) You cannot create a control Ocx object using a Dim statement: the compiler contains a hard-coded check that accepts New only in combination with a few COM types (Collection, DisAsm, StdFont, StdPicture). New in combination with any other type is rejected by the compiler. A COM object always comes with a set of functions (procedures) collected into an array; this is called an interface. The first element of a COM object always points to this array of functions; this element is called lpVtbl. The first 7 entries of the interface contain the mandatory IUnknown functions (3: QueryInterface, AddRef, Release) followed by the (optional) IDispatch functions (4: GetTypeInfoCount, GetTypeInfo, GetIDsOfNames, Invoke). These standard OLE functions are followed by the functions that make up the functionality of the Collection COM object: Item, Add, Count, Remove, and the hidden _NewEnum, respectively. Some of the interface functions are labeled 'get' or 'put', others aren't. When a function is labeled get or put, the compiler accepts it as if it were a variable used with the assignment operator =. For instance, the Count function is a 'property get' function, so the Count property is used like this:

    i% = MyColl.Count

As it turns out, all other functions of the Collection object are 'methods' (the type library doesn't label them as either get or put) and are to be used as procedure calls. Here is a short overview (all variables are Variants):

    v = MyColl.Item(vIndex)
    MyColl.Add( vItem [, vKey, vBefore, vAfter] )
    MyColl.Remove(vIndex)

Add adds an item to the collection.
Along with the data itself, you can specify a key value by which the member can be retrieved from the collection. The key is a Variant and is converted to a string when the item is stored. Count returns the number of items in the collection. Item retrieves a member from the collection either by its index (ordinal position in the collection) or by its key (assuming one was provided when the item was added). The index and key arguments are always converted to a Variant before they are passed to the Item method. Remove deletes a member from the collection either by its index or by its key. For example, the following code fragment defines a collection object, colLanguages, to hold programming language information, and adds two members to it that can later be accessed by their key, which in this case happens to be their abbreviation:

    Dim colLanguages As New Collection
    colLanguages.Add "GFA-BASIC32", "GB32"
    colLanguages.Add "Visual Basic", "VB"

Using Collections
Given the similarity to arrays, why use collection objects rather than arrays in your code? Using collections, you can group related items together. Collections can be heterogeneous, that is, members of a collection don't have to share the same data type (each member is stored in a Variant), and that can be very useful, because life doesn't always present you with collections made up of items of the same type. So the major reasons are ease of access and ease of maintenance:
  • Members can be added before or after an existing member based on the latter’s key value as well as its index value.
  • Members can be retrieved based on either their key value or their index value.
  • Members can be deleted based on either their key value or their index value.
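The Add/Item/Remove/Count behaviour described in this post, including positioning with before and after, can be sketched in Python. The class is hypothetical and deliberately simple: it keeps (key, value) pairs in a list and finds keys by a linear search, whereas the real Collection Ocx is backed by a hash table. Error handling for duplicate keys is left out, and rejecting before and after together is an assumption modelled on VB.

```python
class Collection:
    """Hypothetical sketch of Add/Item/Remove/Count with 1-based indexes."""

    def __init__(self):
        self._items = []  # (key, value) pairs in collection order

    def _pos(self, index):
        """Map a 1-based index or a key to a 0-based list position."""
        if isinstance(index, str):
            for i, (key, _) in enumerate(self._items):
                if key == index:
                    return i
            raise KeyError(index)
        if not 1 <= index <= len(self._items):
            raise IndexError(index)
        return index - 1

    def add(self, item, key=None, before=None, after=None):
        if before is not None and after is not None:
            raise ValueError("specify before or after, not both")
        pos = len(self._items)            # default: append at the end
        if before is not None:
            pos = self._pos(before)       # insert in front of that member
        elif after is not None:
            pos = self._pos(after) + 1    # insert behind that member
        self._items.insert(pos, (key, item))

    def item(self, index):
        return self._items[self._pos(index)][1]

    def remove(self, index):
        del self._items[self._pos(index)]

    @property
    def count(self):
        return len(self._items)


colLanguages = Collection()
colLanguages.add("GFA-BASIC32", "GB32")
colLanguages.add("Visual Basic", "VB")
colLanguages.add("C", "C", before="VB")   # lands between GB32 and VB
```

After the last Add, index 2 refers to "C" and "Visual Basic" has moved to index 3, while the key "VB" still finds it: keys stay stable, indexes do not.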
Collections are very useful and are one of the high points of (Visual) BASIC. However, because of the heterogeneous nature of their contents, they don't necessarily lend themselves to tight and uniform coding practices, which makes some C and C++ programmers look down their noses at BASIC. Since this is the first entry in the blog, I explicitly invite you to comment!

05 March 2009

Brief overview of GFA-BASIC 32

GFA-BASIC has a long history. The first version was released in 1985 for the Atari ST, followed by a version for the Amiga. Then GFA-BASIC was developed for the Intel processors: GFA-BASIC for MSDOS, GFA-BASIC for Windows 3.1, and the most recent GFA-BASIC 32 for Windows 95 and later.

Introduction
GFA-BASIC 32 is no longer divided into an interpreter and a stand-alone compiler. When run from the IDE, the code is first compiled to machine code and then executed. The compiler is optimized for producing fast machine code, so that GFA-BASIC 32 programs execute at high speed. The command library of GFA-BASIC 32 is partly compatible with Visual Basic and 16-bit GFA-BASIC. Much of the functionality of the 16-bit version is retained, but due to an entirely new concept of creating and handling windows and dialog boxes, GFA-BASIC 32 is also quite different and much more compatible with VB in that area. Other incompatibilities are due to the 32-bit operating system; for instance, an integer is now 32 bits wide rather than 16 bits as in 16-bit GFA-BASIC. When porting an application from 16-bit GFA-BASIC, GFA-BASIC 32 automatically converts 16-bit code to the new 32-bit syntax.

Single files
GFA-BASIC 32 files are single project files. Code, forms (windows and dialog boxes), data, and resource info are all contained in one file: the .g32 file. To create modular programs, part of the code can be compiled into a library file (.lg32) and included in the project file.

New in the text-editor
Unbalanced parentheses in a code line are auto-completed, so that every opening parenthesis gets its closing counterpart. The underscore character ( _ ) can be used to split "logical" lines of source code across physical lines in the source code file. The underscore character must be preceded by at least one white space character. A line continuation character only affects the "visual" appearance: the compiler itself treats lines split this way as one contiguous line of code. The colon character (:) can be used to separate multiple statements on a single (logical) line of source code.
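How these two line rules combine can be sketched in Python: physical lines ending in white space followed by _ are merged into one logical line, and a logical line is then split on colons that are not inside a string literal. This is a simplified, hypothetical model, not GFA-BASIC 32's parser; it ignores comments and labels.

```python
def join_continuations(lines):
    """Merge physical lines ending in ' _' into logical lines."""
    logical, buf = [], ""
    for line in lines:
        stripped = line.rstrip()
        # continuation: an underscore preceded by at least one white space char
        if len(stripped) >= 2 and stripped.endswith("_") and stripped[-2].isspace():
            buf += stripped[:-1]  # keep the space, drop the underscore
        else:
            logical.append(buf + line)
            buf = ""
    if buf:
        logical.append(buf)
    return logical


def split_statements(line):
    """Split a logical line on ':' characters outside string literals."""
    parts, cur, in_str = [], "", False
    for ch in line:
        if ch == '"':
            in_str = not in_str
        if ch == ":" and not in_str:
            parts.append(cur.strip())
            cur = ""
        else:
            cur += ch
    parts.append(cur.strip())
    return parts
```

An identifier that merely ends in an underscore, such as my_, is not treated as a continuation, because no white space precedes the underscore.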
Subs and functions can be folded of course. Procedures can have optional parameters and the code is automatically indented and formatted. 
Keyboard macros for repetitive action keystrokes can be recorded and played back.

Extending the IDE
The Integrated Development Environment can be extended through the use of add-ins, called editor extensions. More than 100 special Gfa_Xxx functions and statements give access to the environment and allow it to be manipulated.

Subroutines
Besides the well-known Sub, Procedure, and Function subroutines, you can now also use GoSub and Return anywhere in a procedure, but GoSub and the corresponding Return statement must be in the same procedure. A subroutine can contain more than one Return statement, but the first Return statement encountered causes the flow of execution to branch back to the statement immediately following the most recently executed GoSub statement. Using GoSub to perform a repetitive task is almost always faster than calling a Sub or Function, since there is no overhead in setting up a stack frame for a GoSub.

New data types
New data types are Large (64-bit), Date, Currency, Variant, Object, Pointer and Handle. Integer and Long are now 32 bits wide. Variables declared without an explicit type are of the Variant data type by default.
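Currency is the one type in this list whose representation is not obvious. Assuming GFA-BASIC 32 follows the OLE Automation CY layout, as VB does, a Currency value is a 64-bit integer scaled by 10,000, which gives four exact decimal places. A Python sketch of that representation follows; the function names are hypothetical and the rounding mode is an assumption.

```python
from decimal import Decimal, ROUND_HALF_EVEN

SCALE = 10_000                      # four implied decimal places
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def to_cy(text: str) -> int:
    """Parse a decimal string into a scaled 64-bit integer (no binary floats)."""
    value = int((Decimal(text) * SCALE).to_integral_value(rounding=ROUND_HALF_EVEN))
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("Currency overflow")
    return value


def cy_to_str(value: int) -> str:
    """Format a scaled integer with exactly four decimals."""
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return f"{sign}{whole}.{frac:04d}"
```

Because 0.1 + 0.2 is computed on the integers 1000 + 2000, the result equals 0.3 exactly, which a Double cannot guarantee.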

Const and Enum
A constant is a named value that is fixed at compile time and cannot change during program execution (hence, it remains constant). A constant is defined using the Const keyword. The Enum keyword is used to define a sequence of constants whose values are incremented by one.

Array, Hash and Collection
Array elements can be inserted and deleted. An array can be sorted using quick sort or shell sort. The Hash is a one-dimensional array whose index is of type String. Its elements can be of any type: Int, String, Date, etc. A Hash array is dynamic and is not dimensioned prior to its use. Values are added, or assigned to existing elements. A hash can be examined, sorted, saved and loaded. Elements can be accessed by numeric index as well. Access to hash elements is very fast. The Hash is used with Split, Join, Eval and the regular expression functions reMatch and reSub.
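The combination of key access and numeric index access, plus the key-to-index and index-to-key conversions that the Collection interface lacks, can be modelled in a few lines of Python. This is a hypothetical sketch of the behaviour only: GFA-BASIC 32's own Hash syntax and its internal implementation are not reproduced. The index is 1-based, and sorting by key is an assumption.

```python
class OrderedHash:
    """Hypothetical model: key access plus 1-based index access."""

    def __init__(self):
        self._keys = []   # the index order
        self._data = {}   # key -> value

    def __setitem__(self, key, value):
        if key not in self._data:     # a new key is appended to the index
            self._keys.append(key)
        self._data[key] = value       # an existing key is just reassigned

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._keys)

    def index_to_key(self, index):
        if not 1 <= index <= len(self._keys):
            raise IndexError(index)
        return self._keys[index - 1]

    def key_to_index(self, key):
        # linear search for brevity; a real table would store the position
        return self._keys.index(key) + 1

    def by_index(self, index):
        return self._data[self.index_to_key(index)]

    def sort(self):
        self._keys.sort()             # assumption: sort by key
```

Sorting only reorders the index; every value stays reachable under its key.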
The Collection is a one dimensional variant array whose index is of type variant. A collection is dynamic and is not dimensioned prior to its use. Values are added or assigned to existing elements. The collection is mainly targeted at OLE objects.

Many new operators and functions
GFA-BASIC 32 now also supports Iif() and an implicit conditional form (?:). It uses a question mark after the condition to be tested, and specifies two alternatives, one to be used if the condition is met and one if it is not. The alternatives are separated by a colon.

Also new is the VB-style string concatenation operator &, which is active by default. This may lead to strange results, because numeric operands are automatically converted to strings: "Number" & 7 => "Number 7" and 12 & 6 => " 12 6".

For string functions, the $ postfix is no longer mandatory: Chr$(0) becomes Chr(0). The return value of a $-free string function is NOT a Variant as in VB, but a real (ANSI) string. New are the Pascal-compatible character constants #number, which can be used in place of Chr(number). The expression "Line1" #13#10 "Line2" #13#10 is the same as "Line1" + Chr(13) + Chr(10) + "Line2" + Chr(13, 10).
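The effect of the #number constants can be shown with a toy Python evaluator that concatenates adjacent "literals" and #nn constants from left to right. It is a hypothetical illustration of the equivalence stated above, not GFA-BASIC 32's parser; it handles nothing but these two token kinds.

```python
import re

TOKEN = re.compile(r'"([^"]*)"|#(\d+)')


def expand_char_constants(expr: str) -> str:
    """Concatenate "literals" and #nn character constants, left to right."""
    return "".join(
        chr(int(code)) if code else literal
        for literal, code in TOKEN.findall(expr)
    )
```

For example, '#65#66 "C"' yields "ABC".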

In 16 Bit GFA-BASIC the & operator is the same as And (as in C), meaning 12 & 6 = 4 <=> 12 And 6 = 4 (%1100 & %0110 = %0100). GFA-BASIC 32 adds a new And operator, %&. So, in GFA-BASIC 32, 12 %& 6 = 4 (%1100 And %0110 = %0100).
Another quirk (coming from VB) is the extra space in front of numbers converted to a string this way. You can disable the automatic conversion of numeric values to strings by the & operator in the properties dialog box.
Alternatively, GFA-BASIC 32 offers the operator $ as a replacement for the & concatenation operator.
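The conversion rule behind these results can be sketched in Python, assuming the & operator formats numbers the way VB's Str$ does: a leading space reserved for the sign of non-negative numbers, and a minus sign otherwise. Only integers are handled, and the function names are hypothetical.

```python
def gfa_str(value) -> str:
    """Str$-style conversion: non-negative integers get a leading space."""
    if isinstance(value, str):
        return value
    return str(value) if value < 0 else " " + str(value)


def concat(a, b) -> str:
    """Model of the & operator with automatic number-to-string conversion."""
    return gfa_str(a) + gfa_str(b)
```

This reproduces both examples: "Number" & 7 gives "Number 7", and 12 & 6 gives " 12 6".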

Comparison and assignment operators
In contrast with 16 Bit GFA-BASIC, the expression x == y is now the same as x = y and x := y. The comparison operator == from 16 Bit GFA-BASIC should now be replaced by NEAR. Alternatively, you can use a forced floating point comparison like If a = 1!.

Direct memory access
For direct memory access a whole range of variants of Peek and Poke are available (PeekCur, PokeStr, etc, etc.).
Memory move and manipulation commands are provided (MemMove, MemOr, MemAnd, MemFill, etc).
'Bits and bytes' swap and make functions (BSwap8, MakeLiHo, etc, etc).
Bit rotation and shifting are supported.
Port access is supported (Port Out, Port In).
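What rotating and byte swapping mean for a 32-bit value can be shown with two small Python functions. They are illustrations only: the names rol32 and bswap32 are hypothetical and do not necessarily match the names, argument order, or widths of GFA-BASIC 32's own rotate, shift, and BSwap functions.

```python
MASK32 = 0xFFFF_FFFF


def rol32(value: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits; bits leaving at the top re-enter at the bottom."""
    n %= 32
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32


def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value (little- <-> big-endian)."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")
```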

Arithmetic
Besides the normal arithmetic functions, GFA-BASIC 32 offers matrix functions and many more (advanced) mathematical functions.
For runtime expression evaluation GFA-BASIC 32 includes Eval().

Special file functions
Special file functions are provided for checksums (Crc32, Crc16, CheckSum, etc.), file encryption (Crypt), and file compression (Pack/UnPack). Others are MimeEncode/MimeDecode, MemToMime/MimeToMem, and UUToMem/MemToUU.
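Two of these have standard definitions worth knowing when exchanging data with other tools. Assuming Crc32 implements the common CRC-32 (the one used by zip and PNG) and MimeEncode produces Base64, which this overview does not confirm, Python's standard library gives reference values to compare against:

```python
import base64
import zlib

data = b"123456789"                    # the customary CRC test vector
crc = zlib.crc32(data)                 # standard CRC-32, check value 0xCBF43926
mime = base64.b64encode(data)          # Base64 encoding, as used by MIME
round_trip = base64.b64decode(mime)    # decoding restores the original bytes
print(hex(crc), mime.decode())
```

If GFA-BASIC 32 produces the same values for the same bytes, the two implementations agree.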

Win32 API functions are built-in
GFA-BASIC 32 supports more than 1000 WinAPI functions, which can be used like any other GFA-BASIC 32 function. Only the standard API functions from User, Kernel and GDI are implemented; less frequently used API functions, such as the WinSock functions, must be declared explicitly, as in VB.
The parameter types of the built-in API functions are not checked at compile time. Each parameter is assumed to be a 32-bit integer. A string can be passed to an API function, but it is always copied to one of the 32 internal 1030-byte buffers BEFORE the address of the buffer is passed. A user-defined Type (As type) is always passed by reference, so that its address is passed (automatically V:). To be on the safe side, keep things in your own hands and pass addresses explicitly using Varptr or V:.
These rules don’t apply to DLL functions introduced with the Declare statement. Here GFA-BASIC 32 behaves like VB, and the rules for calling such APIs must be respected. For a thorough article on using them, see the Visual Basic Programmer's Guide, Chapter 10 - The Windows API and Other Dynamic-Link Libraries.
Some WinAPI function names are already in use by GFA-BASIC 32 as statement or built-in function names and are therefore renamed. GetObject() becomes GetGdiObject(), LoadCursor becomes LoadResCursor. Obsolete functions are not implemented, obviously.

Built-in Win32 API constants
As with the built-in API functions from User, Kernel and GDI, their accompanying constants are built in: 1400 API constants from the 16 Bit version and more than 900 constants from the Win32 APIs are implemented. Obsolete constants are not implemented, obviously.

Graphics
Graphic commands take floating point values now. After scaling (set with ScaleMode) the coordinates are passed as integers to the GDI system. Scaling provides much more flexibility and is VB compliant.
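As an example of the scaling step, assume ScaleMode is set to twips (1440 twips per inch, as in VB) on a 96-dpi display. A floating-point coordinate then has to become a whole pixel before it reaches GDI. The Python sketch below rounds half to even; which rounding GFA-BASIC 32 actually applies is not stated here, so treat that as an assumption, as is the function name.

```python
TWIPS_PER_INCH = 1440  # a twip is 1/20 point, i.e. 1/1440 inch


def twips_to_pixels(twips: float, dpi: int = 96) -> int:
    """Scale a floating-point twip coordinate to the integer pixel GDI needs."""
    return round(twips * dpi / TWIPS_PER_INCH)  # round-half-even: an assumption
```

At 96 dpi one pixel corresponds to 15 twips, so 1500.0 twips map to pixel 100.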
Most graphic commands can be used in VB format: Line (x, y)-(z, t),, BF or PBox x, y, z, t.
The Color-command is now the same as RgbColor in 16 Bit GFA-BASIC. Additionally, a table with the 16 standard colors can be used: Color QBColor(i) or a shortcut QBColor i.
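The 16 standard colors behind QBColor can be listed as COLORREF values (byte layout 0x00BBGGRR). The Python table below reproduces VB's QBColor palette; that GFA-BASIC 32 uses exactly the same values is an assumption, and the helper names are hypothetical.

```python
def rgb(r: int, g: int, b: int) -> int:
    """COLORREF layout 0x00BBGGRR, as produced by the Windows RGB() macro."""
    return r | (g << 8) | (b << 16)


# VB's QBColor palette, index 0-15 (assumed identical in GFA-BASIC 32)
QB_PALETTE = [
    rgb(0, 0, 0),       rgb(0, 0, 128),     rgb(0, 128, 0),     rgb(0, 128, 128),
    rgb(128, 0, 0),     rgb(128, 0, 128),   rgb(128, 128, 0),   rgb(192, 192, 192),
    rgb(128, 128, 128), rgb(0, 0, 255),     rgb(0, 255, 0),     rgb(0, 255, 255),
    rgb(255, 0, 0),     rgb(255, 0, 255),   rgb(255, 255, 0),   rgb(255, 255, 255),
]


def qbcolor(i: int) -> int:
    """Return the COLORREF of standard color i (0 = black ... 15 = bright white)."""
    return QB_PALETTE[i]
```

Note the reversed byte order: QBColor(1), dark blue, is 0x800000, and QBColor(4), dark red, is 0x000080.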
Windows now have an AutoRedraw property, so that output is also captured to a second bitmap (at some cost in performance). A redraw of the window is then performed by copying the contents of the bitmap to the screen.

COM programming
CreateObject creates and returns a reference to an ActiveX object. After the object reference is assigned to a variable of type Object you can use the object's properties, methods, and events. Other COM functions like _DispId and GUID are included. GFA-BASIC 32 includes many COM objects, both for controls and windows as well as for other features.

App and Screen Object
The App OLE object specifies information about the application's title, version information, the path and name of its executable file and Help files, and whether or not a previous instance of the application is running. In addition, it provides methods to create shortcuts. It has many properties returning information that is otherwise hard to find. The Screen object returns the current capabilities of the display screen your code is executing on. The Screen object has many more properties than its VB counterpart.

Err Object
Contains information about runtime errors or helps in generating useful errors. Although the VB compliant On Error Goto structure is supported, the Try/Catch/EndCatch error-handling is preferred.

CommDlg, Printer Ocx object
An OCX wrapper around the common dialog box functions. The Printer object provides full(!) printer support for your application.

Picture and Font OLE Objects
OLE Object types to create and manipulate fonts and pictures. Since these types are OLE-type compatible, a Font or Picture instance can be assigned to a property of an OCX control or form.

Windows and dialogs
Windows and dialogs are all Forms now and their events are handled the same way as in VB. All standard and common controls are implemented using an OCX object wrapper. In general, all GUI objects are now OCX objects and are manipulated through properties, methods and events. The old GetEvent/Menu() structure is now obsolete.
You can still use third party controls by using the old CONTROL statement. The notification messages are then handled in the window procedure of the parent, and in a form event sub as well!

Assembler and DisAssembler
GFA-BASIC 32 provides an inline assembler and a disassembler object (DisAsm).

Interested? Use the download link in the Links section of this site. 

04 March 2009

GFA-BASIC 32 Blog start.