Thursday, 23 September 2010

Hashing algorithms: Adler32 implemented in VBA

I could've sworn I posted this here, years ago, but here's an implementation of Adler-32 in 32-bit VBA.

There's a horrible hack in it: Adler-32 returns a 32-bit integer, and the VBA Long is a signed integer with a range ± (2^31) -1, so I've implemented a 'wrap around' of the overflow at +2^31, restarting at -2^31 +1. And done something I really, really shouldn't have done with a floating-point variable. Eventually everyone, everywhere, will have 64-bit Office and this'll be kind of quaint and unnecessary... Right?

Of course, the real question is: why bother?

It boils down to the common question of checking for changes: if you don't want to use the 'on change' event, or you're dealing with data directly in VBA before it hits the sheet, large data sets need something better than an item-by-item brute force approach. At least, if you're doing it more than once: the cost of rolling each item into your hash is always more than the cost of the one-by-one comparison...

...And that's still true if you're importing a fast hashing algorithm from MySQL or one of the web API libraries (try MDA5, if you can get at an exposed function), unless you can find something that reads VBA variant arrays directly and relieve your VBA thread of the task of enumerating the list values into the imported function.

Meanwhile, here's a hash algorithm that's within reach of VBA: Adler32. The details are in Wikipedia’s article on Adler32: http://en.wikipedia.org/wiki/Adler-32 and an hour's testing will teach you some lessons about hashing:
  1. 'Hash collisions' (differing data sets returning the same hash code) are more common than you expected, especially with data containing repeated patterns (like dates);>
  2. Choice of hashing algorithm is important;
  3. ...And that choice is more of an art than a science;
  4. Admitting that you really shouldn't have bothered and resorting to brute force is often the better part of valour.


Adler-32 is actually more useful as a tool to teach those lessons, than as a workaday checksum. It's great for detecting changes in lists of more than 100 distinct items; it's tolerable, on a list of 24 randomly-generated 8-letter words (hash collisions at 1 in 1800 attempts) and it starts giving you single-digit percentage occurrences of the hash collision error in a list of 50 not-so-distinct option maturities, where the differences are mostly in the last 10 chars and *those* ten chars are recurring 3-month maturity dates.

By the time you're comparing pairs of 6-letter strings, more than 10% of your changes will be missed by the checksum in a non-random data set. And then you realise that might as well be using string comparison for that kind of trivial computation anyway.

So the answer is always: test it.

Meanwhile, here's the algorithm, horrible hacks and all:


Option Explicit


Public Function CheckSum(ByRef ColArray As Variant) As Long
Application.Volatile False

' Returns an Adler32 checksum of all the numeric and text values in a column

' Capture data from cells as myRange.Value2 and use a 32-bit checksum to see
' if any value in the range subsequently changes. You can run this on multi-
' column ranges, but it's MUCH faster to run this separately for each column
'
' Note that the VBA Long Integer data type is *not* a 32-bit integer, it's a
' signed integer with a range of  ± (2^31) -1. So our return value is signed
' and return values exceeding +2^31 -1 'wraparound' and restart at -2^31 +1.



' Coding Notes:

' This is intended for use in VBA, and not for use on the worksheet. Use the
' setting  'Option Private Module' to hide CheckSum from the function wizard


' Author: Nigel Heffernan, May 2006  http://excellerando.blogspot.com
' Acknowledgements and thanks to Paul Crowley, who recommended Adler-32

' Please note that this code is in the public domain. Mark it clearly, with
' the author's name, and segregate it from any proprietary code if you need
' to assert ownership & commercial confidentiality on your proprietary code

Const LONG_LIMIT As Long = (2 ^ 31) - 1
Const MOD_ADLER As Long = 65521

Dim a As Long
Dim b As Long

Dim i As Long
Dim j As Long
Dim k As Long

Dim arrByte() As Byte

Dim dblOverflow As Double


If TypeName(ColArray) = "Range" Then
    ColArray = ColArray.Value2
End If

If IsEmpty(ColArray) Then
    CheckSum = 0
    Exit Function
End If

If (VarType(ColArray) And vbArray) = 0 Then
    ' single-cell range, or a scalar data type
    ReDim arrData(0 To 0, 0 To 0)
    arrData(0, 0) = CStr(ColArray)
Else
    arrData = ColArray
End If


a = 1
b = 0


For j = LBound(arrData, 2) To UBound(arrData, 2)
    For i = LBound(arrData, 1) To UBound(arrData, 1)
        
        ' VBA Strings are byte arrays: arrByte(n) is faster than Mid$(s, n)
        
        arrByte = CStr(arrData(i, j))  ' Is this type conversion efficient?
        
        For k = LBound(arrByte) To UBound(arrByte)
            a = (a + arrByte(k)) Mod MOD_ADLER
            b = (b + a) Mod MOD_ADLER
        Next k
        
        ' Terminating each item with a 'vTab' char constructs a better hash
        ' than vbNullString which, being equal to zero, adds no information
        ' to the hash and therefore permits the clash ABCD+EFGH = ABC+DEFGH
        ' However, we wish to avoid inefficient string concatenation, so we
        ' roll the terminating character's bytecode directly into the hash:
        
        a = (a + 11) Mod MOD_ADLER                ' vbVerticalTab = Chr(11)
        b = (b + a) Mod MOD_ADLER
        
    Next i
    
    ' Roll the column into the hash with a terminating horizontal tab char:
    
    a = (a + 9) Mod MOD_ADLER                     ' Horizontal Tab = Chr(9)
    b = (b + a) Mod MOD_ADLER

    
Next j

' Using a float in an integer calculation? We can get away with it, because
' the float error for a VBA double is < ±0.5 with numbers smaller than 2^32

dblOverflow = (1# * b * MOD_ADLER) + a

If dblOverflow > LONG_LIMIT Then  ' wraparound 2^31 to 1-(2^31)
   
    Do Until dblOverflow < LONG_LIMIT
        dblOverflow = dblOverflow - LONG_LIMIT
    Loop
    CheckSum = 1 + dblOverflow - LONG_LIMIT
    
Else
    CheckSum = b * MOD_ADLER + a
End If

End Function

Monday, 1 March 2010

VLookup() with fuzzy-matching to get a 'closest match' result


Ever had to look up a name or an address in a list that doesn't quite match, so the standard Excel 'VLookup()' and 'Match()' functions don't help you?
Here's the solution:


The answer is 'Fuzzy Matching', a process of applying rules and picking out a 'best fit'. In the case of text - words, addresses, alphanumeric codes like ISIN identifiers for shares, bonds and other financial instruments - the best approach is the Levenshtein Edit Distance algorithm, a measure of how many characters were changed or moved to get from one text to another; but I've gone for a simplified approach that adds up how much of text A is made up of recognisable fragments of text B.
It works surprisingly well, but there are some limitations. There's a more detailed discussion of this in the preceding post: feel free to offer some suggestions and comments there: what you're reading here is the application of these abstract principles to spreadsheets. Click on the image: it'll show you how the function is used.
What's missing is a version of Match() (you can code that up for yourselves), and a dedicated 'FuzzyVLookupAddress()' function, tweaked to deal with the peculiar things that we do to postal addresses to make them extraordinarily difficult for computers (and spreadsheets!) to read. Instead, I've provided a crude 'NormaliseAddress' function, which standardises all those troublesome Aves, Avenues, Streets, Saints and St.'s. Apply it to the addresses in your list (and to the search term you're trying to look up) and let me know how you get on.
Enough witter: the code's below. It's been tested for cut-and-paste out of Blogger, but there's always a health warning: Blogger will munge the line breaks, and so will whatever you're using to view this post, especially if it's an RSS feed.




Option Explicit


Public Function FuzzyVLookup(Lookup_Value As String, _
                             Table_Array As Variant, _
                             Optional Col_Index_Num As Integer = 1, _
                             Optional Compare As VbCompareMethod = vbTextCompare _
                            ) As Variant

' Find the best match for a given string in column 1 of an array of data obtained from an Excel range
' This is functionally similar to VLookup, but it returns the best match, not the first exact match
' This function is not case-sensitive, unless you specify 'Compare' as 0 or vbBinaryCompare

' If your data quality is poor, you are advised to display the retrieved index value from column 1
' and use the FuzzyMatchScore() function on this index value to reveal the fuzzy-matching 'score' and
' discard all results below a threshold value. Feel free to code up a 'threshold' parameter!

' If you are looking up names and addresses, use the NormaliseAddress() function on your search term and
' searched population to standardise abbreviations and word-order conventions used in British addresses.

' THIS CODE IS IN THE PUBLIC DOMAIN


Application.Volatile False

Dim dblBestMatch As Double

Dim iRowBest    As Integer
Dim dblMatch    As Double
Dim iRow        As Integer
Dim strTest     As String
Dim strInput    As String

Dim iStartCol   As Integer
Dim iEndCol     As Integer
Dim iOffset     As Integer

If InStr(TypeName(Table_Array), "(") + InStr(1, TypeName(Table_Array), "Range", vbTextCompare) < 1 Then
     'Table_Array is not an array
     FuzzyVLookup = "#VALUE"
    Exit Function
End If

If InStr(1, TypeName(Table_Array), "Range", vbTextCompare) > 0 Then
    Table_Array = Table_Array.Value
End If

' If you get a subscript-out-of-bounds error here, you're using a vector instead
' of the 2-dimensional array that is the default 'Value' property of an Excel range.


iStartCol = LBound(Table_Array, 2)
iEndCol = UBound(Table_Array, 2)
iOffset = 1 - iStartCol


Col_Index_Num = Col_Index_Num - iOffset

If Col_Index_Num > iEndCol Or Col_Index_Num < iStartCol Then
     'Out-of-bounds
     FuzzyVLookup = "#VALUE"
    Exit Function
End If



    strInput = UCase(Lookup_Value)

    iRowBest = -1
    dblBestMatch = 0

    For iRow = LBound(Table_Array, 1) To UBound(Table_Array, 1)

        strTest = ""
        strTest = Table_Array(iRow, iStartCol)

        dblMatch = 0
        dblMatch = FuzzyMatchScore(strInput, strTest, Compare)

        If dblMatch = 1 Then ' Bail out on finding an exact match
            iRowBest = iRow
            Exit For
        End If

        If dblMatch > dblBestMatch Then
            dblBestMatch = dblMatch
            iRowBest = iRow
        End If

    Next iRow


    If iRowBest = -1 Then
        FuzzyVLookup = "#NO MATCH"
        Exit Function
    End If

    FuzzyVLookup = Table_Array(iRowBest, Col_Index_Num)

End Function
   
Public Function FuzzyHLookup(Lookup_Value As String, _
                             Table_Array As Variant, _
                             Optional Row_Index_Num As Integer = 1, _
                             Optional Compare As VbCompareMethod = vbTextCompare)

' Find the best match for a given string in Row 1 of an array of data obtained from an Excel range
' This is functionally similar to HLookup, but it returns the best match, not the first exact match
' This function is not case-sensitive, unless you specify 'Compare' as vbTextBinary.

' If your data quality is poor, you are advised to display the retrieved index value from row 1
' and use the FuzzyMatchScore() function on this index value to reveal the fuzzy-matching 'score' and
' discard all results below a threshold value. Feel free to code up a 'threshold' parameter!

' If you are looking up names and addresses, use the NormaliseAddress() function on your search term and
' searched population to standardise abbreviations and word-order conventions used in British addresses.

' THIS CODE IS IN THE PUBLIC DOMAIN


Application.Volatile False

Dim dblBestMatch As Double

Dim iColBest    As Integer
Dim dblMatch    As Double
Dim iCol        As Integer
Dim strTest     As String
Dim strInput    As String

Dim iStartRow   As Integer
Dim iEndRow     As Integer
Dim iOffset     As Integer

If InStr(TypeName(Table_Array), "(") + InStr(1, TypeName(Table_Array), "Range", vbTextCompare) < 1 Then
     'Table_Array is not an array
     FuzzyHLookup = "#VALUE"
    Exit Function
End If

If InStr(1, TypeName(Table_Array), "Range", vbTextCompare) > 0 Then
    Table_Array = Table_Array.Value
End If

' If you get a subscript-out-of-bounds error here, you're using a vector instead
' of the 2-dimensional array that is the default 'Value' property of an Excel range.


iStartRow = LBound(Table_Array, 1)
iEndRow = UBound(Table_Array, 1)
iOffset = 1 - iStartRow


Row_Index_Num = Row_Index_Num - iOffset

If Row_Index_Num > iEndRow Or Row_Index_Num < iStartRow Then
     'Out-of-bounds
     FuzzyHLookup = "#VALUE"
    Exit Function
End If


    strInput = UCase(Lookup_Value)

    iColBest = -1
    dblBestMatch = 0

    For iCol = LBound(Table_Array, 2) To UBound(Table_Array, 2)

        strTest = ""
        strTest = Table_Array(iStartRow, iCol)

        dblMatch = 0
        dblMatch = FuzzyMatchScore(strInput, strTest, Compare)

        If dblMatch = 1 Then ' Bail out on finding an exact match
            iColBest = iCol
            Exit For
        End If

        If dblMatch > dblBestMatch Then
            dblBestMatch = dblMatch
            iColBest = iCol
        End If

    Next iCol


    If iColBest = -1 Then
        FuzzyHLookup = "#NO MATCH"
        Exit Function
    End If

    FuzzyHLookup = Table_Array(Row_Index_Num, iColBest)

End Function


Public Function FuzzyMatchScore(ByVal str1 As String, _
                                ByVal str2 As String, _
                                Optional Compare As VbCompareMethod = vbTextCompare _
                                ) As Double

' Returns an estimate of how closely word 1 matches word 2: this is best displayed as a percentage
' This is calculated as the fraction of the longer string that is made up of recognisable fragments of the shorter string
' There is no support for wildcards and regular expressions. Case-sensitivity is determined by the 'compare' parameter

' THIS CODE IS IN THE PUBLIC DOMAIN


Application.Volatile False

Dim maxLen As Integer
Dim minLen As Integer

    If str1 = str2 Then
        FuzzyMatchScore = 1#
        Exit Function
    End If

    If Len(str1) > Len(str2) Then
        maxLen = Len(str1)
        minLen = Len(str2)
    Else
        maxLen = Len(str2)
        minLen = Len(str1)
    End If

    If Len(str1) = 0 Or Len(str2) = 0 Then
        FuzzyMatchScore = 0#
    Else

        FuzzyMatchScore = 0#
        FuzzyMatchScore = SumOfCommonStrings(str1, str2, Compare) / maxLen

    End If
   
End Function

Public Function SumOfCommonStrings( _
                            ByVal s1 As String, _
                            ByVal s2 As String, _
                            Optional Compare As VBA.VbCompareMethod = vbTextCompare, _
                            Optional iScore As Integer = 0 _
                                ) As Integer

Application.Volatile False

' N.Heffernan 06 June 2006 (somewhere over Newfoundland)
' THIS CODE IS IN THE PUBLIC DOMAIN


' Function to measure how much of String 1 is made up of substrings found in String 2

' This function uses a modified Longest Common String algorithm.
' Simple LCS algorithms are unduly sensitive to single-letter
' deletions/changes near the midpoint of the test words, eg:
' Wednesday is obviously closer to WedXesday on an edit-distance
' basis than it is to WednesXXX. So it would be better to score
' the 'Wed' as well as the 'esday' and add up the total matched

' Watch out for strings of differing lengths:
'
'    SumOfCommonStrings("Wednesday", "WednesXXXday")
'
' This scores the same as:
'
'     SumOfCommonStrings("Wednesday", "Wednesday")
'
' So make sure the calling function uses the length of the longest
' string when calculating the degree of similarity from this score.


' This is coded for clarity, not for performance.


Dim arr() As Integer    ' Scoring matrix
Dim n As Integer        ' length of s1
Dim m As Integer        ' length of s2
Dim i As Integer        ' start position in s1
Dim j As Integer        ' start position in s2
Dim subs1 As String     ' a substring of s1
Dim len1 As Integer     ' length of subs1

Dim sBefore1            ' documented in the code
Dim sBefore2
Dim sAfter1
Dim sAfter2

Dim s3 As String


SumOfCommonStrings = iScore

n = Len(s1)
m = Len(s2)

If s1 = s2 Then
    SumOfCommonStrings = n
    Exit Function
End If

If n = 0 Or m = 0 Then
    Exit Function
End If

's1 should always be the shorter of the two strings:
If n > m Then
    s3 = s2
    s2 = s1
    s1 = s3
    n = Len(s1)
    m = Len(s2)
End If

n = Len(s1)
m = Len(s2)

' Special case: s1 is n exact substring of s2
If InStr(1, s2, s1, Compare) Then
    SumOfCommonStrings = n
    Exit Function
End If

For len1 = n To 1 Step -1

    For i = 1 To n - len1 + 1

        subs1 = Mid(s1, i, len1)
        j = 0
        j = InStr(1, s2, subs1, Compare)
       
        If j > 0 Then
       
            ' We've found a matching substring...
            iScore = iScore + len1           

          ' Now clip out this substring from s1 and s2...
          ' And search the fragments before and after this excision:

       
            If i > 1 And j > 1 Then
                sBefore1 = left(s1, i - 1)
                sBefore2 = left(s2, j - 1)
                iScore = SumOfCommonStrings(sBefore1, _
                                            sBefore2, _
                                            Compare, _
                                            iScore)
            End If
   
   
            If i + len1 < n And j + len1 < m Then
                sAfter1 = right(s1, n + 1 - i - len1)
                sAfter2 = right(s2, m + 1 - j - len1)
                iScore = SumOfCommonStrings(sAfter1, _
                                            sAfter2, _
                                            Compare, _
                                            iScore)
            End If
   
   
            SumOfCommonStrings = iScore
            Exit Function

        End If

    Next


Next


End Function


Private Function Minimum(ByVal a As Integer, _
                         ByVal b As Integer, _
                         ByVal c As Integer) As Integer
Dim min As Integer

  min = a

  If b < min Then
        min = b
  End If

  If c < min Then
        min = c
  End If

  Minimum = min

End Function



Public Function NormaliseAddress(ByVal strAddress As String) As String
Application.Volatile False
' This function is intended to remove or standardise common phrases
' and abbreviations used in British postal addresses, allowing the use
' of string-comparison algorithms in lists of names and addresses.

' Developers in other countries should review the word list used here,
' as conventions probably differ in your local language or dialect.


strAddress = " " & UCase(strAddress) & " "

strAddress = Substitute(strAddress, ",", " ")
strAddress = Substitute(strAddress, ".", " ")
strAddress = Substitute(strAddress, "-", " ")
strAddress = Substitute(strAddress, vbCrLf, " ")
strAddress = Substitute(strAddress, " BLVD ", " BOULEVARD ")
strAddress = Substitute(strAddress, " BVD ", " BOULEVARD ")
strAddress = Substitute(strAddress, " AV ", " AVENUE ")
strAddress = Substitute(strAddress, " AVE ", " AVENUE ")
strAddress = Substitute(strAddress, " RD ", " ROAD ")
strAddress = Substitute(strAddress, " WY ", " WAY ")
strAddress = Substitute(strAddress, " EST ", " ESTATE ")
strAddress = Substitute(strAddress, " PL ", " PLACE ")
strAddress = Substitute(strAddress, " PK ", " PARK ")
strAddress = Substitute(strAddress, " HSE ", " HOUSE ")
strAddress = Substitute(strAddress, " H0 ", " HOUSE ")
strAddress = Substitute(strAddress, " GDNS ", " GARDENS ")

strAddress = Substitute(strAddress, "&", "AND")
strAddress = Substitute(strAddress, " LIMITED ", " LTD ")
strAddress = Substitute(strAddress, " COMPANY ", " CO ")
strAddress = Substitute(strAddress, " CORPORATION ", " CORP ")
strAddress = Substitute(strAddress, " T/A ", " TA ")
strAddress = Substitute(strAddress, " TRADING AS ", " TA ")

' Common personal titles: these are often applied inconsistently or
' omitted, and must therefore be removed. Specific applications may
' require additional titles and their abbreviations - military rank,
' academic titles and degrees, courtesy titles of the aristocracy,
' knighthoods and honours (particularly for lists of civil servants)


strAddress = Substitute(strAddress, " ESQ ", " ")
strAddress = Substitute(strAddress, " MR ", " ")
strAddress = Substitute(strAddress, " MRS ", " ")
strAddress = Substitute(strAddress, " MISS ", " ")
strAddress = Substitute(strAddress, " MS ", " ")
strAddress = Substitute(strAddress, " MESSRS ", " ")
strAddress = Substitute(strAddress, " SIR ", " ")
strAddress = Substitute(strAddress, " OF ", " ")
strAddress = Substitute(strAddress, " DR ", " ")
strAddress = Substitute(strAddress, " OR ", " ")
strAddress = Substitute(strAddress, " IN ", " ")
strAddress = Substitute(strAddress, " THE ", " ")
strAddress = Substitute(strAddress, " REVEREND ", " REV ")
strAddress = Substitute(strAddress, " REVERENT ", " REV ")
strAddress = Substitute(strAddress, " HONOURABLE ", " HON ")
strAddress = Substitute(strAddress, " BROS ", " BROTHERS ")
strAddress = Substitute(strAddress, " ASSOC ", " ASSOCIATION ")
strAddress = Substitute(strAddress, " ASSN ", " ASSOCIATION ")

' Standardising 'St.', 'St', and 'Street'. Note that there are over 40 English
' towns and place names that contain or consist entirely of the word 'Street'.
' In addition, 'St' is a common abbreviation for 'Saint' in addresses.

' I have never seen a list of addresses where 'Street' and 'St' were used in a
' consistent way, and the only workable solution is to delete them all:


strAddress = Substitute(strAddress, " STREET ", " ")
strAddress = Substitute(strAddress, " ST ", " ")
strAddress = Substitute(strAddress, " STR ", " ")

Do While InStr(strAddress, "  ") > 0
    strAddress = Substitute(strAddress, "  ", " ")
Loop

strAddress = Trim(strAddress)

NormaliseAddress = strAddress

End Function


Public Function StripChars(myString As String, ParamArray Exceptions()) As String

' Strip out all non-alphanumeric characters from a string in a single pass
' Exceptions parameters allow you to retain specific characters (eg: spaces)

' THIS CODE IS IN THE PUBLIC DOMAIN


Application.Volatile False

Dim i As Integer
Dim iLen As Integer
Dim chrA As String * 1
Dim intA As Integer
Dim j As Integer
Dim iStart As Integer
Dim iEnd As Integer

If Not IsEmpty(Exceptions()) Then
    iStart = LBound(Exceptions)
    iEnd = UBound(Exceptions)
End If

iLen = Len(myString)

For i = 1 To iLen
    chrA = Mid(myString, i, 1)
    intA = Asc(chrA)
    Select Case intA
    Case 48 To 57, 65 To 90, 97 To 122
        StripChars = StripChars & chrA
    Case Else
        If Not IsEmpty(Exceptions()) Then
            For j = iStart To iEnd
                If chrA = Exceptions(j) Then
                    StripChars = StripChars & chrA
                    Exit For ' j
                End If
            Next j
        End If
    End Select
Next i



End Function


Private Function Substitute(ByVal Text As String, _
                            ByVal Old_Text As String, _
                            ByVal New_Text As String, _
                            Optional Instance As Long = 0, _
                            Optional Compare As VbCompareMethod = vbTextCompare _
                            ) As String


' Replace all instances (or the nth instance ) of 'Old' text with 'New'
' Unlike VB.Mid$ this method is not sensitive to length and can replace ALL instances
' This is not exposed as a Public function because there is an Excel Worksheet function
' called Substitute(). However, Workheet Functions have length constraints.

' THIS CODE IS IN THE PUBLIC DOMAIN


Dim iStart As Long
Dim iEnd As Long
Dim iLen As Long
Dim iInstance As Long
Dim strOut As String

iLen = Len(Old_Text)

If iLen = 0 Then
    Substitute = Text
    Exit Function
End If

iEnd = 0
iStart = 1

iEnd = InStr(iStart, Text, Old_Text, Compare)

If iEnd = 0 Then
    Substitute = Text
    Exit Function
End If


strOut = ""

Do Until iEnd = 0

    strOut = strOut & Mid$(Text, iStart, iEnd - iStart)
    iInstance = iInstance + 1

    If Instance = 0 Or Instance = iInstance Then
        strOut = strOut & New_Text
    Else
        strOut = strOut & Mid$(Text, iEnd, Len(Old_Text))
    End If

    iStart = iEnd + iLen
    iEnd = InStr(iStart, Text, Old_Text, Compare)

Loop

iLen = Len(Text)
strOut = strOut & Mid$(Text, iStart, iLen - iEnd)

Substitute = strOut

End Function



This is something I coded up on an obscure personal blog a couple of years ago: I get one or two enquiries about it every year (mostly appreciative) so I've tidied it up, tested it more thoroughly, and released it into the wild on Excellerando.

Saturday, 27 February 2010

String-Comparison in VBA: a modified Longest-Common-String approach

Summary:

I discuss the use of a sum-of-longest-common-strings algorithm to measure the degree of difference between strings. This is efficient in VBA, and can be used as a the basis as a closest-match alternative to VLookup and Match.


The Details...

A couple of years ago, I looked at writing a fuzzy-matching version of VLookup, to return the closest match to a search string rather than the #NA() that comes back from the standard Excel function. I posted it in an obscure personal blog, and forgot about it. But I'll be posting it here, shortly, and this post is the introduction to it, with an explanation of the principles and the core VBA function that drives the whole thing.

Originally,I looked at using a Levenshtein 'Edit Distance' algorithm to compare and measure the degree of difference between strings. (Thanks are due to the inestimable Mr. Crowley for pointing me towards some basic C++ and theory-of-algorithms links). But field-testing showed that a simpler and faster approach was required - I needed something that gives consistent results on longer strings, like addresses and sentences, without the need for a separate scoring process that examines the word order.

The simplest approach of all to comparing and scoring for similarity is searching for the longest common string. This has the obvious advantages of speed and simplicity, but it alse has a weak point: simple LCS algorithms are unduly sensitive to single-letter substitutions and deletions near the midpoint of the test word. For example, 'Wednesday' is obviously closer to 'WedXesday' on an edit-distance basis than it is to 'WednesXXX', but the latter has the longest common string despite having more substitutions; this suggests that it would be better to score the 'Wed' as well as the 'eday', adding up all the matching substrings, instead of just measuring the longest one.

It turns out that the recursive algorithm I'm using to do this has an embedded sequence-sensitivity; in theory, this is a complication and a pretty heavy hint that there's some error in my logic that I ought to investigate and remove. In practice, a degree of sequence-sensitivity works well when we compare two sentences or phrases: this 'error' is a pretty good proxy for compiling a secondary score based on similarities in their word order.

Which goes to show that serendipidity comes from simplicity and, if you strip out the comments, this function is a commendably compact piece of code:



Public Function SumOfCommonStrings( _
            ByVal s1 As String, _
            ByVal s2 As String, _
            Optional Compare As VBA.VbCompareMethod = vbTextCompare, _
            Optional iScore As Integer = 0 _
                ) As Integer

Application.Volatile False

' N.Heffernan 06 June 2006 (somewhere over Newfoundland)
' THIS CODE IS IN THE PUBLIC DOMAIN


' Function to measure how much of String 1 is made up of substrings found in String 2

' This function uses a modified Longest Common String algorithm.
' Simple LCS algorithms are unduly sensitive to single-letter
' deletions/changes near the midpoint of the test words, eg:
' Wednesday is obviously closer to WedXesday on an edit-distance
' basis than it is to WednesXXX. So would it be better to score
' the 'Wed' as well as the 'eday' ?

' Watch out for strings of differing lengths:
'
'    SumOfCommonStrings("Wednesday", "WednesXXXday")
'
' This scores the same as:
'
'     SumOfCommonStrings("Wednesday", "Wednesday")
'
' So make sure the calling function uses the length of the longest
' string when calculating the degree of similarity from this score.


' This is coded for clarity, not for performance.


Dim arr() As Integer     ' Scoring matrix
Dim n As Integer         ' length of s1
Dim m As Integer         ' length of s2
Dim i As Integer         ' start position in s1
Dim j As Integer         ' start position in s2
Dim subs1 As String     ' a substring of s1
Dim len1 As Integer     ' length of subs1

Dim sBefore1             ' documented in the code
Dim sBefore2
Dim sAfter1
Dim sAfter2

Dim s3 As String


SumOfCommonStrings = iScore

n = Len(s1)
m = Len(s2)

If s1 = s2 Then
    SumOfCommonStrings = n
    Exit Function
End If

If n = 0 Or m = 0 Then
    Exit Function
End If

's1 should always be the shorter of the two strings:

If n > m Then
    s3 = s2
    s2 = s1
    s1 = s3
    n = Len(s1)
    m = Len(s2)
End If

n = Len(s1)
m = Len(s2)

' Special case: s1 is n exact substring of s2
If InStr(1, s2, s1, Compare) Then
    SumOfCommonStrings = n
    Exit Function
End If

For len1 = n To 1 Step -1

    For i = 1 To n - len1 + 1

        subs1 = Mid(s1, i, len1)
        j = 0
        j = InStr(1, s2, subs1, Compare)
        
        If j > 0 Then
        
         ' We've found a matching substring...
            iScore = iScore + len1
        
        ' Reinstate this Debug.Print statement to monitor the function:
        ' Debug.Print s1 & " substring (len=" & len1 & ") " & subs1 & " in " & s2 & " Scores " & len1
            
        ' Now clip out this substring from s1 and s2...
    
        ' However, we can't just concatenate the fragments before and
        ' after this deletion and restart: substrings that span this
        ' artificial join might give spurious matches. So we run the
        ' function on the separate 'before' and 'after' pieces. Note
        ' that running before1 vs before2 and after1 vs after2, without
        ' running before1 vs after2 and before2 vs after1, introduces
        ' a sequence bias. This may be undesirable, as the effect will
        ' be to discard match scores for transposed words in a sentence

    
            If i > 1 And j > 1 Then
                sBefore1 = left(s1, i - 1)
                sBefore2 = left(s2, j - 1)
                iScore = SumOfCommonStrings(sBefore1, _
                                            sBefore2, _
                                            Compare, _
                                            iScore)
            End If
    
    
            If i + len1 < n And j + len1 < m Then
                sAfter1 = right(s1, n + 1 - i - len1)
                sAfter2 = right(s2, m + 1 - j - len1)
                iScore = SumOfCommonStrings(sAfter1, _
                                            sAfter2, _
                                            Compare, _
                                            iScore)
            End If
    
    
            SumOfCommonStrings = iScore
            ' No further recursion: don't double-count substrings of a matched substring!
            Exit Function
        
        'Reinstate this 'Else' block to monitor the function:
        'Else
            ' No action required.
            ' Debug.Print s1 & " substring (len=" & len1 & ") " & subs1 & " in " & s2

            
        End If

    Next


Next


End Function







There is room for improvement, and I suspect that the embedded sequence sensitivity is a drawback in some applications. Consider these two addresses:

    The House of Cards,
    11 High Street,

and

    House of Cards, The
    11 High Street

They are clearly the same address, and this isn't even a typing error: moving 'The' to the end of the line (or titles like 'Mr' and 'Mrs') is accepted secretarial practice in lists that are sorted alphabetically. But my algorithm treats the transposed word 'The' as an insertion, applying a penalty without making any attempt to identify it as a transposition. In fact, it double-counts the transposition as two points of difference - deletion plus insertion - just like the unmodified Levenshtein edit-distance algorithm. So feel free to rewrite my code where it splits the test words into 'before' and 'after' fragments and resumes the search for matching substrings - but be warned, this is not as simple as you might think, and I don't see any obvious analogy with Damerau's extension of the Levenshtein algorithm. In practice the brutal excision of articles and titles from addresses is the most reliable approach.


A parting shot:

I have a vague suspicion that this sum-of-longest-common-strings algorithm is functionally equivalent to the Levenshtein edit distance, but I lack the logical tools to attempt a formal proof. Would anyone care to offer some pointers? I think its time I moved beyond simple hacks and started putting this stuff on a firmer foundation.

Friday, 26 February 2010

Unprotecting a project using VBA

Ever tried to open another workbook call a macro in it from your VBA code?

Easy, if the sub or function is declared 'public' at workbook level and is visible as a method of the workbook object. If it isn't (and, sometimes, even if it is) and the VB Project is locked, you'll need to go into the VBE editor and unlock it yourself.

In short: manual intervention is required.

For obvious reasons, there's no Project.Unprotect(sPassword) function: obvious, but not good,and definitely not convenient when you've been asked to re-run all the reports in a month of separate daily workbooks.

We'll gloss over that your office should probably be handling the data and the daily reporting process in a more efficient way: sometimes you get this kind of job and the rest s up to you.

We'll assume that you know the password and have the right to open and run these files... now what?

There's code out there to unlock a project using a truly horrible combination of SendKeys() strings. THIS code is marginally better, but not miraculously so: it works on identifying the windows and the handles of the controls, and sending Windows messages using the API functions.

Most of the time the messages work... More often, anyway, than Sendkeys does. And, as we're in a slightly better environment than a keystroke-passer, we can read the results and retry the messages when they fail.

Here's the function:

fUnlockProject(wbk As Excel.Workbook, strPwd As String) As Boolean

I'm assuming that you know how to open a workbook in VBA: if you can't, then this code sample probably isn't for you. Not only is at an advanced topic - API calls and window messages - but we're doing something that VBA really isn't designed to do.

On top of that, Blogger's HTML editor (whatever RSS feed you're viewing the blog post in!) will have munged at least one of the line breaks and, while I've succeeded in getting thhe code below to copy-and-paste into a new VBA module, I suspect that some of you will get at least one syntax error when you try.

Finally: read the comments below the function header. There's stuff in there that you need to know about the return value, and a hint about passing the workbook object.



Option Explicit
Option Private Module


' Requires a reference to the library :
'   Microsoft Visual Basic for Applications Extensibility (v5.3)


    Private Declare Function SendMessage Lib "user32.dll" Alias "SendMessageA" ( _
          ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As Long _
       ) As Long
      
    Private Declare Function SendMessageStr Lib "user32.dll" Alias "SendMessageA" ( _
          ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As String _
       ) As Long
      
    Private Declare Function PostMessage Lib "user32.dll" Alias "PostMessageA" ( _
            ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As Long _
        ) As Long



    ' SetText params for SendMessage and PostMessage:

    ' ms-help://MS.VSCC.v80/MS.MSDN.v80/MS.WIN32COM.v10.en/winui/winui/windowsuserinterface/windowing/windows/windowreference/windowmessages/wm_settext.htm
    '   wParam:         This parameter is not used.
    '   lParam:         Pointer to a null-terminated string that is the window text.
    '   Return Value:   The return value is TRUE if the text is set.
    
    
    ' API Window Message Constants are documented here:
    
    ' ms-help://MS.VSCC.v80/MS.MSDN.v80/MS.WIN32COM.v10.en/winui/winui/windowsuserinterface/windowing/windows/windowreference/windowmessages/wm_close.htm
    ' ms-help://MS.VSCC.v80/MS.MSDN.v80/MS.WIN32COM.v10.en/shellcc/platform/commctls/buttons/buttonreference/buttonmessages/bm_click.htm

    Private Declare Function GetWindowTextApi Lib "user32.dll" Alias "GetWindowTextA" (ByVal hWnd As Long, ByVal lpString As String, ByVal cch As Long) As Long
    Private Declare Function GetClassNameApi Lib "user32.dll" Alias "GetClassNameA" (ByVal hWnd As Long, ByVal lpClassName As String, ByVal nMaxCount As Long) As Long
    
    Private Const WM_SETTEXT As Long = &HC
    Private Const WM_CLOSE As Long = &H10
    Private Const BM_CLICK = &HF5
    
    Private Const SW_HIDE = 0
    Private Const BM_SETCHECK As Long = &HF1&
    Private Const BST_UNCHECKED = &H0&
    Private Const BST_CHECKED As Long = &H1&
    Private Const BST_INDETERMINATE = &H2&
    Private Const EM_REPLACESEL As Long = &HC2&
    Private Const HWND_TOPMOST As Long = -1
    Private Const SWP_NOACTIVATE As Long = &H10&
    Private Const SWP_NOMOVE As Long = &H2&
    Private Const SWP_NOSIZE As Long = &H1&
    Private Const SWP_SHOWWINDOW As Long = &H40&

    Private Const TCM_SETCURFOCUS As Long = &H1330&

    ' Default Dialog control IDs
    Private Const IDOK As Long = 1
    Private Const IDCANCEL As Long = 2

    
'    Private Declare Function ShowWindow Lib "user32" (ByVal hWnd As Long, ByVal nCmdShow As Long) As Long
        
    Private Declare Function SetFocus Lib "user32" (ByVal hWnd As Long) As Long

    Private Declare Function FindWindow Lib "user32.dll" Alias "FindWindowA" ( _
            ByVal lpClassName As String, ByVal lpWindowName As String _
        ) As Long

    ' ms-help:'MS.VSCC.v80/MS.MSDN.v80/MS.WIN32COM.v10.en/winui/winui/windowsuserinterface/windowing/dialogboxes/dialogboxreference/dialogboxfunctions/getdlgitem.htm
    ' Retrieves the handle to a control in the specified dialog box.
    ' hDlg      : [in] Handle to the dialog box that contains the control.
    ' nIDDlgItem: [in] Specifies the identifier of the control to be retrieved.
    ' returns   : The window handle of the specified control indicates success. NULL indicates failure due to an invalid dialog box handle or a nonexistent control.
    Private Declare Function GetDlgItem Lib "user32.dll" (ByVal hDlg As Long, ByVal nIDDlgItem As Long) As Long
    
    ' ms-help://MS.VSCC.v80/MS.MSDN.v80/MS.WIN32COM.v10.en/winui/winui/windowsuserinterface/windowing/windows/windowreference/windowfunctions/setforegroundwindow.htm
    ' If the window was brought to the foreground, the return value is nonzero.
    ' If the window was not brought to the foreground, the return value is zero.
    Private Declare Function SetForegroundWindow Lib "user32.dll" (ByVal hWnd As Long) As Long

        
    Private Declare Function GetTickCount Lib "kernel32.dll" () As Long
    Private Declare Sub Sleep Lib "kernel32.dll" (ByVal dwMilliseconds As Long)


    Private Declare Function CharLower Lib "user32.dll" Alias "CharLowerA" (ByVal lpsz As String) As String
    Private Declare Function CharUpper Lib "user32.dll" Alias "CharUpperA" (ByVal lpsz As String) As String
        

' Password windows caption suffix
Private Const DLG_PWD_CAP_SUFFIX As String = " Password"

' Project properties dialog caption suffix
Private Const DLG_PRJPROP_CAP_SUFFIX As String = " - Project properties"

' Project properties dialog hWnd
Private hWndProjectProperties As Long


' Caption of the dialog when a bad password is inserted
Private Const DLG_BADPWD_CAP As String = "Project Locked"

' Caption of the generic VBA error
Public Const DLG_VBERROR_CAP As String = "Microsoft Visual Basic"

' Dialog class
Private Const DIALOG_CLS As String = "#32770"

' Password dialog textfield control ID
Private Const PWD_DLG_EDIT_ID As Long = &H155E&

' Wait time for the windows search
Private Const WAIT_TIME As Long = 500
'

Public Function fUnlockProject(wbk As Excel.Workbook, strPwd As String) As Boolean
On Error Resume Next

' Unlock a VB project using a known password.

' You are advised to pass a wbk parameter that's opened in another Excel Application session.
' This one will probably crash if you try it locally.

' Returns True if all the dialog boxes were closed (indicating that the app can be safely closed).
' To know if the document project was unlocked successfully, use the .VBProject.Protection property.

' This code works by manipulating the windows of the VBE password dialogue in VBA.
' It's a step above the widely-published 'SendKeys' code. But that's faint praise:
' it's messy, and you'll soon find out why I use all those 'GoTo ...iRetry' blocks.

    Dim appExcel As Excel.Application
    Dim vbpProject As VBIDE.VBProject
    Dim vbEditor As VBIDE.VBE
  
    Dim i As Long
    Dim lStart As Long

    
    Dim sPPDlgCaption As String     ' Project Properties dialog caption
    Dim hDlgProjectProps As Long    ' Project Properties dialog handle
    
    Dim sPwdDlgCaption As String    ' password dialog caption
    Dim hDlgPassword As Long        ' password dialog handle
    
    Dim hPwdField As Long           ' password dialog textbox handle
    Dim hDlgBadPassword As Long     ' a 'Bad Password' dialog handle
    
    Dim iRetry As Long
    
    
    ' Menu bar
    '  \ Tools (msoControlPopup, ID:30007)
    '     \ Properties of ... (msoControlButton, ID:2578)
    
    Dim cbMenuBar As CommandBar
    Dim cbpTools As CommandBarPopup
    Dim cbbProperties As CommandBarButton
    Dim bDialogsCleared As Boolean
        
    bDialogsCleared = True
    
    'Application.EnableCancelKey = xlDisabled

    
    Set appExcel = wbk.Application
    Set vbEditor = appExcel.VBE
    Set vbpProject = wbk.VBProject
    
    ' show Visual Basic Editor?
'    If appExcel.VBE.MainWindow.visible = True Then
'        appExcel.VBE.MainWindow.visible = False
'    End If
    
    ' set the VBE active project
    Set vbEditor.ActiveVBProject = vbpProject
        
    ' construct the password dialog caption
    sPwdDlgCaption = vbpProject.Name & DLG_PWD_CAP_SUFFIX
    
    ' construct the 'project properties' dialog caption
    sPPDlgCaption = vbpProject.Name & DLG_PRJPROP_CAP_SUFFIX
    
    

    ' Note that this could be structured as nested IF... THEN blocks, avoiding the use of 'GOTO'
    ' But 'drop-through or exit' is easier to follow when we use a 'go-back-and-retry' structure


' Try to acquire the menu bar
iRetry = 0
RetryGetMenuBar:
iRetry = iRetry + 1

    If Not fGetMenuBar(vbEditor, cbMenuBar) Then
    
        ' Failed, retry 3 times
        Call Sleep(32 * iRetry)
        
        If iRetry < 4 Then
            GoTo RetryGetMenuBar
        Else
            Debug.Print vbpProject.Name & vbTab & "fUnlockProject() - Menubar not found : " & Err.Description
            GoTo ExitFunction
        End If
        
    End If  ' menu bar successfully acquired
    
    
                
' try to find the 'Tools' menu
iRetry = 0
RetryGetToolsMenu:
iRetry = iRetry + 1
        
    Set cbpTools = cbMenuBar.FindControl(ID:="30007")
    
    If (cbpTools Is Nothing) Then
    
        ' Failed, retry 3 times
        Call Sleep(32 * iRetry)
        
        If iRetry < 4 Then
            GoTo RetryGetToolsMenu
        Else
            Debug.Print vbpProject.Name & vbTab & "fUnlockProject() - Tools menu not found : " & Err.Description
            GoTo ExitFunction
        End If
        
     End If
        
        
        
' try to get the 'project properties' menu item
iRetry = 0
RetryGetProjProps:
iRetry = iRetry + 1
            
            
    Call fGetPopupItem(cbpTools, "2578", cbbProperties)
    CloseNamedDialog DLG_VBERROR_CAP
    
    If (cbbProperties Is Nothing) Then
    
        'Failed, Retry 3 times
        Call Sleep(32 * iRetry)
        
        If iRetry < 4 Then
            GoTo RetryGetProjProps
        Else
            Debug.Print vbpProject.Name & vbTab & "fUnlockProject() : Properties menu item not found."
            GoTo ExitFunction
        End If
        
    End If
        

    ' Execute the 'project properties' menu item action
    Call cbbProperties.Execute

    ' Test an unlikely outcome: the project properties window
    ' opened up straightaway, indicating there was no password:
    hDlgProjectProps = 0
    hDlgProjectProps = FindWindow(DIALOG_CLS, sPPDlgCaption)
    
    If hDlgProjectProps <> 0 Then
        GoTo ExitFunction
    End If
    
    
      
' Get the password dialog's window handle:

        hDlgPassword = 0
        hDlgPassword = FindWindow(DIALOG_CLS, sPwdDlgCaption)

          
' Test the  password dialog exists, retry if it does not:
        
iRetry = 0
RetryGetPwdDialog:
iRetry = iRetry + 1

        If hDlgPassword = 0 And iRetry < 3 Then
        
            ' Close any 'bad password' or VB Error windows
            CloseNamedDialog DLG_VBERROR_CAP
            CloseNamedDialog DLG_BADPWD_CAP
            
            ' Try getting the hWnd of the password dialog again:
            hDlgPassword = FindWindow(DIALOG_CLS, sPwdDlgCaption)
            
            If hDlgPassword = 0 Then
                Call Sleep(32 * iRetry)
            End If
            
            If hDlgPassword = 0 Then
                GoTo RetryGetPwdDialog
            End If
            
        End If
        
        If hDlgPassword = 0 And iRetry < 4 Then
        
            CloseNamedDialog DLG_VBERROR_CAP
            CloseNamedDialog DLG_BADPWD_CAP
            
            ' Try reopening the dialog from the menu, then get the hwnd:
            Call cbbProperties.Execute
            
            Call Sleep(32 * iRetry)
            hDlgPassword = (FindWindow(DIALOG_CLS, sPwdDlgCaption))
            
            If hDlgPassword = 0 Then
                Call Sleep(32 * iRetry)
            End If
            
            If hDlgPassword = 0 Then
                GoTo RetryGetPwdDialog
            End If
            
        End If

        If hDlgPassword = 0 Then
            Debug.Print vbpProject.Name & vbTab & "fUnlockProject(): cannot open the password dialog."
            GoTo ExitFunction
        End If
        

        
' Get the password textbox

        hPwdField = 0
        hPwdField = GetDlgItem(hDlgPassword, PWD_DLG_EDIT_ID)
        
        
' Test the password textbox exists, retry if it does not:
          
iRetry = 0
RetryGetPwdTextbox:
iRetry = iRetry + 1
        
        If hPwdField = 0 And iRetry < 4 Then
        
            CloseNamedDialog DLG_VBERROR_CAP
            CloseNamedDialog DLG_BADPWD_CAP
            
            hPwdField = GetDlgItem(hDlgPassword, PWD_DLG_EDIT_ID)
            
            If hPwdField = 0 Then
                Call Sleep(32 * iRetry)
            End If
            
            If hPwdField = 0 Then
                GoTo RetryGetPwdTextbox
            End If
         End If
            
        If hPwdField = 0 Then
            Debug.Print vbpProject.Name & vbTab & "fUnlockProject(): cannot find the password textbox."
            bDialogsCleared = CloseWindow(hDlgPassword)
            GoTo ExitFunction
        End If
        
                
'Fill in the password text:
iRetry = 0
RetrySetText:
iRetry = iRetry + 1

        If SendMessageStr(hPwdField, WM_SETTEXT, 0&, strPwd) = 0 Then
        
            ' zero return indicates a failed set-text operation
            Call Sleep(32 * iRetry)
            
            Select Case iRetry
            Case Is < 4
                GoTo RetrySetText
            Case Is < 5
                CloseNamedDialog DLG_VBERROR_CAP
                CloseNamedDialog DLG_BADPWD_CAP
                hDlgPassword = (FindWindow(DIALOG_CLS, sPwdDlgCaption))
                hPwdField = GetDlgItem(hDlgPassword, PWD_DLG_EDIT_ID)
                GoTo RetrySetText
            Case Is < 6
                CloseNamedDialog DLG_VBERROR_CAP
                CloseNamedDialog DLG_BADPWD_CAP
                CloseWindow hDlgPassword
                GoTo RetryGetPwdDialog
            Case Else
                Debug.Print vbpProject.Name & vbTab & "fUnlockProject() : Unable to enter password '" & strPwd & "' into the textbox."
                bDialogsCleared = CloseWindow(hDlgPassword)
                GoTo ExitFunction
            End Select
        
        End If
        
        
' Click the 'Ok' button
iRetry = 0
RetryClickOK:
iRetry = iRetry + 1
      
      
        ' PostMessage returns the results of the 'click': nonzero indicates success
        If PostMessage(GetDlgItem(hDlgPassword, IDOK), BM_CLICK, 0&, 0&) = 0 Then
                    
            Select Case iRetry
            Case Is < 4
                Call Sleep(32 * iRetry)
                GoTo RetryClickOK
            Case 4
                CloseNamedDialog DLG_BADPWD_CAP
                CloseNamedDialog DLG_VBERROR_CAP
                Call SetForegroundWindow(hDlgPassword)
                GoTo RetryClickOK
            Case Else
                Debug.Print vbpProject.Name & vbTab & "fUnlockProject() : Unable to click 'OK' for this password."
                bDialogsCleared = CloseWindow(hDlgPassword)
                GoTo ExitFunction
            End Select
        
        End If ' fClickButton failed
        
                
        ' fClickButton returned true, telling us that control
        ' has returned to the OK button's parent dialog.
        
        ' However, that could also mean that the button wasn't clicked at all:
      
        hDlgPassword = 0
        hDlgPassword = FindWindow(DIALOG_CLS, sPwdDlgCaption)
        
        If hDlgPassword <> 0 Then
                            
            Select Case iRetry
            Case Is < 4
                Call Sleep(32 * iRetry)
                GoTo RetryClickOK
            Case 4
                CloseNamedDialog DLG_BADPWD_CAP
                CloseNamedDialog DLG_VBERROR_CAP
                Call SetForegroundWindow(hDlgPassword)
                GoTo RetryClickOK
            Case Else
                Debug.Print vbpProject.Name & vbTab & "fUnlockProject() : Unable to click 'OK' for this password."
                bDialogsCleared = CloseWindow(hDlgPassword)
                GoTo ExitFunction
            End Select
        
        End If
        
        bDialogsCleared = False
        

            

' Inspect the results of the click

' No retry block here: retrying Window-open operations, clicks and SetTexts is fine
' - or rather, a messy necessity - but the password itself either worked or failed.

' Two possible outcomes:   1 Password success opened a 'project properties' dialog
'                          2 Password failure opened a 'bad password' dialog


        
        
        If CloseNamedDialog(DLG_BADPWD_CAP) = 0 Then  ' no 'bad password' dialog to close
        
            hDlgProjectProps = 0
            hDlgProjectProps = FindWindow(DIALOG_CLS, sPPDlgCaption)
            
            If hDlgProjectProps = 0 Then
                Call Sleep(250)
            End If
            
            If hDlgProjectProps <> 0 Then
            
                ' Opened the 'Properties' screen, which means: PASSWORD SUCCESSFUL!
                Debug.Print "PASSWORD: " & strPwd & vbTab & wbk.FullName
                
                'Close the project properties dialog: try the OK button first
                bDialogsCleared = fClickButton(hDlgProjectProps, IDOK)
                
            End If 'successful password
        
        Else
            ' Bad password dialog detected & closed... Our password Failed
            
        End If
    

ExitFunction:

    CloseNamedDialog DLG_BADPWD_CAP
    CloseNamedDialog DLG_VBERROR_CAP
    hDlgProjectProps = FindWindow(DIALOG_CLS, sPPDlgCaption)
    bDialogsCleared = bDialogsCleared And CloseWindow(hDlgPassword) And CloseWindow(hDlgProjectProps)
  
    
    If (bDialogsCleared) Then
        ' all the dialog boxes were closed
        fUnlockProject = True
    End If
    
    Set cbbProperties = Nothing
    Set cbpTools = Nothing
    Set cbMenuBar = Nothing
    
    vbEditor.MainWindow.Close
    
    Set vbEditor = Nothing
    Set appExcel = Nothing
    
End Function


Private Function fGetDialogHnd(sCaption, hDlg As Long) As Boolean

' Get the handle of the dialog whose the caption is specified.
' Return True if the dialog was found.

    hDlg = (FindWindow(DIALOG_CLS, sCaption))
    fGetDialogHnd = (hDlg <> 0)
    
End Function



Private Function fClickButton(hDlg As Long, lButtonID As Long) As Boolean

' Programmatically click on a button in a command bar or menu,  specified by ID
' Return False if the button owner was not activated or the 'click' failed

    Dim hButton As Long
    
    ' get the button handle
    hButton = GetDlgItem(hDlg, lButtonID)
    
    ' active the dialog box (hDlg) and click on the button
    If PostMessage(hButton, BM_CLICK, 0&, 0&) <> 0 Then
        fClickButton = True
    End If
    
End Function



Private Function fGetMenuBar(oContainer As Object, cb As CommandBar) As Boolean

' Get the menu bar of the specified container:
' oContainer can be any object which has a CommandBars collection.
' Return True if the menu bar was found.

    Dim i As Long
    
    On Error Resume Next
    For i = 1 To oContainer.CommandBars.Count
        Set cb = oContainer.CommandBars(i)
        
        If (cb.Type = msoBarTypeMenuBar) Then
            fGetMenuBar = True
            Exit For
        End If
    Next i
    On Error GoTo 0
    
End Function



Private Function fGetPopupItem(cbp As CommandBarPopup, sControlID As String, cbc As CommandBarControl) As Boolean
' Get a control from a commandbar or menu, by specifying the control's ID

    Dim i As Long
    
    For i = 1 To cbp.Controls.Count
    
        Set cbc = cbp.Controls(i)
        
        If (cbc.ID = sControlID) Then
            fGetPopupItem = True
            Exit For
        End If
        
    Next i
    
    
End Function


Private Function TrimNulls(ByVal sString As String) As String
' Trims trailing nulls

Dim iPos As Integer

iPos = InStr(sString, Chr$(0))
    
    Select Case iPos
    Case 0
    
        TrimNulls = sString
        
    Case 1

        TrimNulls = ""
        
    Case Else ' iPos > 1
    
        TrimNulls = left$(sString, iPos - 1)
        
    End Select
    
End Function

Private Function fUCase(ByVal sString As String) As String

    If (Len(sString) >= 2) Then
        fUCase = CharUpper(left$(sString, 1)) & _
                 CharLower(right$(sString, Len(sString) - 1))
    Else
        fUCase = sString
    End If
    
End Function



Private Function IsArrayEmpty(va As Variant) As Boolean
' Incorporates fix from Torsten Rendelmann (MVPS - Hardcore VB)
    Dim i As Long
    
    On Error Resume Next
    i = LBound(va, 1)
    IsArrayEmpty = (Err.Number <> 0)
    On Error GoTo 0 ' Err.Clear
    
End Function


Private Function CloseWindow(hWnd As Long) As Boolean

Dim iRetry As Integer

CloseWindow = False

RetryCloseWindow:
iRetry = iRetry + 1

    If SendMessage(hWnd, WM_CLOSE, 0&, 0&) = 0& Then
    
        CloseWindow = True
        
    Else
    
        CloseWindow = False
        CloseNamedDialog DLG_VBERROR_CAP
        Call Sleep(32 * iRetry)
        
        If iRetry < 4 Then
            GoTo RetryCloseWindow
        End If
        
    End If

End Function

Public Function CloseNamedDialog(sDialogCaption As String) As Long
'Returns window handle of last-closed window
On Error Resume Next

Dim iCount As Integer
Dim hwnDialog As Long

Err.Clear

CloseNamedDialog = 1

hwnDialog = FindWindow(DIALOG_CLS, sDialogCaption)

Do Until hwnDialog = 0
iCount = iCount + 1

    SendMessage hwnDialog, WM_CLOSE, 0&, 0&
    CloseNamedDialog = hwnDialog
    hwnDialog = FindWindow(DIALOG_CLS, sDialogCaption)
    
    
    If iCount > 1 Then
        Sleep 10 * iCount
        SetFocus hwnDialog
    End If
    
    If iCount > 3 Then
        ' something's stopping us closing the window
        Exit Do
    End If
    
Loop

End Function

Public Sub CloseGenericError()
On Error Resume Next

    CloseNamedDialog DLG_VBERROR_CAP
    Application.OnTime EarliestTime:=Now() + (1# / 24# / 1200#), Procedure:="CloseGenericError"

End Sub

Private Function ClickButton(hWndOwner As Long, hWndButton As Long) As Boolean
On Error Resume Next

    SetForegroundWindow hWndOwner
    SetFocus hWndButton
    PostMessage hWndButton, BM_CLICK, 0&, 0&

End Function

Private Function GetWindowText(ByVal hWnd As Long) As String

Dim sBuffer As String
Dim lBufferLen As Long
    
    sBuffer = String$(512, 0)
    lBufferLen = GetWindowTextApi(hWnd, sBuffer, Len(sBuffer))
    GetWindowText = left$(sBuffer, lBufferLen)
    
End Function

Private Function GetClassName(ByVal hWnd As Long) As String

Dim sBuffer As String
Dim lBufferLen As Long
    
    sBuffer = String$(512, 0)
    lBufferLen = GetClassNameApi(hWnd, sBuffer, Len(sBuffer))
    GetClassName = left$(sBuffer, lBufferLen)
    
End Function