Sunday 19 August 2012

Join and Split functions for 2-Dimensional arrays

Here's something I fished out of the attic and posted into StackOverflow...

The code's trivial, in the sense that anyone can do a bit of string-concatenation and a Redim() statement and you've probably done some kind of 'Join' and 'Split' already. But there are a couple of points about efficient string-handling in the comments; or rather, overcoming the inefficiencies of a language which has no string-builder class.

Someday, you are going to find out that concatenating strings slows down *severely* for long strings, and you'll need to know how to work around that.

So, without further ado:

Join2d: A 2-Dimensional Join function in VBA with optimised string-handling

Coding notes:
  1. This 'Join' function does not suffer from the 255-char limitation that affects most (if not all) of the native Concatenate functions in Excel, and the Range.Value code sample above will pass in the data, in full, from cells containing longer strings.
  2. This is heavily optimised: we use string-concatenation as little as possible, as the native VBA string-concatenations are slow and get progressively slower as a longer string is concatenated.
If you want to look more deeply into optimising string-handling in VBA and the VB family of languages, advanced techniques are listed in parts I, II and II of this web article: http://www.aivosto.com/vbtips/stringopt3.html

The biggest performance gain available in native VBA is to avoid allocation and concatenation ( here's why: http://www.aivosto.com/vbtips/stringopt2.html#huge ) - so I use join, split, and replace instead of myString = MyString & MoreString

Bigger gains are available if you use the Kernel string functions directly: after that, you're Googling for LightningStrings and taking the big step into pointer arithmentic... Which I consider a step too far: if you need that kind of performance, you need another platform.



Public Function Join2d(ByRef InputArray As Variant, _ 
                       Optional RowDelimiter As String = vbCr, _ 
                       Optional FieldDelimiter = vbTab,_ 
                       Optional SkipBlankRows As Boolean = False _ 
                       ) As String

' Join up a 2-dimensional array into a string. Works like the standard
'  VBA.Strings.Join, for a 2-dimensional array.
' Note that the default delimiters are those inserted into the string
'  returned by ADODB.Recordset.GetString

On Error Resume Next

' Coding note: we're not doing any string-handling in VBA.Strings - 
' allocating, deallocating and (especially!) concatenating are SLOW.
' We're using the VBA Join & Split functions ONLY. The VBA Join,
' Split, & Replace functions are linked directly to fast (by VBA
' standards) functions in the native Windows code. Feel free to 
' optimise further by declaring and using the Kernel string functions
' if you want to.

' ** THIS CODE IS IN THE PUBLIC DOMAIN **
'   Nigel Heffernan   Excellerando.Blogspot.com

Dim i As Long
Dim j As Long

Dim i_lBound As Long
Dim i_uBound As Long
Dim j_lBound As Long
Dim j_uBound As Long

Dim arrTemp1() As String
Dim arrTemp2() As String

Dim strBlankRow As String

i_lBound = LBound(InputArray, 1)
i_uBound = UBound(InputArray, 1)

j_lBound = LBound(InputArray, 2)
j_uBound = UBound(InputArray, 2)

ReDim arrTemp1(i_lBound To i_uBound)
ReDim arrTemp2(j_lBound To j_uBound)

For i = i_lBound To i_uBound

    For j = j_lBound To j_uBound
        arrTemp2(j) = InputArray(i, j)
    Next j

    arrTemp1(i) = Join(arrTemp2, FieldDelimiter)

Next i

If SkipBlankRows Then

    If Len(FieldDelimiter) = 1 Then
        strBlankRow = String(j_uBound - j_lBound, FieldDelimiter)
    Else
        For j = j_lBound To j_uBound
            strBlankRow = strBlankRow & FieldDelimiter
        Next j
    End If

    Join2d = Replace(Join(arrTemp1, RowDelimiter), strBlankRow, RowDelimiter, "")
    i = Len(strBlankRow & RowDelimiter)

    If Left(Join2d, i) = strBlankRow & RowDelimiter Then
        Mid$(Join2d, 1, i) = ""
    End If

Else

    Join2d = Join(arrTemp1, RowDelimiter)    

End If

Erase arrTemp1

End Function
For completeness, here's the corresponding 2-D Split function:

Split2d: A 2-Dimensional Split function in VBA with optimised string-handling



Public Function Split2d(ByRef strInput As String, _ 
                        Optional RowDelimiter As String = vbCr, _ 
                        Optional FieldDelimiter = vbTab, _ 
                        Optional CoerceLowerBound As Long = 0 _ 
                        ) As Variant

' Split up a string into a 2-dimensional array. 

' Works like VBA.Strings.Split, for a 2-dimensional array.
' Check your lower bounds on return: never assume that any array in
' VBA is zero-based, even if you've set Option Base 0
' If in doubt, coerce the lower bounds to 0 or 1 by setting 
' CoerceLowerBound
' Note that the default delimiters are those inserted into the
'  string returned by ADODB.Recordset.GetString

On Error Resume Next

' Coding note: we're not doing any string-handling in VBA.Strings -
' allocating, deallocating and (especially!) concatenating are SLOW.
' We're using the VBA Join & Split functions ONLY. The VBA Join,
' Split, & Replace functions are linked directly to fast (by VBA
' standards) functions in the native Windows code. Feel free to 
' optimise further by declaring and using the Kernel string functions
' if you want to.

' ** THIS CODE IS IN THE PUBLIC DOMAIN **
'    Nigel Heffernan   Excellerando.Blogspot.com

Dim i   As Long
Dim j   As Long

Dim i_n As Long
Dim j_n As Long

Dim i_lBound As Long
Dim i_uBound As Long
Dim j_lBound As Long
Dim j_uBound As Long

Dim arrTemp1 As Variant
Dim arrTemp2 As Variant

arrTemp1 = Split(strInput, RowDelimiter)

i_lBound = LBound(arrTemp1)
i_uBound = UBound(arrTemp1)

If VBA.LenB(arrTemp1(i_uBound)) <= 0 Then  
    ' clip out empty last row: a common artifact in data 
     'loaded from files with a terminating row delimiter
    i_uBound = i_uBound - 1
End If

i = i_lBound
arrTemp2 = Split(arrTemp1(i), FieldDelimiter)

j_lBound = LBound(arrTemp2)
j_uBound = UBound(arrTemp2)

If VBA.LenB(arrTemp2(j_uBound)) <= 0 Then 
 ' ! potential error: first row with an empty last field...
    j_uBound = j_uBound - 1
End If

i_n = CoerceLowerBound - i_lBound
j_n = CoerceLowerBound - j_lBound

ReDim arrData(i_lBound + i_n To i_uBound + i_n, j_lBound + j_n To j_uBound + j_n)

' As we've got the first row already... populate it
' here, and start the main loop from lbound+1

For j = j_lBound To j_uBound
    arrData(i_lBound + i_n, j + j_n) = arrTemp2(j)
Next j

For i = i_lBound + 1 To i_uBound Step 1

    arrTemp2 = Split(arrTemp1(i), FieldDelimiter)

    For j = j_lBound To j_uBound Step 1

        arrData(i + i_n, j + j_n) = arrTemp2(j)

    Next j

    Erase arrTemp2

Next i

Erase arrTemp1

Application.StatusBar = False

Split2d = arrData

End Function
Share and enjoy... And watch out for unwanted line breaks in the code, inserted by your browser (or by Blogger's helpful formatting functions)

No comments:

Post a Comment