15

Is a string an array in VBA?

For instance, can I iterate through it like I could in C/C++ ?

Do something like this:

char myArray[10]; 

for (int i = 0; i < length; i++)
{
    cout << myArray[i];
}

What would the equivalent be in in VBA? It is not behaving as I would expect. Never actually tried to manipulate many strings before in VBA! :)

John Saunders
  • 160,644
  • 26
  • 247
  • 397
timbram
  • 1,797
  • 5
  • 28
  • 49
  • 1
    If you really want you can assign a string to a Byte array and loop through that, but then you have to convert the bytes back to characters. – Rory May 12 '15 at 14:55
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders May 12 '15 at 16:35
  • 2
    See the detailed answer below. A string is a byte array, and vice versa. Not 'a variable which can be converted into a byte array' - it *is* a byte array, and you can manipulate it as such. – Nigel Heffernan May 12 '15 at 18:21
  • 2
    Also: I hope that I can avoid being unduly blunt about this, but some of these answers are mistaken, so much so that they can only be contradicted rather than improved upon. Further (and with apologies for the repetition): a string is an array of bytes, the data structures are identical - not merely equivalent! - and VBA recognises that *because it has no mechanism to do otherwise* - so I regret to say that all mentions of 'conversion' fall into the category of "not even wrong". – Nigel Heffernan May 12 '15 at 18:36
  • 1
    @Nile if a string *is* a byte array and vice versa why does `UBound(s)` fail on string variables `s` and `Len(b)` fail on `byte()` variables `b`? Obviously the underlying representation of a string is an array of bytes (what else could it be -- a linked list?) and your answer shows how this can be exploited, but the VBA compiler itself makes semantic distinctions between strings and byte arrays, so it isn't clear why you should in effect scold others for making such distinctions. – John Coleman Jan 01 '17 at 05:36
  • @john-coleman The distinction is semantic but not functional: the pragmatic view is that some (but not all) operations either ignore the semantics and accept the type pun, or perform a 'cast' silently; and the explicit cast appears to be implemented with no memory allocation operation. Beyond that, the scolding - or emphatic repetition - is all about reading repeated errors made through ignorance of a very simple point: sometimes you do need to know the implementation details, and this is a question that can only be answered by explaining that strings in VBA are implemented as a byte array. – Nigel Heffernan Jan 03 '17 at 16:18
  • @Nile I tend to be somewhat agnostic about underlying implementation details of closed-source applications. This article suggests that there are significant differences in the underlying memory structures between a string and a byte array: http://bytecomb.com/vba-internals-whats-in-a-variable/ . Both involve a contiguous sequence of bytes which contain the data itself and it would be a nice optimization if the string to byte array conversion is done without copying those bytes. A careful timing test should be able to tell if the conversion is O(n) (which would imply copying) or O(1). – John Coleman Jan 03 '17 at 17:09

6 Answers6

15

A string is an array of bytes, and you can iterate through it if you know your way around two-byte implementations of a string (or Unicode, which is usually 2 bytes per character, but can be more: and it's always going to be 2-Byte if your string originated in VBA).

When I say is, I mean it: there is no need for a type conversion, and this will work just fine:

Public Sub TestByteString()

    Dim strChars As String
    Dim arrBytes() As Byte
    Dim i As Integer


    strChars = "The quick Brown Fox"
    arrBytes = strChars

    Debug.Print strChars
    Debug.Print

    For i = LBound(arrBytes) To UBound(arrBytes) Step 2 

        Debug.Print Chr(arrBytes(i)) & vbTab & "Byte " & i & " = " & arrBytes(i)

    Next i

    arrBytes(0) = Asc("?")
    arrBytes(2) = Asc("!")
    arrBytes(4) = Asc("*")

    strChars = arrBytes

    Debug.Print
    Debug.Print strChars

    Erase arrBytes

End Sub

Your outputs will look like this:

The quick Brown Fox

T Byte 0 = 84 h Byte 2 = 104 e Byte 4 = 101 Byte 6 = 32 q Byte 8 = 113 u Byte 10 = 117 i Byte 12 = 105 c Byte 14 = 99 k Byte 16 = 107 Byte 18 = 32 B Byte 20 = 66 r Byte 22 = 114 o Byte 24 = 111 w Byte 26 = 119 n Byte 28 = 110 Byte 30 = 32 F Byte 32 = 70 o Byte 34 = 111 x Byte 36 = 120

?!* quick Brown Fox

Note the 'Step 2' in the loop: I'm discarding every other byte, because I know that it's plain-vanilla Latin characters - 'ASCII' text to the uninitiated.

It gets interesting when you have to deal with Arabic and Pinyin text: and you should never assume in a real-world worksheet that you're always going to be dealing with plain-vanilla US ASCII, as I did in that demonstration piece.

For a more comprehensive example, with more detailed explanation, try this from Excellerando:

Writing an Excel range to a csv file: optimisations and unicode compatibility

The Byte-array optimisation is towards the bottom, under this heading:

A VBA implementation of the Adler-32 checksum, running on byte arrays instead of using VBA string-handling.

The underlying nature of a string does not seem to be as widely-known as it should be: it's not a thing that you will use often in your code, but a lot of the problems with Unicode and non-Latin alphabets that people have get easier when they have a deeper understanding of the variables in their code.

Nigel Heffernan
  • 4,636
  • 37
  • 41
14

It is not an array but what you want can be done using MID

Sub test()
    Dim strSentence As String
    Dim lngCount As Long    

    strSentence = "This is a string"

    For lngCount = 1 To Len(strSentence)
        Debug.Print Mid(strSentence, lngCount, 1)
    Next lngCount             
End Sub

See also Mat's Mug answer for an additional good point.

Edit

Actually there is a second way to iterate through a string, which is to convert the string to an array of single-digit strings and then iterate through them. To this end, we fist have to convert the original string to unicode format so that the Null character can be taken as delimiter. Here is a reproducible example:

Sub test2()
    Dim strSentence As String
    Dim lngCount As Long
    Dim strArray() As String

    strSentence = "This is a string"
    strSentence = StrConv(strSentence, vbUnicode)

    strArray = Split(strSentence, vbNullChar)

    For lngCount = 0 To UBound(strArray)
        Debug.Print strArray(lngCount)
    Next lngCount
End Sub

Out of curiosity, I compared the two approaches (using Mat's version of Mid, which is faster):

Sub test_sub()
    Dim strTest(1 To 5000) As String
    Dim lngStringIter As Long
    Dim lngChars As Long, dblTick As Double

    ' Generate  some long strings first
    For lngStringIter = 1 To 5000
        strTest(lngStringIter) = vbNullChar
        For lngChars = 1 To 10
            strTest(lngStringIter) = strTest(lngStringIter) & _
                Chr(Int((90 - 65 + 1) * Rnd + 65)) & strTest(lngStringIter)
        Next lngChars
    Next lngStringIter

    ' Lets see what happens..
    dblTick = CDbl(Now())
    For lngStringIter = 1 To 5000
      test strTest(lngStringIter)
    Next lngStringIter
    Debug.Print "Time Mid: ", CDbl(Now()) - dblTick

    dblTick = CDbl(Now())
    For lngStringIter = 1 To 5000
      test2 strTest(lngStringIter)
    Next lngStringIter
    Debug.Print "Time Split: ", CDbl(Now()) - dblTick
End Sub

Results:

Time Mid:     4.62962998426519e-05 
Time Split:    1.15740767796524e-05 

So it seems that the Split approach is somewhat faster.


A VBA string is implemented as a BSTR datatype. More info for this data type can be found here and here.

Ioannis
  • 5,238
  • 2
  • 19
  • 31
  • @timbram this is what you want to do – mgrenier May 12 '15 at 14:51
  • Perfect I'll have a look! I dont like `Now()` either, and usually use the `dll` time function as `Private Declare Function GetTickCount Lib "kernel32.dll" () As Long`, but I am on a Mac at the moment. Thanks again for the link. – Ioannis May 12 '15 at 16:45
  • 1
    I put the unit tests on [PasteBin](http://pastebin.com/ew0bUu6M) - as you see I've extracted most of the setup code into `ModuleInitialize` so that the tests only test the relevant parts. `TestMid` runs in 17-22ms, and `TestSplit` runs in 7-10ms (did a few runs), so yeah, clear winner :) – Mathieu Guindon May 12 '15 at 17:00
  • I guess so, but will check first thing I get on a PC (I do PC-based VBA but need the Mac for other stuff). Unit testing support is poor on VBA, and I quite like the other features as well (so far I am using MZ Tools) – Ioannis May 12 '15 at 17:02
  • 2
    I would recommend a Google search on 'String optimisation in VBA'. Any function which allocates and reallocates string variables will be slow: Splitting to an array is an effective optimisation, as the one (horribly inefficient) allocation is over and done with, and you're not doing any more of that inside the loop; but there are much, much better ways of doing this. Start with this article, and keep reading: http://www.aivosto.com/vbtips/stringopt.html – Nigel Heffernan May 12 '15 at 18:18
  • 1
    @Nile, Many thanks for the link, it is a good read, and the performance comparisons very useful. I have used ByteArrays in the past, but not extensively. I agree that using Byte Arrays is more efficient than Splitting or MID-ing. I have +1-ed your code, in which the only thing I would change is the [`Integer` type to `Long`](http://msdn.microsoft.com/en-us/library/office/aa164506%28v=office.10%29.aspx), although in most cases the difference is abysmally small. – Ioannis May 12 '15 at 19:32
13

Mid works to get the nth character in any string, but returns a Variant, which incurs implicit conversion to String.

Use the "strongly-typed" version: Mid$ instead.

Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
5

I don't think VBA lets you treat a string as an array without converting it.

However, you can use MID to get one character at a time.

Code is OTTOMH

declare i as integer
declare strlen as int

strlen = Len (YourIncomingString)

for i = 0 to strlen
    print Mid (YourIncomingString, i, 1)
Raj More
  • 47,048
  • 33
  • 131
  • 198
4

Simple answer: no, string is not an object in VBA. You would have to pick it apart on character at the time using the mid and len functions.

Tom
  • 747
  • 5
  • 16
1

It is still possible to create your own custom enumerable string.

Enumerable string can be used like this:

Dim es As EnumerableString
Set es = New EnumerableString
es.Value = ActiveCell.Value
Dim ch As Variant
For Each ch In es
    ActiveCell.Offset(1, 0).Activate
    ActiveCell.Value = ch
Next ch

Save this code in file with name EnumerableString.cls a import it in VBA Project.

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "EnumerableString"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit

Private m_characters As Collection

Public Property Get NewEnum() As IUnknown
Attribute NewEnum.VB_UserMemId = -4
Attribute NewEnum.VB_MemberFlags = "40"
    Set NewEnum = m_characters.[_NewEnum]
End Property

Public Property Let Value(ByVal newValue As String)
    Dim pos As Integer
    Set m_characters = New Collection
    For pos = 1 To Len(newValue)
        m_characters.Add Mid$(newValue, pos, 1)
    Next pos
End Property

Public Function Length() As Long
    Length = m_characters.Count
End Function

Public Function Item(ByVal index As Variant) As String
Attribute Item.VB_UserMemId = 0
    Item = m_characters(index)
End Function

Private Sub Class_Initialize()
    Set m_characters = New Collection
End Sub

Result in Excel, example with ASCII, ANSI, and Unicode UTF-8 characters:

enter image description here

Daniel Dušek
  • 13,683
  • 5
  • 36
  • 51
  • 1
    In light of the existing byte-array answer this seems a bit of an overkill and inefficient solution. I'd upvote if the involved attributes were explained, and if there was a getter for the `Value` property (write-only properties are a design smell imo). – Mathieu Guindon Jan 01 '17 at 16:48
  • 1
    @Mat'sMug the 'light' of existing byte-array answers is maybe not as light as it seems to be because it works only for ASCII or ANSI characters. But in Unicode there are not only two byte encoded code points but e.g. three byte encoded ones etc. My answer is just an idea how to make an enumerable string. – Daniel Dušek Jan 01 '17 at 19:18