2

I wanted to count chars in the Word document by Macro I have no idea how to get reference two the text in visual basic macro and go through it.

I would like to count how many of every char was in the document. For example in document:

ABZBB

A x 1
B x 3
Z x 1

   Sub Macro1()
Dim Box As Shape
Set Box = ActiveDocument.Shapes.AddTextbox( _
Orientation:=msoTextOrientationHorizontal, _
Left:=50, Top:=50, Width:=200, Height:=400)
Box.TextFrame.TextRange.Text = "My text comes this way" + Chr(10)
Dim s As String
Application.ScreenUpdating = False
docLength = ActiveDocument.Range.Characters.Count

Box.TextFrame.TextRange.Text = Box.TextFrame.TextRange.Text + "Text length is: " + Str(docLength) + Chr(10)

Dim arr(128) As Integer
Dim character As Integer
For i = 1 To docLength - 1

        character = Asc(ActiveDocument.Range.Characters(i))
If iAsc >= 0 And iAsc <= 127 Then
         arr(character) = arr(character) + 1
 End If
Next i


End Sub
Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Yoda
  • 17,363
  • 67
  • 204
  • 344
  • 1
    vba or c# which your code seems to be?? – Kazimierz Jawor Jun 15 '13 at 18:38
  • Try this: `Box.TextFrame.TextRange.Text = "My text comes this way" & Chr(10) & ActiveDocument.Range.Characters.Count` – Doug Glancy Jun 15 '13 at 18:53
  • if you have any efficiency problem with the solution you accepted/implemented you could use different solution based on `Find object`, quite easy to implement when using function. I could have provided it but I was too late to catch with this discussion. – Kazimierz Jawor Jun 15 '13 at 19:47
  • FWIW I had an idea for a (possibly) efficient solution. Duplicate the range, create a Dictionary, read the count of the number of characters in the document. Repeatedly use Replace on the duplicated range, each time counting the remaining characters, and creating the dictionary entry. Repeat until the characters are exhausted. The dictionary wouldn't be sorted but we could iterate it using the ascii character codes. – Andy G Jun 15 '13 at 20:32
  • @KazJaw, I added a method that uses `Find` and two others to my answer, if you are interested. – Doug Glancy Jun 16 '13 at 03:52

2 Answers2

4

Using VBA, to count the number of characters in the active document do:

ActiveDocument.Range.ComputeStatistics(wdStatisticCharacters)

or

Activedocument.Range.Characters.Count

To get the count for the current selection:

Selection.Range.ComputeStatistics(wdStatisticCharacters)

or

Selection.Range.Characters.Count

The second method in each example counts spaces as characters, the first doesn't.

EDIT: I did some speed testing on various methods to count the instances of a char in a document. Regular expressions and stuffing the document contents into a string are fastest - many times faster than looping through each character or FIND

For my test document I copied the contents of this web page into a Word document. As an accuracy check, I used Word's Find function/panel to find the number of instances of lower case "a". Before I edited this answer that was 409 instances.

I then created four functions to count the number of instances of a character (any string actually) in a Word document. The first simply loops through each character in the doc, similar to Andrew's. The second uses the Find function. The third stuffs the contents of the document into a string and loops through it. The fourth does the same thing but check the matches using a regular expression:

Function GetCharCountLoop(doc As Word.Document, char As String) As Long
Dim i As Long
Dim CharCount As Long

With doc.Content.Characters
    For i = 1 To .Count
        If .Item(i) = char Then
            CharCount = CharCount + 1
        End If
    Next i
End With
GetCharCountLoop = CharCount
End Function

Function GetCharCountFind(doc As Word.Document, char As String) As Long
Dim i As Long
Dim CharCount As Long

With doc.Content.Find
    Do While .Execute(FindText:=char, Forward:=True, MatchWholeWord:=False, MatchCase:=True) = True
        CharCount = CharCount + 1
    Loop
    GetCharCountFind = CharCount
End With
End Function

Function GetCharCountString(doc As Word.Document, char As String) As Long
Dim chars As String
Dim i As Long
Dim CharCount As Long

chars = doc.Content
For i = 1 To Len(chars)
    If Mid$(chars, i, 1) = char Then
            CharCount = CharCount + 1
        End If
    Next i
GetCharCountString = CharCount
End Function

Function GetCharCountRegex(doc As Word.Document, char As String) As Long
Dim chars As String
Dim CharCount As Long
Dim objRegExp As Object

chars = doc.Content
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
    .Pattern = char
    .IgnoreCase = False
    .Global = True
    CharCount = .Execute(chars).Count
End With
GetCharCountRegex = CharCount
End Function

I then tested them using this sub, running a single loop:

Sub TimeMethods()
Dim char As String
Dim CharCount As Long
Dim LoopCounter As Long
Dim NumLoops As Long
Dim StartTime As Double

char = "a"
NumLoops = 1

StartTime = Timer
For LoopCounter = 1 To NumLoops
CharCount = GetCharCountLoop(ActiveDocument, char)
Next LoopCounter
Debug.Print CharCount
Debug.Print Timer - StartTime

StartTime = Timer
For LoopCounter = 1 To NumLoops
CharCount = GetCharCountFind(ActiveDocument, char)
Next LoopCounter
Debug.Print CharCount
Debug.Print Timer - StartTime

StartTime = Timer
For LoopCounter = 1 To NumLoops
CharCount = GetCharCountString(ActiveDocument, char)
Next LoopCounter
Debug.Print CharCount
Debug.Print Timer - StartTime

StartTime = Timer
For LoopCounter = 1 To NumLoops
CharCount = GetCharCountRegex(ActiveDocument, char)
Next LoopCounter
Debug.Print CharCount
Debug.Print Timer - StartTime

End Sub

The results are dramatic:

GetCharCountLoop - 514.3046875 seconds

GetCharCountFind - 0.5859375 seconds

GetCharCountString - 0.015625 seconds

GetCharCountRegex - 0.015625 seconds

I dropped GetCharCountLoop from the running and ran the other three 100 times. According to this rudimentary timing, stuffing the contents into a string and counting, or using a regular expression, are almost 50 times faster than the Find method:

GetCharCountFind - 30.984375 seconds

GetCharCountString - 0.6328125 seconds

GetCharCountRegex - 0.578125 seconds

Note that the slowness of the first method, looping through each character is most evident with longer docs. In my initial testing - a file with just a few words - it was only twice as slow as the Find method.

Also note that I originally turned off ScreenUpdating per Andrew's subroutine, but it seems that makes no difference.

Doug Glancy
  • 27,214
  • 6
  • 67
  • 115
  • `Activedocument.Range.Characters.Count` Is it returning array or what kind of object? I am asking cause later I would like to insert results in the text box like: A x 4, B x 3,C x 1 – Yoda Jun 15 '13 at 18:46
  • Range.Characters returns a collection of characters. – Doug Glancy Jun 15 '13 at 18:47
  • You have to do something with the returned value, like assign it to a variable or `Debug.Print` it. – Doug Glancy Jun 15 '13 at 18:55
  • @DougGlancy, excellent edition, especially your effort to compare all method, +1 from me (would be +3 if I could). Anyway, the only limitation of `regex` & `string` method I see is text length, if it's over 65000 characters it would rather not run due to `String variable limitation`. I know, we could split it into parts and run with other loop which (I guess) would still make both option much faster than `looping` and `find`... Hope you get more points for that! – Kazimierz Jawor Jun 16 '13 at 06:11
  • I suppose it would be possible to, instead, count the words (temporarily removing them each time), then count the letters within these words - but this might be complicated ;). I'm a little suspicious of Doug's regex figures though, as he is only looking for the letter "a" each time. Anyway, good luck! – Andy G Jun 16 '13 at 10:43
  • @AndrewGibson, the hard-coded "a" in the regex version was a mistake. It should have been the variable `char`. I've fixed it. It makes no difference though, as in all four tests I'm only counting the instances of "a". To count multiple letters you'd call the function in a loop. I don't think that would change the relative speeds though. – Doug Glancy Jun 16 '13 at 16:16
  • Hi @"Doug Glancy". Were you able to run my second version (above) through your timer? It won't be faster, but I was just interested in the comparison. – Andy G Jun 16 '13 at 16:37
2

Below is a simplistic, and perhaps slow, example of counting individual letters (and some other characters) in a document.

Sub CountChars()
    Dim iCount(57) As Integer
    Dim x As Integer
    Dim iTotal As Integer
    Dim iAsc As Integer

    Application.ScreenUpdating = False
    iTotal = ActiveDocument.Range.Characters.Count

    For x = 1 To iTotal
        iAsc = Asc(ActiveDocument.Range.Characters(x))
        If iAsc >= 65 And iAsc <= 122 Then
        iCount(iAsc - 65) = iCount(iAsc - 65) + 1
        End If
    Next x
    For x = 0 To 57
        Debug.Print x, iCount(x)
    Next x
    Application.ScreenUpdating = True
End Sub

Change to

Debug.Print Chr(x + 65), iCount(x)

to display the characters themselves.

It may be possible to use Find (somehow) to count occurrences of characters; otherwise it would require Regex.

Alternative using Replace:

'Tools, References: Microsoft Scripting Runtime
Sub CountCharsWithReplace()
    Dim doc As Document
    Dim rDupe As Range
    Dim dicChars As Scripting.Dictionary
    Dim s As String
    Dim iTotalChars As Integer
    Dim iTempChars As Integer
    Dim iDiff As Integer
    Dim n As Integer
    Dim blnExec As Boolean
    Dim lett As Variant
    Application.ScreenUpdating = False
    Set doc = ActiveDocument
    iTotalChars = doc.Range.Characters.Count
    Set rDupe = doc.Range
    Set dicChars = New Scripting.Dictionary
    Do While rDupe.Characters.Count > 1
        s = rDupe.Characters(1).Text
        blnExec = rDupe.Find.Execute(s, , , , , , , , , "", wdReplaceAll)
        iTempChars = doc.Range.Characters.Count
        iDiff = iTotalChars - iTempChars
        iTotalChars = iTempChars
        If Asc(s) >= 65 And Asc(s) <= 122 Then
            dicChars.Add s, iDiff
        End If
        n = n + 1
    Loop
    ActiveDocument.Undo Times:=n
    Application.ScreenUpdating = True
    For Each lett In dicChars.Keys
        Debug.Print lett, dicChars(lett)
    Next lett
End Sub
Andy G
  • 19,232
  • 5
  • 47
  • 69
  • There is an example here http://stackoverflow.com/questions/5645762/word-vba-count-word-occurrences that uses Find to count words, rather than letters. Unfortunately, although we can see clearly the count in the Find dialogs, it appears that there is not a simple VBA property that we can use to obtain this detail. – Andy G Jun 15 '13 at 19:01
  • Find seems like the way to go. I was just able to modify [this code](http://support.microsoft.com/kb/240157), which uses `Find` in a loop, and count the instances of "a". You could easily fold it into your code above. – Doug Glancy Jun 15 '13 at 19:16
  • Please look at edit2, I get out of range. I used your solution. – Yoda Jun 15 '13 at 19:21
  • @'Doug Glancy' If we modify it to look for letters, rather than words, would it be faster? Dunno.. ;) – Andy G Jun 15 '13 at 19:21
  • @Yoda You've probably got a character beyond (asc) 128 in your document; probably a comma. – Andy G Jun 15 '13 at 19:26
  • @AndrewGibson But I checked for it with if, look now please. – Yoda Jun 15 '13 at 19:28
  • @AndrewGibson Don't tell me that arrays start at index 1 O.O – Yoda Jun 15 '13 at 19:30
  • @Yoda Arrays start at 0 (by default) but Word collections start at 1. You are not assigning anything to iAsc, which is the value you are checking. When an error occurs the debugger takes you to this line, and this specific variable. – Andy G Jun 15 '13 at 19:35
  • @AndrewGibson Thank you for your help is there any posibility of aking this faster? I thought of `Dim s As String s = ActiveDocument.Content.Text` then use `s` but not sure if it helps – Yoda Jun 15 '13 at 19:40
  • You could modify it to use Find, as discussed in other comments. I don't know whether it would be faster though. I suppose you could store the whole document text into a string array; Regex might then be a consideration. Again, I don't know if this would be faster. Good luck. – Andy G Jun 15 '13 at 19:43
  • Andrew, it turns out there are significant speed differences. See my lengthy edit for the results. – Doug Glancy Jun 16 '13 at 03:51
  • @'Doug Clancy' Thank you, interesting. Would you mind giving figures for the alternative method I've added? It uses Replace, and requires a reference to the Microsoft Scripting Runtime. It won't be as fast as stuffing the content into a string though. – Andy G Jun 16 '13 at 03:59
  • The regex version can be improved even further by making the regex-object static or global. (But you are only looking for the letter "a"?) – Andy G Jun 16 '13 at 04:13
  • Andrew, I tried to time your method. One issue is that it goes into an endless loop because the line `Do While rDupe.Characters.Count > 1` never evaluates to `True`. I think it's because there are characters in the doc outside the range that you chose. (The doc consists of the text from this page, pasted as-is.) The number of chars only got down to about 1,000. So if I set the `While` test to 2,000 it took about 16 seconds. My biggest concern with this method if it doesn't complete you are left with a modified document. I think you could fix it though. The basic `Replace` idea is interesting. – Doug Glancy Jun 16 '13 at 18:26
  • @DougGlancy Yes, I originally had `Count > 0` but there is always at least one character left. It could be worked around, but I'm not too concerned now. My original intention was to `Duplicate` the entire Range and use Replace on this, but this method only copies the range-markers and properties, not the content. It then occurred to me to create a new document and copy the content to this, so that I could perform the replacements without having to Undo. Anyway, I was just exploring the possibilities. Never mind. Regards, Andy. – Andy G Jun 16 '13 at 18:44