
Using Range.DetectLanguage, how can I detect the language of each of the paragraphs of a Word document and determine the most used language of the Word document?

The set of documents I wish to run this over can be either French or English, but all will have both English and French in the header, so I cannot use Document.DetectLanguage because this returns WdUndefined on all documents. I need to check all paragraphs and determine what is the most popular language in the document.

What is the best way to do this in VBA?

Veve
CJ7
  • Well, loop through the document and count each language? Then compare the two numbers, and voilà, you know which one is the most used. What is your actual question? I doubt you could avoid looping through the text. Also note that, to my knowledge, language is not a paragraph-level but a character-level setting, so it is possible to have two different languages inside one paragraph. – vacip Feb 14 '16 at 11:42
  • @vacip: I want a generic solution that will work for any languages. Can you please provide an answer that actually loops through and counts the languages? How would you do it? How would you keep a count of the languages found? – CJ7 Feb 14 '16 at 11:46
  • Well, looking at your score I assume you know how to write a loop and use a few variables ;) Have a look at [this](http://stackoverflow.com/questions/22711120/how-to-loop-through-each-word-in-a-word-document-vba-macro) for looping through a document's words. I'd use a simple array to store the languages found, then sum up the different ones at the end. Try writing it, and come back here if you get stuck. – vacip Feb 14 '16 at 11:52
  • @vacip See my answer below. I have tried to write it. What do you think? – CJ7 Feb 14 '16 at 21:25
  • Cool. :) I assume it works properly. Well done. – vacip Feb 14 '16 at 22:17

2 Answers

' Requires a reference to "Microsoft Scripting Runtime" (Tools > References) for Dictionary.
Sub MostUsedLanguage()
    Dim doc As Document, para As Paragraph
    Dim lang As WdLanguageID
    Dim dict As New Dictionary
    Dim key As Variant

    Set doc = ActiveDocument
    ' run language detection if Word has not already done so
    If Not doc.LanguageDetected Then doc.DetectLanguage

    ' count languages in paragraphs
    For Each para In doc.Paragraphs
        lang = para.Range.LanguageID
        If Not dict.Exists(lang) Then
            dict.Add lang, 1
        Else
            dict(lang) = dict(lang) + 1
        End If
    Next para

    ' determine most popular language
    Dim maxCount As Long, maxKey As WdLanguageID
    For Each key In dict.Keys()
        If dict(key) > maxCount Then
            maxCount = dict(key)
            maxKey = key
        End If
    Next key

    ' maxKey is a numeric WdLanguageID value, e.g. 1033 for wdEnglishUS
    Debug.Print "Most popular language is: " & maxKey
End Sub
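
As noted in the comments, language is a character-level setting, so counting paragraphs treats a one-line heading the same as a long paragraph. The following is only a sketch of one possible variation, not a tested solution: it calls Range.DetectLanguage per paragraph, weights each paragraph by its length, and skips paragraphs that report wdUndefined (which, as far as I know, is what a mixed-language range returns). The procedure name is made up for illustration, and the late-bound dictionary is an assumption made to avoid needing a library reference.

```vba
' Sketch only: tally languages by text length instead of by paragraph count.
' CountLanguagesByLength is a hypothetical name, not part of the Word object model.
Sub CountLanguagesByLength()
    Dim para As Paragraph
    Dim rng As Range
    Dim counts As Object
    Dim key As Variant
    Dim lang As Long, maxKey As Long, maxCount As Long

    ' Late binding: no reference to Microsoft Scripting Runtime is needed.
    Set counts = CreateObject("Scripting.Dictionary")

    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        ' Ask Word to detect the language of this paragraph if it has not yet.
        If Not rng.LanguageDetected Then rng.DetectLanguage
        lang = rng.LanguageID
        ' Assumption: mixed-language paragraphs report wdUndefined; skip them.
        If lang <> wdUndefined Then
            ' Reading a missing key yields Empty, which adds as zero.
            counts(lang) = counts(lang) + Len(rng.Text)
        End If
    Next para

    ' Pick the language that covers the most text.
    For Each key In counts.Keys
        If counts(key) > maxCount Then
            maxCount = counts(key)
            maxKey = key
        End If
    Next key

    If maxCount > 0 Then
        ' Languages(...) is assumed to accept the ID returned by LanguageID.
        Debug.Print "Most used language: " & Application.Languages(maxKey).NameLocal & _
                    " (" & maxKey & ")"
    Else
        Debug.Print "No single-language paragraphs found."
    End If
End Sub
```

Whether detection itself is reliable for a given pair of languages is a separate matter; this only changes how the results are tallied.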
CJ7

I work with Dutch, French and English documents, and in my experience Office does NOT recognize the language correctly. When I write a document in the system language, all is well: spelling and grammar are checked, and the language is automatically set to the system language (even when the two other languages are installed in the system and in the Office language options).

Even while I am writing this text, all words are underlined in red, so Chrome does not detect the language either.

The system language is Dutch, and this problem has always existed. Whatever I try, I have to select all, set the language manually, and then run the spelling check.

Looping through the languages makes no sense if the detection is not right. It seems to me that the language/spelling/grammar detection, checking and correction options have been on stand-by since MS Office 2007, or almost a decade. See here.

Whether this has to do with the fact that Dutch is a 'small' language, I don't know. If there were a way to "set language" for the current document, a simple start-up macro would do the job. So far I have not found code that does this, except for this simple code I wrote:

Sub setlng()
    ' set the language of the whole story from a two-letter code
    Selection.WholeStory
    With Selection
        ' UCase$ makes the comparison case-insensitive ("nl", "Nl", "NL", ...)
        Select Case UCase$(Trim$(InputBox("What's your language? (NL = Nederlands, FR = Français, EN = English, DE = Deutsch)")))
            Case "NL"
                .LanguageID = wdDutch
            Case "FR"
                .LanguageID = wdFrench
            Case "EN"
                .LanguageID = wdEnglishUS
            Case "DE"
                .LanguageID = wdGerman
        End Select
    End With

    Application.CheckLanguage = True
End Sub
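
For a start-up macro that should not touch the user's selection, the same idea can be sketched against the document's main text range instead. This is a minimal illustration, assuming Dutch as the target language; the procedure name is hypothetical, and only the main text story is covered (headers, footers and text boxes are separate stories and are not changed here).

```vba
' Sketch only: set the proofing language of the main text without using Selection.
' SetDocumentLanguage is a hypothetical name chosen for this example.
Sub SetDocumentLanguage()
    With ActiveDocument.Content
        ' wdDutch is an assumption; use any other WdLanguageID constant as needed.
        .LanguageID = wdDutch
        ' Make sure spelling and grammar checking is not switched off for the text.
        .NoProofing = False
    End With
End Sub
```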

Clearly, since MS Office was written in English, you have to use the ENGLISH word for your language instead of the language's own word for itself, which would be more logical...

I'm very curious whether people who live in Azerbaijan ever find their language: "Selection.LanguageID = wdAzeriCyrillic" ... hm...

Johan D.