Edit 1 - solution
Using the pattern posted below as a start, I edited the expression slightly to ensure it captures phrases as well as words, then we just discard any match that starts with a .
Sub test()
Application.EnableEvents = False
getAllTitleCasePhrases Range("D4", Range("D4").End(xlDown))
Application.EnableEvents = True
End Sub
Private Sub getAllTitleCasePhrases(rng As Range)
If objRegex Is Nothing Then Set objRegex = New RegExp
With objRegex
.Global = True
.Pattern = "(\.\s*[A-Z][\w,']*)|([A-Z][\w,']*\s?)+"
End With
Dim cll As Range, testResult As Object, resultsStr As String
For Each cll In rng
Set testResult = objRegex.Execute(cll.Value2)
If testResult.Count > 0 Then
Dim i As Long, j As Long
For i = 0 To testResult.Count - 1
If Left$(testResult(i), 1) <> "." Then
resultsStr = resultsStr & WorksheetFunction.Trim(testResult(i).Value) & ", "
End If
Next i
End If
If Len(resultsStr) > 0 Then resultsStr = Left(resultsStr, Len(resultsStr) - 2)
cll.Offset(0, 1).Value2 = resultsStr
resultsStr = vbNullString
Next cll
Set objRegex = Nothing
End Sub
Problem
I would like to automatically create a list of all defined terms in a contract, so that I can then easily check that they do actually have a corresponding definition somewhere. Defined terms in a contract are written in either Title Case or ALL CAPS, I have included some test cases and expected output below.
My approach is to use a regex pattern to scan through the document and pick out all words or phrases that fit these patterns. This part has been straightforward, by using answers here and here. The issues I have are that (1) this still catches words that are capitalised after a full stop and (2) the space at the end of the last word is included in the match.
Please note my implementation below is in Excel VBA, but it is the regex expression I am really interested in. If both (1) and (2) cannot be achieved together, (1) is more important, as I can strip trailing spaces off elsewhere in the code.
Current solution
(^[^\.])?([A-Z]+[a-z,']*\s?)+
Explanation:
(^[^\.])?
- if there is a full stop at the start of the string, ignore it
[A-Z]+
- match words which start with at least one capital
[a-z,']*
- words can finish with any number of non-capitals and apostrophes are allowed
([A-Z]+[a-z,']*\s?)+
- repeat this pattern to catch multiple words
Test cases
- To Compare between -> To Compare
- to Compare -> Compare
- to Compare Between -> Compare Between
- to COMPARE BETWEEN -> COMPARE BETWEEN
- to COMPARE BETWEEN Two Options -> COMPARE BETWEEN Two Options
- the Purchaser shall -> Purchaser
- The Purchaser shall Pay On Time according to the Schedule agreed between the Parties -> The Purchaser, Pay On Time, Schedule, Parties
- . The Purchaser shall Pay On Time according to the Schedule agreed between the Parties -> Purchaser, Pay On Time, Schedule, Parties
- The Purchaser -> The Purchaser
- . The Purchaser -> Purchaser
- . To COMPARE -> COMPARE
- the Purchaser's Representative -> Purchaser's Representative
- the Purchasers' Representative -> Purchasers' Representative
- The Purchaser's Representative -> The Purchaser's Representative
- . The Purchaser's Representative -> Purchaser's Representative
- ACME International group -> ACME International
- ACME International Group -> ACME International Group
- ABC -> ABC
Implementation
(note list of test cases starts at cell D4
)
Option Explicit
Private objRegex As RegExp
Sub test()
getAllTitleCasePhrases Range("D4", Range("D4").End(xlDown))
End Sub
Private Sub getAllTitleCasePhrases(rng As Range)
If objRegex Is Nothing Then Set objRegex = New RegExp
With objRegex
.Global = True
.Pattern = "(^[^\.])?([A-Z]+[a-z,']*\s?)+"
End With
Dim cll As Range, testResult As Object, resultsStr As String
For Each cll In rng
Set testResult = objRegex.Execute(cll.Value2)
If testResult.Count > 0 Then
Dim i As Long
For i = 0 To testResult.Count - 1
resultsStr = resultsStr & testResult(i).Value & ", "
Next i
End If
If Len(resultsStr) > 0 Then resultsStr = Left(resultsStr, Len(resultsStr) - 2)
cll.Offset(0, 1).Value2 = resultsStr
resultsStr = vbNullString
Next cll
Set objRegex = Nothing
End Sub