Basically...

I'm trying to perform custom capitalisation on a string; I've spent a few hours fighting with Regex to no avail...

Requirement:

I need to capitalise:

  1. If first word >3 chars: First letter of the first word.
  2. If last word >3 chars: First letter of the last word.
  3. Always: First letter following a hyphen or apostrophe.

(The final regex needs to be usable in VB6.)

Examples:
anne-marie          >  Anne-Marie          // 1st letter of first word + after hyphen
vom schattenreich   >  vom Schattenreich   // 1st letter of last word
will it work-or-not >  Will it Work-Or-Not // 1st letter of outer words + after hyphens
seth o'callaghan    >  Seth O'Callaghan    // 1st letter of outer words + after apostrophe
first and last only >  First and last Only // 1st letter of outer words (excl. middle)
sarah jane o'brien  >  Sarah jane O'Brien  // 1st letter of outer words (excl. middle)

What I've got so far:

I've cobbled together two regexes that, between them, very nearly accomplish what I need. However, my attempts to merge them into a single regex have failed spectacularly.

My main difficulty is that part of my capitalisation applies to the first and last words only, whereas the punctuation-specific capitalisation needs to apply to the whole string. I don't know enough about regex to be sure this is possible with one expression.

My regexes:

First letter of the first and last words, but it doesn't limit this to words of more than 3 characters, and it doesn't handle punctuation capitalisation across the whole string:

^([a-zA-Z]).*\s([a-zA-Z])[a-zA-Z-]+$

First letter of all words of more than 3 characters, and after punctuation, but it doesn't exclude middle words or handle punctuation at the end:

(\b[a-zA-Z](?=[a-zA-Z-']{3}))

The Question

How can I combine these two regexes to meet my requirements, or correct them enough that they can be used separately? Alternatively, is there a different regex that meets the requirements?

Reference / Relevant source material:

Regex capitalize first letter every word, also after a special character like a dash

First word and first letter of last word of string with Regex

Matthew Hudson

1 Answer

Here is my single-regex approach:

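Sub ReplaceAndTurnUppercase()

    ' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim reg As RegExp
    Dim Match As Object   ' loop variable holding each regex match
    Dim s As String
    Dim res As String

    Set reg = New RegExp
    With reg
        .Pattern = "^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]"
        .Global = True      ' find every match, not just the first
        .MultiLine = True   ' ^ and $ match at line breaks (the demo has one name per line)
    End With
    s = "anne-marie" & vbCrLf & "vom schattenreich" & vbCrLf & "will it work-or-not" & vbCrLf & "seth o'callaghan" & vbCrLf & "first and last only" & vbCrLf & "sarah jane o'brien"
    res = s
    For Each Match In reg.Execute(s)
        If Len(Match.Value) > 0 Then
            ' FirstIndex is 0-based while Mid is 1-based, hence the + 1
            res = Left(res, Match.FirstIndex) & UCase(Match.Value) & Mid(res, Match.FirstIndex + Len(Match.Value) + 1)
        End If
    Next Match
    Debug.Print res ' Demo part

End Sub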

The regex I am using is ^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]. Every match consists only of the letter we want to uppercase, optionally preceded by a hyphen or apostrophe, so we can uppercase each whole match without needing any capture groups.

The regex matches 3 alternatives:

  • ^[a-z](?=[a-zA-Z'-]{3}) - the start of the string (in my case, of a line, since I set .MultiLine = True) followed by a lowercase ASCII letter (consumed, to be uppercased later) that has at least 3 characters after it, each a letter, ' or - (not consumed, since they are inside a lookahead). This covers a first word of more than 3 characters.
  • \b[a-z](?=[a-zA-Z'-]{3,}$) - a word boundary \b followed by a lowercase ASCII letter (consumed) followed by 3 or more letters, ' or - running up to the end of the string (of the line in my case). This covers a last word of more than 3 characters.
  • ['-][a-z] - matches ' or - followed by a lowercase letter, anywhere in the string.
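
To check the three alternatives outside VB6, the same pattern can be run against the question's examples in Python, whose re module also supports \b and lookaheads. This is only a sketch: the names PATTERN, capitalise and EXAMPLES are illustrative, and re.MULTILINE is assumed to stand in for .MultiLine = True.

```python
import re

# Same three alternatives as the VB6 pattern above.
PATTERN = re.compile(
    r"^[a-z](?=[a-zA-Z'-]{3})"       # first word longer than 3 characters
    r"|\b[a-z](?=[a-zA-Z'-]{3,}$)"   # last word longer than 3 characters
    r"|['-][a-z]",                   # letter right after ' or -
    re.MULTILINE,
)

def capitalise(text):
    # Each match is a letter, or punctuation plus a letter,
    # so upper-casing the whole match is safe.
    return PATTERN.sub(lambda m: m.group(0).upper(), text)

# Input/output pairs copied from the question.
EXAMPLES = {
    "anne-marie": "Anne-Marie",
    "vom schattenreich": "vom Schattenreich",
    "will it work-or-not": "Will it Work-Or-Not",
    "seth o'callaghan": "Seth O'Callaghan",
    "first and last only": "First and last Only",
    "sarah jane o'brien": "Sarah jane O'Brien",
}

for source, expected in EXAMPLES.items():
    assert capitalise(source) == expected, (source, capitalise(source))
```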

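The res = Left(res, Match.FirstIndex) & UCase(Match.Value) & Mid(res, Match.FirstIndex + Len(Match.Value) + 1) line does the job: it takes the part of the string up to the match index, adds the uppercased match, and appends the rest. The + 1 is needed because FirstIndex is 0-based while Mid counts from 1. Since uppercasing does not change the length of the string, the indices found in s stay valid for res.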
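
For comparison, the same index-based splice might look like this in Python. It is a sketch rather than a translation of the VB6 code: capitalise_by_splicing is a hypothetical name, and Python slices are 0-based throughout, so no + 1 adjustment appears.

```python
import re

PATTERN = re.compile(
    r"^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]",
    re.MULTILINE,
)

def capitalise_by_splicing(text):
    result = text
    # Matches are located in the original text; the offsets remain valid
    # for result because upper-casing ASCII letters keeps the length unchanged.
    for m in PATTERN.finditer(text):
        start, value = m.start(), m.group(0)
        result = result[:start] + value.upper() + result[start + len(value):]
    return result
```

In Python itself, re.sub with a replacement function is the more usual way to do this; the loop only mirrors the Left/UCase/Mid steps.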

Wiktor Stribiżew