1

Is it possible to have nested regular-expression replace functions in Excel VBA and/or as a cell formula?

for example I have the following text: "Lorem [ipsum dolor sit] amet, [consetetur sadipscing] elitr." (note the square brackets)

is there any possibility to transform it to: "Lorem ipsum_dolor_sit amet, consetetur_sadipscing elitr."?

so i want to:

  1. find all the terms inside square brackets
  2. inside them: replace all blanks with an underscore
  3. remove the square brackets

and get returned the whole sentence with these replacements.

with my current attemp, I can only remove the brackets:

Function RegexReplace(...)
...
Set oRegex = CreateObject("VBScript.RegExp")
oRegex.Pattern = "\[([^\[\]]*)\]"
oRegex.Replace(strInput, "$1")
...

and nesting this function as formula in a Cell or inside the Code oRegex.Replace(strInput, Replace("$1", " ", "_") seems not to be possible, because the nested replace-function Replace(.. is called before evaluating the replacement string $1, so there is no blank to replace.

any proposals for solution? Thanks :-)

Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
MI285
  • 21
  • 5
  • Have you looked at [this answer](https://stackoverflow.com/a/22542835/4717755) ? – PeterT Nov 14 '19 at 16:49
  • Do you want to add spaces if there is no `[` or `]`? Like `a [ b c` -> `a [ b_c`? Or `a b]c` => `a_b]c`? – Wiktor Stribiżew Nov 14 '19 at 21:50
  • @PeterT: yes I had a look, but i do not get how these information helps with my problem. is there anything about nested replacements? – MI285 Nov 15 '19 at 08:54
  • @WiktorStribiżew: if there are no brackets, i dont want to replace anything. but for pair of opening and closing brackets, i want to replace blanks by unserscores inside and remove the brackets. only opening or only closing, or even nested brackets are no usecase. – MI285 Nov 15 '19 at 08:55

2 Answers2

1

another solution would be to execute the regex with every loop und take only the first match and update the result for that. so the character indexes are updated every loop and you can use any replacement that changes the count of characters. here i also use the submatch, that is without the brackets (see Pattern).

Function RegexReplace(cell As Variant) As String
Dim oRegex As Object, m As Object
Dim strMatch As String

Set oRegex = CreateObject("VBScript.RegExp")
With oRegex
    .Pattern = "\[([^\[\]]*)\]"
    .Global = False
End With
RegexReplace = cell.Text
Do While oRegex.Test(RegexReplace)
    Set allMatches = oRegex.Execute(RegexReplace)
    Set FirstMatch = allMatches(0)
    strMatch = Replace(FirstMatch.SubMatches(0), " ", "_")
    RegexReplace = Left(RegexReplace, FirstMatch.FirstIndex) & strMatch & Mid(RegexReplace, FirstMatch.FirstIndex + Len(FirstMatch.Value) + 1)
    DoEvents
Loop

End Function
MI285
  • 21
  • 5
0

You may replace all spaces with underscores in your matches if you modify the code like

Function RegexReplace(cell As Variant) As String
Dim oRegex As Object, m As Object, offset As Long
Set oRegex = CreateObject("VBScript.RegExp")
offset = 0
With oRegex
    .Pattern = "\[([^[\]]*)]"
    .Global = True
End With
RegexReplace = cell.Text
For Each m In oRegex.Execute(RegexReplace)
    RegexReplace = Left(RegexReplace, m.FirstIndex - offset) & Replace(m.SubMatches(0), " ", "_") & Mid(RegexReplace, m.FirstIndex + 1 + Len(m.Value) - offset)
    offset = offset + 2
Next m
End Function

The \[([^[\]]*)] pattern will match [, then captures into Group 1 zero or more chars other than [ and ] and then just matches a ]. Then, you iterate through all matches with For Each m In oRegex.Execute(RegexReplace) and once a match is found, a substring before the match, the match with replace spaces (using Replace(m.Submatches(0), " ", "_")) and the text after match are concatenated to form the function output.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • your code results in a text including the brackets, so if i simply remove all the brackets by `RegexReplace = Replace(RegexReplace, "[", "") RegexReplace = Replace(RegexReplace, "]", "") ` (after the for loop) it works. But if i try to remove these brackets inside the for loop, this gets more complicated because i would change the count of characters. so using `m.FirstIndex` relies on the original string, but not the string that is changed in every loop. So thank you for your solution! – MI285 Nov 15 '19 at 09:33
  • @MI285 Fixed, now it is fully working after adding the offset. – Wiktor Stribiżew Nov 15 '19 at 12:10
  • thanks, also another solution. maybe faster than executing the regex every loop. – MI285 Nov 15 '19 at 14:06