2

this is my VBA code:

Sub RemoveTags()
Dim r As Range
Selection.NumberFormat = "@"  'set cells to text numberformat
With CreateObject("vbscript.regexp")
   .Pattern = "\<.*?\>"
   .Global = True
For Each r In Selection
    r.Value = .Replace(r.Value, "")
   Next r
End With
End Sub

It does remove all the Markup language from my cells but crashes when i run it for ~40k records. Is there something wrong with my code or should I change excel settings?

Comintern
  • 21,855
  • 5
  • 33
  • 80
centrinok
  • 300
  • 2
  • 11
  • 6
    [Don't use regex to parse HTML (or to process it)](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). You're probably better off loading the HTML in to an `HTMLFile` object, and returning e.g., the `.innerText` of the document object. – David Zemens Aug 01 '16 at 19:30
  • try `Selection.Replace "<*>", ""` instead – Slai Aug 01 '16 at 22:31

1 Answers1

1

My guess is that Excel craches while trying to write the text back to the cell. Here is a couple of things you can try:

  • Use .Value2 rather than .Value to work with the raw value.
  • Add a single quote in front of the text. It won't appear, but it will ensure the text format
  • Use a not followed by pattern rather than a non greedy to be sure to handle line breaks.
Sub RemoveTags()
    Dim values(), r As Long, c As Long, re As Object

    ' load the values in an array
    values = Selection.Value2

    ' create the regex
    Set re = CreateObject("vbscript.regexp")
    re.pattern = "<[^>]*>"
    re.Global = True

    ' remove the tags for each value
    For r = 1 To UBound(values, 1)
        For c = 1 To UBound(values, 2)
            values(r, c) = "'" & re.replace(values(r, c), vbNullString)
        Next
    Next

    ' write the values back to the sheet
    Selection.Value2 = values
End Sub
michael
  • 929
  • 6
  • 19