I am trying to write a macro that removes HTML tags from Excel data. I just want to search for the pattern <*> and replace every match with a blank. I also need to remove special characters like '“' and script fragments like if(typeof(dstb)!= "undefined"){ dstb();}.

The code I have written so far requires me to hardcode the sheet name in the macro, which I do not want.

Code:

    Sub UnescapeCharacters()
        Dim sheetname As String
        Dim sheet As Worksheet
        Dim cell As Range
        Dim Row As Long, Column As Long

        sheetname = "2011 Publications" 'sheet name goes here (hardcoded)

        'ThisWorkbook works from any module; Me only works inside the ThisWorkbook module
        Set sheet = ThisWorkbook.Worksheets(sheetname)

        For Row = 1 To sheet.UsedRange.Rows.Count
            For Column = 1 To sheet.UsedRange.Columns.Count
                Set cell = sheet.Cells(Row, Column)

                'HTML entities
                ReplaceCharacter cell, "&quot;", """"
                ReplaceCharacter cell, "&#44;", ""
                ReplaceCharacter cell, "&nbsp;", ""
                ReplaceCharacter cell, "&bull;", ""

                'HTML tags, one literal replacement per tag
                ReplaceCharacter cell, "</ul>", ""
                ReplaceCharacter cell, "<ul>", ""

                ReplaceCharacter cell, "<b>", ""
                ReplaceCharacter cell, "</b>", ""

                ReplaceCharacter cell, "<i>", ""
                ReplaceCharacter cell, "</i>", ""

                ReplaceCharacter cell, "</li>", ""
                ReplaceCharacter cell, "<li>", ""

                ReplaceCharacter cell, "</br>", ""
                ReplaceCharacter cell, "<br />", ""

                ReplaceCharacter cell, "</p>", ""
                ReplaceCharacter cell, "<p>", ""

            Next Column
        Next Row

    End Sub

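    Sub ReplaceCharacter(ByRef cell As Range, ByVal find As String, ByVal replacement As String)

        'Skip error cells such as #N/A, which cannot be passed to Replace
        If IsError(cell.Value) Then Exit Sub

        'Use .Value rather than .Text: .Text is the displayed text and can be "####" in narrow columns
        cell.Value = Replace(cell.Value, find, replacement, 1, -1)

    End Sub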

Can someone please help?

Larry
apgp88

1 Answer

I prefer this approach: let Internet Explorer parse the HTML and read back the plain text. If the HTML does not come from a real website, you can save it to a file and then pass that file path to IE.navigate.

    Sub testing()
        Dim IE As Object
        Dim stringWithOutTags As String

        Set IE = CreateObject("InternetExplorer.Application")

        ' The URL is hardcoded here; a local file path works as well
        IE.navigate "http://stackoverflow.com/questions/13824872/writing-macro-in-excel-to-remove-html-code"

        ' Wait until the page has finished loading, yielding control while waiting
        Do While IE.Busy Or IE.readyState <> 4
            DoEvents
        Loop

        ' innerText holds the rendered text of the page, without the tags
        stringWithOutTags = IE.document.DocumentElement.innerText

        IE.Quit
    End Sub
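
The file-based variant mentioned above could look roughly like the sketch below. It is only an illustration under several assumptions: Internet Explorer automation is available on the machine, the helper name HtmlToText, the macro name CleanActiveSheet, and the temporary file name cell_fragment.htm are all made up for this example, and the file is written with VBA's default (ANSI) encoding, so characters outside the system code page may not survive. It works on ActiveSheet, so no sheet name has to be hardcoded.

```vba
Option Explicit

' Hypothetical helper: writes an HTML fragment to a temporary file,
' loads it in the given IE instance and returns the rendered text.
Function HtmlToText(ByVal IE As Object, ByVal html As String) As String
    Dim filePath As String
    Dim fileNum As Integer

    ' The temporary file name is an assumption made for this sketch
    filePath = Environ$("TEMP") & "\cell_fragment.htm"

    fileNum = FreeFile
    Open filePath For Output As #fileNum
    Print #fileNum, "<html><body>" & html & "</body></html>"
    Close #fileNum

    IE.navigate filePath

    ' Wait until the document has finished loading
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    HtmlToText = IE.document.body.innerText

    Kill filePath
End Function

' Hypothetical macro: cleans every text cell on the active sheet,
' so the sheet name does not need to be hardcoded.
Sub CleanActiveSheet()
    Dim IE As Object
    Dim cell As Range

    ' One IE instance is reused for all cells
    Set IE = CreateObject("InternetExplorer.Application")

    For Each cell In ActiveSheet.UsedRange.Cells
        ' Only touch plain text cells that look like they contain markup or entities
        If Not cell.HasFormula And VarType(cell.Value) = vbString Then
            If InStr(cell.Value, "<") > 0 Or InStr(cell.Value, "&") > 0 Then
                cell.Value = HtmlToText(IE, cell.Value)
            End If
        End If
    Next cell

    IE.Quit
End Sub
```

Select the sheet to clean and run CleanActiveSheet. Each cell costs one page load, so this is likely to be slow on large sheets; it is worth testing on a copy of the data first.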
Larry