1

I'm trying to scrape a website for a specific HTML element and convert it into Word, all while keeping the formatting intact. I can successfully import it into Excel first (and then to Word), but the character limit for a single cell doesn't let me grab all of the data that I need. Here is my current code, starting at the point where I grab the HTML from the site:

Set html = ie.document

Dim objWord As Object
   Dim objDoc As Object
   Set objWord = CreateObject("Word.Application")
   Set objDoc = objWord.Documents.Add
   objWord.Visible = True

Sheets("Sheet1").Range("A1").Value = html.getElementById("main_container").innerText
Sheets("Sheet1").Range("A1").Select
Selection.Copy

objDoc.Range.Paste

It would be great if I could skip the Excel step and just paste the HTML right into Word. The other option that I thought of is to paste each sub-paragraph of the main_container HMTL into a separate cell to avoid hitting the maximum character limit. If anyone can help with that, or has other ideas about how I can solve this, it would be much appreciated!

Cross-post here: http://www.mrexcel.com/forum/excel-questions/827926-hmtl-word-visual-basic-applications.html#post4039337

cxw
  • 16,685
  • 2
  • 45
  • 81
JohnnyNitro
  • 11
  • 1
  • 2
  • Have you seen [this](http://stackoverflow.com/questions/11780366/vb-script-or-vba-code-to-copy-the-contents-of-a-web-webpage-to-a-word-excel-shee)? – rusk Jan 09 '15 at 06:11

2 Answers2

2

Use string variable instead!

Dim sInnerText As String, i As Integer, j As Integer

sInnerText = html.getElementById("main_container").innerText

'insert text into word document
objDoc.Range.Text = sInnerText

'or devide it by length of characters to be able to add parts into cells
i = 1
j = 1
Do While j<Len(sInnerText)
    ThisWorkbook.Worksheets("Sheet1").Range("A" & i) = Mid(sInnerText, j, j+255)
    j = j + 256
    i = i + 1
Loop
Maciej Los
  • 8,468
  • 1
  • 20
  • 35
  • Thanks for your response. The problem with using a string is that it holds fewer characters than an excel cell, so I'm still running into the data limit. – JohnnyNitro Jan 09 '15 at 16:42
  • I'm reading that strings can hold billions of characters, so that's probably not the issue. Regardless, when I execute the code above, it only fills two pages of the word doc when it should be around 50 pages. – JohnnyNitro Jan 09 '15 at 17:36
  • Are you sure that it would be 50 pages? – Maciej Los Jan 09 '15 at 17:39
  • I fixed this. The issue was that I was really after a sub-element within that main_container div, and that was causing the errors. Your string solution works great on the actual main_container...I can just edit the text within word now. Thanks much!! – JohnnyNitro Jan 09 '15 at 19:45
  • JohnnyNitro, you should really accept @MaciejLos's answer here by clicking the check mark next to the answer. You will get points for accepting it, and he will get points for providing you with the information you needed. Also, others that come here with the same question will know that this solution worked. – Terry Aug 07 '15 at 15:56
1

Maybe you are looking for something like this. I use this 2 prodedures to set sub and superscript html tag and undo, using WildCards

Sub SuperSub()
'Sub y Super Indices a formato HTML
        With ActiveDocument.Content.Find
                .Text = "" 'Subíndice
                .Font.Subscript = 1
                .Replacement.Text = "<sub>^&</sub>"
                .Replacement.Font.Subscript = 0
                .Execute Replace:=wdReplaceAll
        End With
        With ActiveDocument.Content.Find
                .Text = "" 'Superíndice
                .Font.Superscript = 1
                .Replacement.Text = "<sup>^&</sup>"
                .Replacement.Font.Superscript = 0
                .Execute Replace:=wdReplaceAll
        End With
End Sub
Sub InverSuperSub()
'Sub y Super Indices de html a WORD
        With ActiveDocument.Content.Find
                .Text = "\<sub\>(*)\<\/sub\>" 'Subíndice
                .MatchWildcards = True
                .Font.Subscript = 0
                .Replacement.Text = "\1"
                .Replacement.Font.Subscript = 1
                .Execute Replace:=wdReplaceAll
        End With
        With ActiveDocument.Content.Find
                .Text = "\<sup\>(*)\<\/sup\>" 'Superíndice
                .MatchWildcards = True
                .Font.Superscript = 0
                .Replacement.Text = "\1"
                .Replacement.Font.Superscript = 1
                .Execute Replace:=wdReplaceAll
        End With
End Sub

I hope it'll help you.

Rubén P S
  • 113
  • 6