
I am automating the extraction of a web page's contents and writing them to a text (HTML) file.

For that, I set up a FileSystemObject like this:

Dim myHTMLfilepath As String
myHTMLfilepath = "C:\temp\MyFile.html"

Dim fso As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Dim myHTMLFile As Object
Set myHTMLFile = fso.createtextfile(myHTMLfilepath)

When I try to write the extracted content to the file, I sometimes get error 5 (invalid parameter). Here is the code:

myHTMLFile.writeline objIE.document.getElementsByClassName("cool-box")(0).innerHTML

It breaks when the length of the innerHTML is somewhere between 25,800 and 28,000 characters (I haven't figured out the exact limit yet).

Does anyone know whether the WriteLine limit can be increased, or can anyone suggest a different way to do this?

Fabricio
  • You can split the string yourself and add `Chr(10)` at each line break – Camille Aug 20 '19 at 14:19
  • Can you also add the link for testing? – Mikku Aug 20 '19 at 14:22
  • @Mikku Unfortunately it's a private site. – Fabricio Aug 20 '19 at 14:23
  • @Camille I thought of doing that but was hoping for a cleaner solution. – Fabricio Aug 20 '19 at 14:24
  • You have both the "reading" *and* the "writing" happening in the same instruction, so it's hard to tell whether the problem is with *reading* the `.innerHTML` or with *writing* to `myHTMLFile`. Split it up, and consider *streaming* the string into the file, rather than writing it all at once from memory. – Mathieu Guindon Aug 20 '19 at 14:43
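A minimal diagnostic sketch of the split suggested in the last comment: read `.innerHTML` into a variable first (if that line fails, the problem is reading), then write it with `TextStream.Write` in fixed-size chunks so a failure is pinned to one chunk instead of the whole string. `WriteInChunks` and the chunk size are hypothetical choices, not part of the original code.

```vba
' Hypothetical helper: writes contents to a new file in fixed-size chunks.
' Returns 0 on success, or the 1-based index of the chunk whose write failed.
Public Function WriteInChunks(ByVal filePath As String, _
                              ByVal contents As String, _
                              ByVal chunkSize As Long) As Long
    Dim fso As Object, ts As Object
    Dim pos As Long, chunkIndex As Long

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.CreateTextFile(filePath, True)   ' True = overwrite an existing file

    On Error GoTo WriteFailed
    For pos = 1 To Len(contents) Step chunkSize
        chunkIndex = chunkIndex + 1
        ' Write (not WriteLine) so no line breaks are added between chunks
        ts.Write Mid$(contents, pos, chunkSize)
    Next pos
    On Error GoTo 0

    ts.Close
    WriteInChunks = 0
    Exit Function

WriteFailed:
    ' pos is the first character of the chunk that could not be written
    Debug.Print "Chunk " & chunkIndex & " (starting at character " & pos & ") failed: " & Err.Description
    ts.Close
    WriteInChunks = chunkIndex
End Function
```

For example, call `Debug.Print WriteInChunks(myHTMLfilepath, contents, 8192)` after reading `contents` from `.innerHTML`. If the reported starting position stays in the same region when you change the chunk size, a particular character in that part of the content is a more likely trigger than a length limit; that is a hypothesis to check, not a conclusion from this thread.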

1 Answer


Assuming the .innerHTML can be read into a string successfully (split up the reading and the writing to find out), you should be able to use an ADODB.Stream to write it to the file. WriteLine is intended to write a single line of text to a file, not an entire document.

Dim contents As String
contents = objIE.document.getElementsByClassName("cool-box")(0).innerHTML

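With CreateObject("ADODB.Stream")
    .Type = 2                       ' adTypeText: the stream holds text, not bytes
    .Charset = "utf-8"              ' encoding used when the text is saved
    .Open
    .WriteText contents             ' WriteText (not Write, which expects a byte array) for text streams
    .SaveToFile myHTMLfilepath, 2   ' 2 = adSaveCreateOverWrite
    .Close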
End With
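If you would rather keep the FileSystemObject approach, one commonly reported cause of error 5 from `TextStream.Write`/`WriteLine` is a character that has no equivalent in the system's ANSI code page, since `CreateTextFile` creates an ASCII file unless its third argument is True. This thread does not confirm that this is what happens here; the sketch below simply assumes it and opens the file in Unicode (UTF-16) mode. `WriteUnicodeFile` is a hypothetical name.

```vba
' Hypothetical helper: writes a string to a Unicode (UTF-16) text file via FileSystemObject.
Public Sub WriteUnicodeFile(ByVal filePath As String, ByVal contents As String)
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' Positional arguments: path, overwrite = True, unicode = True (the default is ASCII)
    Set ts = fso.CreateTextFile(filePath, True, True)
    ts.Write contents   ' Write adds no trailing line break, unlike WriteLine
    ts.Close
End Sub
```

An ADODB.Stream in text mode can produce UTF-8 instead by setting its `Charset` property, which may suit an HTML file better; both routes avoid the ANSI conversion.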
Mathieu Guindon
  • Writing is definitely the problem. My debugging shows content in the innerHTML, and Len() returns a value on the order of 29k. I will try this and let you know. – Fabricio Aug 20 '19 at 15:38
  • @Fabricio note that, as a bonus, this method also supports Unicode encoding, if applicable. – Mathieu Guindon Aug 20 '19 at 16:08
    Yes, I know... And it applies. I used this method a few years back but forgot about it. Actually, I just recovered my old code, which pretty much matches yours, so... :-) I just hope it will hold at least a good 30MB of text before dumping it to the file. I'm collecting the contents of a few thousand pages into a single file... :-)
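For the goal described in the last comment (collecting a few thousand pages into one file), one option is to keep a single text-mode ADODB.Stream open, append each page with `WriteText`, and save once at the end. A minimal sketch, assuming the page contents have already been gathered into a Collection of strings; `SavePagesToFile` and the Collection-based interface are hypothetical, and fetching each page (e.g. through `objIE`) is left out.

```vba
' Hypothetical helper: appends each page's HTML to one text-mode ADODB.Stream
' and saves everything to a single UTF-8 file at the end.
Public Sub SavePagesToFile(ByVal filePath As String, ByVal pages As Collection)
    Const adTypeText As Long = 2
    Const adWriteLine As Long = 1            ' WriteText option: append a line separator
    Const adSaveCreateOverWrite As Long = 2

    Dim stm As Object
    Dim page As Variant

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "utf-8"
    stm.Open

    For Each page In pages
        ' Each page becomes its own block followed by a line separator (CRLF by default)
        stm.WriteText CStr(page), adWriteLine
    Next page

    stm.SaveToFile filePath, adSaveCreateOverWrite
    stm.Close
End Sub
```

This keeps all the text in memory until `SaveToFile`; writing each page to disk as it is scraped (for example, through a TextStream opened once) is the alternative if memory becomes a concern.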