1

This feels like it should be simple. I have a .HTML file stored on my computer, and I'd like to read the entire file into a string. When I try the super straightforward

Dim FileAsString as string 

Open "C:\Myfile.HTML" for input as #1
Input #1, FileAsString
Close #1

debug.print FileAsString

I don't get the whole file. I only get the first few lines (I know the immediate window cuts off, but that's not the issue. I'm definitely not getting the whole file into my string.) I also tried using an alternative method using the file system object, and got similar results, only this time with lots of weird characters and question marks thrown in. This makes me think it's probably some kind of encoding issue. (Although frankly, I don't fully understand what that means. I know there are different encoding formats and that this can cause issues with string parsing, but that's about it.)

So more generally, here's what I'd really like to know: How can I use vba to open a file of any extension (that can be viewed in a text editor) and length (that's doesn't exceed VBA's string limit), and be sure that whatever characters I would see in a basic text editor are what gets read into a string? (If that can't be (easily) done, I'd certainly appreciate being pointed towards a method that's likely to work with .html files) Thanks so much for your help

EDIT: Here's an example of what happens when I use the suggested method. Specifically

    Dim oFSO As Object
    Dim oFS As Object, sText As String

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFS = oFSO.OpenTextFile(Path)

    Do Until oFS.AtEndOfStream
        sText = oFS.ReadAll()
    Loop
    FileToString = sText

    Set oFSO = Nothing
    Set oFS = Nothing

End Function

I'll show you both the beginning (via a message box) and the end (via the immediate window) because both are weird in different ways. In both cases I'll compare it to a screen capture of the html source displayed in chrome:

Beginning: enter image description here

enter image description here

End: enter image description here

enter image description here

Community
  • 1
  • 1
ebrts
  • 361
  • 2
  • 7
  • 17
  • 1
    Can you provide an example of the encoded text? –  Aug 17 '13 at 09:41
  • A quick search on here showed other similar questions. Perhaps this one might help: http://stackoverflow.com/questions/1376756/superfast-way-to-read-large-files-line-by-line-in-vba-please-critique – Taptronic Aug 17 '13 at 14:02
  • Question marks are usually indicative of doublebyte unicode characters that aren't represented in all fonts or operating systems... you may try to use an `ADODB` stream to read the contents, but I would suspect there is some problem with your HTML document. – David Zemens Aug 17 '13 at 14:11
  • @osknows I've added screencaps of the original file, as well as the results i'm getting when trying to read it into a string using the upvoted method below. (I really appreciate all your help) – ebrts Aug 17 '13 at 16:59
  • How are you saving the html? If I save a page from that site manually and use my code below all is working as expected with no double spacing. It might be your method of fetching the data rather than importing html/textfiles that is the issue. –  Aug 17 '13 at 17:39
  • 1
    @osknows the contents of the file is the .responseText of an XMLHTTP object. Could the problem be the somewhat arbitrary ".html" that I tacked on to the end o the file name? – ebrts Aug 17 '13 at 19:15
  • by the way, I tried just changing the extension, (when the file is created) and it didn't fix the problem. – ebrts Aug 17 '13 at 19:53

4 Answers4

5

This is one method

Option Explicit

    Sub test()

    Dim oFSO As Object
    Dim oFS As Object, sText As String

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFS = oFSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv")

    Do Until oFS.AtEndOfStream
    ' sText = oFS.ReadLine 'read line by line
    sText = oFS.ReadAll()
    Debug.Print sText
    Loop
    End Sub

EDIT:

Try changing the following line to one of the following 3 lines and see if it makes any difference

http://msdn.microsoft.com/en-us/library/aa265347(v=vs.60).aspx

Set FS = FSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, 0)
Set FS = FSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, 1)
Set FS = FSO.OpenTextFile("C:\Users\osknows\Desktop\import-store.csv", 1, 2)

EDIT2:

Does this code work for you?

Function ExecuteWebRequest(ByVal url As String) As String

    Dim oXHTTP As Object

    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    oXHTTP.Open "GET", url, False
    oXHTTP.send
    ExecuteWebRequest = oXHTTP.responseText
    Set oXHTTP = Nothing

End Function

Function OutputText(ByVal outputstring As String)
    MyFile = ThisWorkbook.Path & "\temp.html"
    'set and open file for output
    fnum = FreeFile()
    Open MyFile For Output As fnum
    'use Print when you want the string without quotation marks
    Print #fnum, outputstring
    Close #fnum
End Function

Sub test()
Dim oFSO As Object
Dim oFS As Object, sText As String
Dim Uri As String, HTML As String

    Uri = "http://www.forrent.com/results.php?search_type=citystate&page_type_id=city&seed=859049165&main_field=12345&ssradius=-1&min_price=%240&max_price=No+Limit&sbeds=99&sbaths=99&search-submit=Submit"
    HTML = ExecuteWebRequest(Uri)
    OutputText (HTML)

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\temp.html")

    Do Until oFS.AtEndOfStream
    ' sText = oFS.ReadLine 'read line by line
    sText = oFS.ReadAll()
    Debug.Print sText
    Loop

End Sub

enter image description here

  • So this would work, except that it appears to inert a space after every character. Again with the weird encoding... – ebrts Aug 17 '13 at 08:29
  • @ebrts please include screenshot or example of your HTML contents and also of the "weird" output so we can better diagnose what's happening. Thanks! – David Zemens Aug 17 '13 at 14:11
2

Okay so I finally managed to figure this out. The VBA file system object can only read asciiII files, and I had saved mine as unicode. Sometimes, as in my case, saving an asciiII file can cause errors. You can get around this, however, by converting the file to binary, and then back to a string. The details are explained here http://bytes.com/topic/asp-classic/answers/521362-write-xmlhttp-result-text-file.

ebrts
  • 361
  • 2
  • 7
  • 17
1

A bit late to answer but I did this exact thing today (works perfectly):

Sub modify_local_html_file()
    Dim url As String
    Dim html As Object
    Dim fill_a As Object

    url = "C:\Myfile.HTML"

    Dim oFSO As Object
    Dim oFS As Object, sText As String

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFS = oFSO.OpenTextFile(url)

    Do Until oFS.AtEndOfStream
        sText = oFS.ReadAll()
        Debug.Print sText
    Loop

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = sText

    oFS.Close
    Set oFS = Nothing

    '# grab some element #'
    Set fill_a = html.getElementById("val_a")

    MsgBox fill_a.innerText

    '# change its inner text #'
    fill_a.innerText = "20%"

    MsgBox fill_a.innerText

    '# open file this time to write to #'
    Set oFS = oFSO.OpenTextFile(url, 2)

    '# write it modified html #'
    oFS.write html.body.innerHTML
    oFS.Close

    Set oFSO = Nothing
    Set oFS = Nothing

End Sub
Alex L
  • 4,168
  • 1
  • 9
  • 24
-1
'You're gonna love this code, it's pretty helpful.
'Extract Title from HTML & File path - Put the file in the same folder.

Private Sub GetHTMLpageTitle()
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.PrintCommunication = False
'=========================================
'|||||||||||||||||||||||||||||||||||||||||||||||||||||
'*****************************************************
'|||||||||||||||||||||||||||||||||||||||||||||||||||||
'*****************************************************
Dim folderPath As String
Dim fileName As String
Dim destCell As Range, r As Long
Dim destCelL2 As Range
Dim TitleTag_Start As Long
Dim TitleTag_End As Long
Dim Title As String
Dim oFSO As Object
Dim oFs As Object
Dim sText As String
Dim A As String
    
On Error GoTo 0
    'Getting the path data and file names with extention .HTML

folderPath = ActiveWorkbook.Path
    If Right(folderPath, 1) <> "\" Then folderPath = folderPath & "\"
        fileName = Dir(folderPath & "*.html")
    
    'Prepare the destination data cells for inputs
    With ThisWorkbook.ActiveSheet
        Set destCell = .Cells(.Rows.Count, "A").End(xlUp)
        If Not IsEmpty(destCell.Value) Then Set destCell = destCell.Offset(1)
    End With
    
    With ThisWorkbook.ActiveSheet
        Set destCelL2 = .Cells(.Rows.Count, "B").End(xlUp)
        If Not IsEmpty(destCelL2.Value) Then Set destCell = destCell.Offset(1)
    End With

'=====================================
   ' Counter reset
 r = 0
 
' Looping main actions till the last html file in folder

Do While fileName <> vbNullString
 ' Counter start
       r = r + 1

 ' Opening HTML files to read
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFs = oFSO.OpenTextFile(folderPath & fileName)
sText = oFs.ReadAll()
FileToString = sText


'Search for Title Tags
TitleTag_Start = InStr(1, UCase(FileToString), "<TITLE>")
TitleTag_End = InStr(1, UCase(FileToString), "</TITLE>")

'Extracting TITLE
TitleTag_Start = TitleTag_Start + Len("<TITLE>")
Title = Mid(FileToString, TitleTag_Start, TitleTag_End - TitleTag_Start)


' Writing inputs
destCell.Offset(r).Resize(1, 1).Value = Title
destCelL2.Offset(r).Resize(1, 1).Value = folderPath & fileName

' File name reset
fileName = Dir
        
    Loop

0:
'==================================
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.PrintCommunication = True
Application.CellDragAndDrop = True
End Sub