1

I´m trying to make an application in Excel vba, but I´ve some problems. I need my Excel application to download some files that are in a zip format. I´ve already done this part of the problem, my app can download and unzip the files. Next I´ve to read the whole file in .htm extension and get some information from it. It is required to work like this that when app is opened, program should look for the last number of contest which is “concurso” in Basil then look for the same number in .htm file and start copying the next data.

I´ve already discovered a pattern to read the file and get the data I want, but I don’t know how to code for it. The pattern in .htm file to be extracted is inside tags td, a text that have 2 slash, therefore, I´ve a date, at this time, I´ve to do 3 things, get the date, the line above the date I´ve the number of concurso, so I need to get it too, and the 15 lines beneath the date I´ve 15 numbers that I need them too. This pattern doesn’t change and have to be processed till the end of .htm file. and transfer these data to my sheet to be manipulated later.

I shall provide further clarifications in case of some doubts about the problem.
This is the code that I'm using to download and unzip the files.↓

Sub DownloadEUnzip()
    Dim FSO, oApp As Object
    Dim objHttp, DefPath, Arquivo As String
    Dim Dados() As Byte
    Dim Fname As Variant
    Dim FileNameFolder As Variant
    Dim iFileNumber As Long

    Dim diretorio As String

    diretorio = Dir("c:\lotofacil\D_LOTFAC.HTM")

    If diretorio = "D_LOTFAC.HTM" Then
        Kill "C:\lotofacil\*"
    End If

    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP")
    objHttp.Open "GET", "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_lotfac.zip", False
    objHttp.Send
    DefPath = "C:\lotofacil\" '<<< Altere aqui
    Arquivo = DefPath & "D_lotfac.zip"
    If objHttp.Status = "200" Then
        Dados = objHttp.ResponseBody
        iFileNumber = FreeFile
        Open Arquivo For Binary Access Write As #iFileNumber
        Put #iFileNumber, 1, Dados
        Close #iFileNumber
    End If
    If Right(DefPath, 1) <> "\" Then
        DefPath = DefPath & "\"
    End If
    FileNameFolder = DefPath

    Set oApp = CreateObject("Shell.Application")
    oApp.Namespace(FileNameFolder).CopyHere oApp.Namespace("C:\lotofacil\D_lotfac.zip").items
    MsgBox "Arquivos baixados e descompactados com sucesso!"
End Sub

HEREOne can download the file for visualization of the problem.

ps A folder called lotofacil is to be created on C: drive for spreadsheet proper working.

UPDATE 1

code to find a date

If Mid(dataline, 19, 1) = "/" And Mid(dataline, 22, 1) = "/" Then
    Debug.Print dataline
End If

UPDATE 2

so caio, its real fast now, but while i´m using i noticed that the program was taking a column less than a need, and i change the code and it works Apparently..would like u take a look to see if i didnt any mess... i change the size of array, and look like it work :) take a look.

Sub ReadLines()

Dim dataArray() As String
Dim strText
Dim result As String
Dim regExDate As New RegExp, regExAnyContent As New RegExp
Dim matches As MatchCollection
Dim match As match
Dim previous As String, current As String
Dim currentLine As Integer
ReDim dataArray(17, 1000)

regExDate.Pattern = "(\d{2}/\d{2}/\d{4})"
regExAnyContent.Pattern = "<td[^>]*>([^<]*)"
dirPath = "c:\lotofacil\"
filePath = dirPath & "D_LOTFAC.HTM"
result = ""
currentLine = 0

If Not Dir(filePath) = "D_LOTFAC.HTM" Then Exit Sub
FileNum = FreeFile()

Open filePath For Input As #FileNum
previous = ""

While Not EOF(FileNum)
    Line Input #FileNum, current ' read in data 1 line at a time

    If Len(current) > 0 Then
        Set matches = regExDate.Execute(current)
        If matches.Count > 0 Then
            dataArray(1, currentLine) = matches.Item(0)
            dataArray(0, currentLine) = regExAnyContent.Execute(previous).Item(0).SubMatches(0)
            For i = 1 To 16
                Line Input #FileNum, current
                While current = ""
                    Line Input #FileNum, current
                Wend
                dataArray(1 + i, currentLine) = regExAnyContent.Execute(current).Item(0).SubMatches(0)
            Next
            currentLine = currentLine + 1
            If currentLine Mod 1000 = 0 Then
                ReDim Preserve dataArray(17, currentLine + 1000)
            End If
        End If
        previous = current
    End If


    ' decide what to do with dataline,
    ' depending on what processing you need to do for each case
Wend

Range(Cells(1, 1), Cells(currentLine, 17)) = Application.Transpose(dataArray)

End Sub

but still happenin a thing really strange, in the sheet that the data is putted, the dates are wrong, i needed them in the format dd/mm/yyyy, and i know i'm already bugging u, but if is too difficult to make this change u could just take off this column of the date ? please...

and first of all thank you very much, u´re really good on excel ;)

Sandman
  • 15
  • 8
  • Look at Using Internet Explorer through VBA, there are a couple of approaches, just open the file in a stream to read it and then use split on , or open via the IE and use GetElementsByTagName("TD") http://stackoverflow.com/questions/18286598/read-local-html-file-into-string-with-vba – Nathan_Sav Jul 18 '16 at 15:32
  • Please read through [How to Ask](http://stackoverflow.com/help/how-to-ask), and show us what you have tried so far please. Post the code here, in the post. I, and I bet others, are not going to download files from the internet, *especially* if Macros may be involved. – BruceWayne Jul 18 '16 at 15:39

1 Answers1

1

Try this to read the file to the clipboard and paste its content into a Worksheet, this will create a normal Excel table which you will be able to work with.

This will use the natural ability of excel to parse html table to regular excel table.

Sub ReadFilePasteAsTable()
Dim objData As New MSForms.DataObject
Dim strText
Dim result As String
Dim numberOfLines Integer
Dim wsh As Object
Set wsh = VBA.CreateObject("WScript.Shell")


numberOfLines = 126
dirPath = "c:\lotofacil\"
diretorio = Dir(dirPath & "D_LOTFAC.HTM")
result = ""

If Not diretorio = "D_LOTFAC.HTM" Then Exit Sub
FileNum = FreeFile()

filePath = dirPath & "D_LOTFAC.HTM"
outPath = dirPath & "out.txt"
pscommand = "Powershell -Command ""''+$(cat " & filePath & " -Tail 126) > " & outPath & """"
wsh.Run pscommand, 0, True

Open outPath For Input As #FileNum

While Not EOF(FileNum)
    Line Input #FileNum, DataLine ' read in data 1 line at a time
    result = result & DataLine
    ' decide what to do with dataline,
    ' depending on what processing you need to do for each case
Wend

    objData.SetText result
    objData.PutInClipboard

ActiveSheet.Paste Destination:=[A1]
End Sub

Don't forget to add reference to Microsoft Forms 2.0. To add reference open up VBA window, open menu Tools->References...

If you cannot find Microsoft Forms 2.0 Object Library open Browse... and it will be in C:\Windows\SysWOW64\FM20.dll for 64 bits OS or C:\Windows\System32\FM20.dll for 32 bits.

UPDATE

Now you need to add reference to Microsoft VBScript Regular Expressions 5.5

Sub ReadLines()
Dim dataArray() As String
Dim strText
Dim result As String
Dim regExDate As New RegExp, regExAnyContent As New RegExp
Dim matches As MatchCollection
Dim match As match
Dim previous As String, current As String
Dim currentLine As Integer
ReDim dataArray(16, 1000)

regExDate.Pattern = "(\d{2}/\d{2}/\d{4})"
regExAnyContent.Pattern = "<td[^>]*>([^<]*)"
dirPath = "c:\lotofacil\"
filePath = dirPath & "D_LOTFAC.HTM"
result = ""
currentLine = 0

If Not Dir(filePath) = "D_LOTFAC.HTM" Then Exit Sub
FileNum = FreeFile()

Open filePath For Input As #FileNum
previous = ""

While Not EOF(FileNum)
    Line Input #FileNum, current ' read in data 1 line at a time

    If Len(current) > 0 Then
        Set matches = regExDate.Execute(current)
        If matches.Count > 0 Then
            dataArray(1, currentLine) = matches.Item(0)
            dataArray(0, currentLine) = regExAnyContent.Execute(previous).Item(0).SubMatches(0)
            For i = 1 To 15
                Line Input #FileNum, current
                While current = ""
                    Line Input #FileNum, current
                Wend
                dataArray(1 + i, currentLine) = regExAnyContent.Execute(current).Item(0).SubMatches(0)
            Next
            currentLine = currentLine + 1
            If currentLine Mod 1000 = 0 Then
                ReDim Preserve dataArray(16, currentLine + 1000)
            End If
        End If
        previous = current
    End If


    ' decide what to do with dataline,
    ' depending on what processing you need to do for each case
Wend



Range(Cells(1, 1), Cells(currentLine, 16)) = Application.Transpose(dataArray)
End Sub
caiohamamura
  • 2,260
  • 21
  • 23
  • thanks for help caio, but how can i do do add this reference, cause a tried to do like on vb.net with the key word imports but it doesnt work :/ – Sandman Jul 18 '16 at 16:24
  • thank u man, I´ve already got it too, looking on web I saw that its just put a form on project and it add the reference we need – Sandman Jul 18 '16 at 16:35
  • Does it satisfies what you need? – caiohamamura Jul 18 '16 at 16:45
  • so Caio, I´m trying to improve the code, cause i need that it get updated 3 times for week, and I need that is just update from the last concurso on sheet forth, but u help me so much, and if u want put this question as solved I´ll understand ;) – Sandman Jul 18 '16 at 16:56
  • and yes, your code works properly, but I´m trying to improve cause its take to much time to run, but as I said, thank for help, and any more help is welcome :) – Sandman Jul 18 '16 at 17:02
  • Parsing the text file in VBA will be quite slow. You can preprocess the file to get only the last n lines calling powershell `""+$(cat .\D_LOTFAC.HTM -Tail 126) > dlot_fac.txt` and then run the code I provided in that preprocessed text file.
    – caiohamamura Jul 18 '16 at 18:09
  • so caio, thanks 4 help but ima kinda newbie on excel and dont know how to do it that u said :/ – Sandman Jul 18 '16 at 18:35
  • Actually you would need to call `Shell("Powershell -Command """"+$(cat .\D_LOTFAC.HTM -Tail 126) > dlot_fac.txt""`
    – caiohamamura Jul 18 '16 at 18:40
  • in the beginning of that function you did? – Sandman Jul 18 '16 at 18:45
  • I've updated the answer to read just the 126 lines of the file, of course this will assume the number of lines added to the end of the file is always the same. – caiohamamura Jul 18 '16 at 19:19
  • hmmm so caio, the problem is that, the number of lines can change :/ – Sandman Jul 18 '16 at 20:05
  • i´m thinking about do something, but i dont know how, look, there is some way to identify when the dataline had 2 slash? for we know that is a date ? cause i did a test to try to looking for the number of concurso, and it works, till now lol, but if we can find the dataline that has a date, we know that the previous line has a concurso number and the last 15 line before the date has the numbers I need, and we just put it on that string result, what do u think? i´m my humble opinion i think the process coud get faster, couldnt it? – Sandman Jul 18 '16 at 20:07
  • caio, i think i got a way to pick a date, take a look on my post at UPDATE 1 – Sandman Jul 18 '16 at 21:35
  • caio now i can see that I know nothing about excel :) u´re the man ;) but please take a look at update 2, then i promisse i´ll let u in peace lol... – Sandman Jul 19 '16 at 15:44
  • You should post another question – caiohamamura Jul 19 '16 at 15:54
  • so, thanks for the help caio, i´ve already solved my problem, but i just got it with your help ;) – Sandman Jul 25 '16 at 23:39