
I am trying to get three values from a large HTML file. I thought I could use the Substring method, but I was told that the position of the data may change. Basically, from the following snippet I need to pick out "Total number of records: 106", "Number of records imported: 106", and "Number of records rejected: 0":

<B>Total number of records : </B>106</Font><br><Font face="arial" size="2"><B>Number of records imported : </B>106</Font><br><Font face="arial" size="2"><B>Number of records rejected : </B>0</Font>

I hope this is clear enough. Thanks in advance!

mwell10
  • Possible duplicate of [How do you parse an HTML in vb.net](http://stackoverflow.com/questions/516811/how-do-you-parse-an-html-in-vb-net) – Bjørn-Roger Kringsjå Jun 25 '15 at 16:28
  • Looks like those answers pull out every instance of a certain tag. There are multiple <B>, <Font>, etc. tags in this document, and I only need these three values. – mwell10 Jun 25 '15 at 16:35

1 Answer


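Simple string operations like IndexOf() and Substring() should be plenty for the job: because IndexOf() searches for the label text, it doesn't matter where in the file the data ends up. Regular expressions would be another approach that takes less code (and may allow more flexibility if the HTML tags can vary), but to borrow a line usually credited to Mark Twain (it actually comes from Blaise Pascal), I didn't have time to write a short solution, so I wrote a long one instead.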
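For comparison, here is a minimal sketch of the regular-expression approach mentioned above, in VB.NET like the rest of this answer. It assumes each value is an integer that follows its label, a colon, and possibly some tags such as </B>. The function name GetCountByRegex and the module name are hypothetical, chosen only for illustration. Matching is case-insensitive and tolerates extra whitespace around the colon, so it copes with small markup variations that an exact-text match would not.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexCountSketch

    ' Hypothetical helper (not part of the original answer): finds "<label> :" followed by
    ' any number of tags such as </B>, then captures the digits that come next.
    Function GetCountByRegex(allInputText As String, label As String) As Integer?
        ' Regex.Escape treats the label as literal text.
        ' \s* tolerates varying whitespace around the colon; (?:<[^>]+>\s*)* skips tags in between.
        Dim pattern As String = Regex.Escape(label) & "\s*:\s*(?:<[^>]+>\s*)*([0-9]+)"
        Dim m As Match = Regex.Match(allInputText, pattern, RegexOptions.IgnoreCase)

        ' Return Nothing if the label is missing or the number doesn't fit in an Integer
        Dim value As Integer
        If m.Success AndAlso Integer.TryParse(m.Groups(1).Value, value) Then Return value
        Return Nothing
    End Function

    Sub Main()
        Dim allText As String = "<B>Total number of records : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records imported : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records rejected : </B>0</Font>"

        Console.WriteLine("Total: {0}", GetCountByRegex(allText, "Total number of records"))
        Console.WriteLine("Imported: {0}", GetCountByRegex(allText, "Number of records imported"))
        Console.WriteLine("Rejected: {0}", GetCountByRegex(allText, "Number of records rejected"))
    End Sub

End Module
```

Against the sample markup this prints the same three counts as the IndexOf()/Substring() version. If a label is missing, the function returns Nothing and the corresponding line prints an empty value.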

In general, you'll get better results around here if you show that you've made a reasonable attempt first and where you got stuck. But just this once... here you go. :-)

Private Shared Function GetMatchingCount(allInputText As String, textBefore As String, textAfter As String) As Integer?

    'Find the first occurrence of the text before the desired number (exact, case-sensitive match)
    Dim startPosition As Integer = allInputText.IndexOf(textBefore, StringComparison.Ordinal)

    'If the text before was not found, return Nothing
    If startPosition < 0 Then Return Nothing

    'Move the start position to the end of the text before, rather than the beginning
    startPosition += textBefore.Length

    'Find the first occurrence of the text after the desired number, searching from the start position
    Dim endPosition As Integer = allInputText.IndexOf(textAfter, startPosition, StringComparison.Ordinal)

    'If the text after was not found, return Nothing
    If endPosition < 0 Then Return Nothing

    'Get the string between the start and end positions
    Dim textFound As String = allInputText.Substring(startPosition, endPosition - startPosition)

    'Try converting the string found to an integer; return Nothing if it isn't a valid number
    Dim value As Integer
    If Integer.TryParse(textFound.Trim(), value) Then Return value
    Return Nothing
End Function

Of course, this only works if the text before and after each number is always the same. If you call it from a console app driver like this (drop the Shared keyword, since the function would then live in a Module)...

Sub Main()
    Dim allText As String = "<B>Total number of records : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records imported : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records rejected : </B>0</Font>"

    Dim totalRecords As Integer? = GetMatchingCount(allText, "<B>Total number of records : </B>", "<")
    Dim recordsImported As Integer? = GetMatchingCount(allText, "<B>Number of records imported : </B>", "<")
    Dim recordsRejected As Integer? = GetMatchingCount(allText, "<B>Number of records rejected : </B>", "<")

    Console.WriteLine("Total: {0}", totalRecords)
    Console.WriteLine("Imported: {0}", recordsImported)
    Console.WriteLine("Rejected: {0}", recordsRejected)
    Console.ReadKey()
End Sub

...you'll get output like so:

Total: 106
Imported: 106
Rejected: 0

Arin
  • Wow, thank you! I have been trying a few things, including Regex; I should have added them to my question. Thank you so much! – mwell10 Jun 25 '15 at 17:50
  • If this helped you and ended up being your solution, be sure to upvote/accept the answer! :) – Rein S Jun 25 '15 at 20:05