I am making a small "home" application in VB. As the title says, I want to grab a piece of text from a local HTML file and use it as a variable, or put it in a textbox.

I have tried something like this...

Private Sub Open_Button_Click(sender As Object, e As EventArgs) Handles Open_Button.Click
    Dim openFileDialog As New OpenFileDialog()
    openFileDialog.CheckFileExists = True
    openFileDialog.CheckPathExists = True
    openFileDialog.FileName = ""
    openFileDialog.Filter = "All|*.*"
    openFileDialog.Multiselect = False
    openFileDialog.Title = "Open"

    If openFileDialog.ShowDialog = Windows.Forms.DialogResult.OK Then
        ' Read from the dialog that was just shown (the local openFileDialog, not openFileDialog1)
        Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog.FileName)
        TextBox.Text = fileReader
    End If
End Sub

This loads the whole HTML source into the textbox. What should I do to grab only a specific part of the HTML file's code? Let's say I want only the word "text" from this span: <span id="something">This is a text!!!</span>

Simos Sigma
  • Possible duplicate of [How do you parse an HTML in vb.net](http://stackoverflow.com/questions/516811/how-do-you-parse-an-html-in-vb-net) – Heinzi Mar 26 '16 at 14:12
  • @Heinzi No I don't think so... Please take a look again, I edited my question. – Simos Sigma Mar 26 '16 at 14:27
  • I don't see how this is different: You want to extract a value from an HTML file, which is more or less the definition of "parsing HTML". – Heinzi Mar 26 '16 at 14:55
  • If I understand your question correctly you want the word "text" either from the `id` attribute or from within the actual `span`, or both? – Visual Vincent Mar 26 '16 at 15:21
  • If so, you could use an HTML parser to retrieve the tag (or its contents) and then use `Regex` to find the word you want. – Visual Vincent Mar 26 '16 at 15:23
  • @VisualVincent : Only the word "text"! And do I have to use this Agility Pack? Is there any other way? – Simos Sigma Mar 26 '16 at 15:41
  • You don't have to use it. But it's a good option. There was a second answer in the link that uses classes already provided in .NET. So you want the word "text" from the _contents of the span_ then? – Visual Vincent Mar 26 '16 at 15:43

2 Answers

This answer makes the following assumptions:

  1. Your HTML is valid, i.e. the id is unique in the document.
  2. You will always have an id on your HTML tag.
  3. You'll always be using the same tag (e.g. span).

I'd do something like this:

' Requires Imports System.Linq for First() and Last()

' Read the HTML document
Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog.FileName)

' Split the HTML text at the opening span tag
Dim fileSplit As String() = fileReader.Split(New String() {"<span id=""something"">"}, StringSplitOptions.None)

' Keep the part after the opening tag
fileReader = fileSplit.Last()

' Split again at the closing tag to cut off everything after it
fileSplit = fileReader.Split(New String() {"</span>"}, StringSplitOptions.None)

' Keep the part before the closing tag
fileReader = fileSplit.First()

' fileReader now contains the contents of the span tag with id "something"

Note: this code is untested and I typed it in the Stack Exchange mobile app, so it may contain some autocorrect typos.

You might want to add in some error validation such as making sure that the span element only occurs once, etc.
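The validation mentioned above could look roughly like the following. This is only a sketch under the same assumptions as the answer (a fixed `<span id="...">` opening tag with no other attributes and no nested spans); the names `SpanExtractor` and `ExtractSpanText` are made up for illustration. It uses `IndexOf` instead of `Split` so that a missing, duplicated, or unclosed tag can be detected.

```vb
Imports System

Module SpanExtractor
    ' Hypothetical helper: returns the text between <span id="spanId"> and the
    ' next </span>, or Nothing if the opening tag is missing, appears more than
    ' once, or is never closed.
    Function ExtractSpanText(html As String, spanId As String) As String
        Dim openTag As String = "<span id=""" & spanId & """>"
        Const closeTag As String = "</span>"

        Dim tagStart As Integer = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase)
        If tagStart < 0 Then Return Nothing

        Dim contentStart As Integer = tagStart + openTag.Length

        ' Reject documents where the same opening tag appears a second time.
        If html.IndexOf(openTag, contentStart, StringComparison.OrdinalIgnoreCase) >= 0 Then Return Nothing

        Dim contentEnd As Integer = html.IndexOf(closeTag, contentStart, StringComparison.OrdinalIgnoreCase)
        If contentEnd < 0 Then Return Nothing

        Return html.Substring(contentStart, contentEnd - contentStart)
    End Function

    Sub Main()
        Dim html As String = "<p>Intro</p><span id=""something"">This is a text!!!</span>"
        Console.WriteLine(ExtractSpanText(html, "something"))   ' prints: This is a text!!!
    End Sub
End Module
```

In the click handler from the question, the result could then be assigned to the textbox after checking it for `Nothing`.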

stormCloud

Using an HTML parser is highly recommended, because HTML allows arbitrarily nested tags that regular expressions handle poorly (see this question for example).

However, finding the contents of a single tag with Regex is feasible without major problems, as long as the HTML is well formed.

This would be what you need (the function is case-insensitive):

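' Requires: Imports System.Text.RegularExpressions
Public Function FindTextInSpan(ByVal HTML As String, ByVal SpanId As String, ByVal LookFor As String) As String
    ' The lookbehind anchors the match inside <span ... id="SpanId" ...>; the lookahead requires a closing </span> after it.
    Dim m As Match = Regex.Match(HTML, "(?<=<span.+id=""" & SpanId & """.*>.*)" & LookFor & "(?=.*<\/span>)", RegexOptions.IgnoreCase)
    ' Regex.Match never returns Nothing, so test Success instead.
    Return If(m.Success, m.Value, "")
End Function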

The parameters of the function are:

HTML: The HTML code as string.

SpanId: The id of the span (e.g. in <span id="hello">, hello is the id).

LookFor: What text to look for inside the span.

Online test: http://ideone.com/luGw1V
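One thing to keep in mind: the function above inserts SpanId and LookFor into the pattern as-is, so characters such as `.` or `(` in either argument are interpreted as regex syntax. If both should be treated as literal text, a variant along these lines might help. It is a sketch, not a replacement for a real parser; the name `FindLiteralInSpan` is hypothetical, and the tighter `[^>]*` / `[^<]*` classes assume the span contains plain text with no nested tags.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SpanRegexDemo
    ' Hypothetical variant of FindTextInSpan that escapes both arguments so
    ' they are matched literally (still case-insensitive).
    Function FindLiteralInSpan(html As String, spanId As String, lookFor As String) As String
        Dim pattern As String =
            "(?<=<span[^>]*\bid=""" & Regex.Escape(spanId) & """[^>]*>[^<]*)" &
            Regex.Escape(lookFor) &
            "(?=[^<]*</span>)"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        ' An unsuccessful match has Success = False, so return an empty string.
        Return If(m.Success, m.Value, "")
    End Function

    Sub Main()
        Dim html As String = "<span id=""something"">This is a text!!!</span>"
        ' The match is returned as it appears in the HTML, not as it was typed.
        Console.WriteLine(FindLiteralInSpan(html, "something", "TEXT"))   ' prints: text
        ' A span id that does not exist yields an empty string.
        Console.WriteLine(FindLiteralInSpan(html, "other", "text") = "")
    End Sub
End Module
```

Variable-length lookbehind, which both versions rely on, is supported by the .NET regex engine but not by many others, so the pattern may not port directly to other languages.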

Visual Vincent