RegEx .NET replace with nothing

Question

Imports System.Net
Imports System.Text.RegularExpressions

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    ' Download the raw HTML of the test page.
    Dim client As WebClient = New WebClient()
    Label1.Text = client.DownloadString("http://localhost:81/test/index.html")

    ' Copy to Label2 for testing, then strip the <html> and <body> tags.
    Label2.Text = Label1.Text
    Dim input As String = Label1.Text
    Dim output As String = Regex.Replace(input, "<body>|</body>|<html>|</html>", "")

    Label2.Text = output
End Sub

I'm downloading the page and storing it in Label1, then copying it into Label2 (for testing), because I need to replace the contents of Label1 with just the text.

This is the HTML test file. I need to get the link out without any new lines before or after it.

<html>
<body>
http://www.google.com
</body>
</html>

How can I display only

http://www.google.com

in a label? I tried replacing the tags with Nothing, and it gives an error.

What error are you getting? I tried your example and it works fine for me. — laylarenee, Dec 31 '13 at 19:15

score 0 · Answer 1 · answered Dec 31 '13 at 16:59

Did you try String.Empty or vbNullString? String.Empty would be the better choice, since it is native to .NET, and I believe the VB constants are just wrappers.
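
To illustrate the suggestion, here is a minimal console sketch (the module name and sample string are made up for the example). It assumes the error in the question comes from the replacement argument: Regex.Replace is documented to throw ArgumentNullException when the replacement string is null, and as far as I know vbNullString is itself a null string reference, so String.Empty is the safer of the two options.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ReplaceWithEmptyDemo
    Sub Main()
        Dim input As String = "<body>http://www.google.com</body>"
        Dim pattern As String = "<body>|</body>|<html>|</html>"

        ' String.Empty is a real zero-length string, so the tags are simply removed.
        Dim output As String = Regex.Replace(input, pattern, String.Empty)
        Console.WriteLine(output) ' prints http://www.google.com

        ' A null replacement is rejected at run time.
        Try
            Dim replacement As String = Nothing
            Regex.Replace(input, pattern, replacement)
        Catch ex As ArgumentNullException
            Console.WriteLine("A null replacement string was rejected.")
        End Try
    End Sub
End Module
```

Passing the literal Nothing directly, instead of a typed variable, may fail even earlier at compile time, because Regex.Replace also has an overload that takes a MatchEvaluator in the same position.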

mlw4428

score 0 · Answer 2 · edited May 23 '17 at 12:11

Assuming the current value of output is the link with two newline characters (or other extraneous whitespace) both before and after it:

\n
\n
http://www.google.com
\n
\n

then the regex is behaving as expected: the pattern only matches the four tags, so the line breaks that surrounded them are left in place.

Just call .Trim() on the result of Regex.Replace to remove the whitespace on either side:

Dim output As String = Regex.Replace(input, "<body>|</body>|<html>|</html>", "").Trim()
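
For anyone who wants to try this outside the form, here is a small self-contained sketch. The module and the ExtractLink helper are hypothetical names, and the sample string stands in for the downloaded test page.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TrimDemo
    ' Hypothetical helper: strips the four tags, then trims surrounding whitespace.
    Function ExtractLink(html As String) As String
        Return Regex.Replace(html, "<body>|</body>|<html>|</html>", String.Empty).Trim()
    End Function

    Sub Main()
        ' Same layout as the test file: one tag or link per line.
        Dim nl As String = Environment.NewLine
        Dim html As String = "<html>" & nl & "<body>" & nl &
                             "http://www.google.com" & nl & "</body>" & nl & "</html>"

        ' Brackets make any leftover whitespace visible.
        Console.WriteLine("[" & ExtractLink(html) & "]") ' prints [http://www.google.com]
    End Sub
End Module
```

Trim() with no arguments removes all leading and trailing whitespace, including carriage returns and line feeds, so it does not matter which line-ending style the server uses.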

As an aside, your regex won't work on webpages that are any more complex than your test page (a body tag with attributes, for example, would not match). Should you wish to attempt this on actual webpages, your best bet would probably be to use one regex to capture the delimiting tags around the link, then another to grab the link from the results of the first. You could also try loading the retrieved page into an HtmlDocument, which should handle the actual parsing for you, at which point DOM navigation from a VB.NET code-behind becomes a possibility.
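
A rough sketch of the two-regex idea, under some assumptions: the link sits somewhere inside the body element, it starts with http or https, and only the first one is wanted. FirstLinkInBody and both patterns are illustrative choices, not a robust HTML parser.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TwoStepDemo
    ' Hypothetical helper: returns the first URL found inside <body>, or String.Empty.
    Function FirstLinkInBody(html As String) As String
        ' Step 1: capture everything between the body tags (attributes allowed).
        Dim body As Match = Regex.Match(html, "<body[^>]*>(.*?)</body>",
                                        RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If Not body.Success Then Return String.Empty

        ' Step 2: pull the first http/https URL out of the captured text.
        Dim link As Match = Regex.Match(body.Groups(1).Value, "https?://[^\s""'<>]+")
        Return If(link.Success, link.Value, String.Empty)
    End Function

    Sub Main()
        Dim html As String = "<html><body class=""x"">" & Environment.NewLine &
                             "http://www.google.com" & Environment.NewLine & "</body></html>"
        Console.WriteLine(FirstLinkInBody(html)) ' prints http://www.google.com
    End Sub
End Module
```

Because the second pattern stops at whitespace, quotes, and angle brackets, no Trim call is needed here, and the same helper also copes with a link placed inside an href attribute.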

Overall, there are usually better ways to extract information from HTML (rather than using Regex) that you may want to investigate before your project/use-case balloons and this happens. :)

Ahh, I totally forgot about .Trim. Thank you for answering, but all I need to do is parse one link from the page and put it in the label, so my method should work. Thanks again! — user3149734, Dec 31 '13 at 22:04
