I'm writing a program that gets the source code of a web page with a video on it. It then uses regular expressions to isolate the download link of that video, and then uses HttpWebRequest and HttpWebResponse to download the video. My problem arises when a site shows an intermediate page where you have to click "Continue" to get to the video page.

For example, there is a video playing on http://nextgenvidz.com/view/s995xvc9e2fv called "The.Matrix.Reloaded.2003.mp4" so I tell my program to get the source code for the url "http://nextgenvidz.com/view/s995xvc9e2fv" but it can't find the video's download link because it's searching for the file in the "continue" page's source code. If you go to that website above and view source, you won't see the link. Then, click continue and do the same when the video appears and you'll notice that the file is only there in the second one.

How can I get the source code for the page that the video is playing on, and not the page where I have to click continue?

I am trying to use this code:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    TextBox1.Text = "Loading..."
    ' WebRequest.Create returns a WebRequest, so cast it to the HTTP-specific type.
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(TextBox2.Text), System.Net.HttpWebRequest)
    ' Using blocks dispose the response and the reader even if reading fails.
    Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(response.GetResponseStream())
            ' Show the raw HTML that the server returned for this URL.
            TextBox1.Text = sr.ReadToEnd()
        End Using
    End Using
End Sub

Maybe there's a way to auto select the "Continue" button programmatically?

durron597
daniel11
  • I suspect the button has some client-side JavaScript that you would need to interpret. Almost as if the author didn't want you to do this. – Jodrell Apr 28 '11 at 12:16
  • Well, how would I get the HTML source code for the page that actually has the video on it, and not the continue page? – daniel11 Apr 28 '11 at 12:32
  • You'll have to do what the button does when you click it, but this is probably obfuscated with a server-side element. I can't provide a generic answer and I don't want to click your link from my current location. – Jodrell Apr 28 '11 at 12:44

3 Answers

This was answered well in another question:

How can I get HTML page source for websites in VB.NET?

The code from that answer:

Dim sourceString As String = New System.Net.WebClient().DownloadString("SomeWebPage")
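
As a hedged sketch of how that one-liner could fit the workflow described in the question: the module below downloads a page and pulls out anything that looks like a direct .mp4 link. The names VideoLinkSketch, GetPageSource, and FindMp4Links and the regular expression are illustrative assumptions, not part of the linked answer, and the pattern will miss links that a site builds with JavaScript.

```vbnet
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module VideoLinkSketch
    ' Hypothetical helper: downloads the HTML that the server returns for a URL.
    Function GetPageSource(ByVal url As String) As String
        ' Using disposes the WebClient once the download finishes.
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    ' Hypothetical helper: returns every absolute URL ending in .mp4 found in the HTML.
    Function FindMp4Links(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)
        ' Assumed pattern: an http(s) URL with no quotes, angle brackets, or whitespace, ending in .mp4.
        For Each m As Match In Regex.Matches(html, "https?://[^\s""'<>]+\.mp4", RegexOptions.IgnoreCase)
            links.Add(m.Value)
        Next
        Return links
    End Function
End Module
```

It could be called as FindMp4Links(GetPageSource(TextBox2.Text)). For a site with a "Continue" page, this would still only see the first page's source, so an empty list is the expected result there.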
Andrei Sfat
Stephen
Dim PictureURL As String = "http://www.bing.com" & New System.Net.WebClient().DownloadString("http://www.bing.com/HPImageArchive.aspx?format=rss&idx=0&n=1&mkt=de-DE").Replace("<link>", "|").Replace("</link>", "|").Split("|"c)(3)
rink.attendant.6
Jonas

I have tried writing something like this in the past and found that there are a bunch of limitations in place (imposed either by browsers or by the protocol itself) to prevent automation. Creating a universal website parser will be impossible. You would have to write parsing routines for individual sites, based on the way each one hides content from you. You first have to determine the pattern each site uses to hide the content from the user, and then implement the actual parsing for each pattern. The patterns are typically a link to the video destination, a button that pops up another window with the video content, or a button that executes JavaScript that dynamically loads a video into the current window.
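
As a rough sketch of one such per-site routine, assuming the "Continue" button is a plain HTML form that posts hidden fields back to the same URL (I have not checked whether the site in the question works this way): download the first page, collect its hidden inputs, and post them back. The names ContinuePageSketch, ReadHiddenFields, and GetPageAfterContinue are hypothetical, and the pattern assumes the type, name, and value attributes appear in that order and are double-quoted.

```vbnet
Imports System.Collections.Specialized
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module ContinuePageSketch
    ' Collects name/value pairs from <input type="hidden" name="..." value="..."> tags.
    ' Assumption: attributes appear in the order type, name, value.
    Function ReadHiddenFields(ByVal html As String) As NameValueCollection
        Dim fields As New NameValueCollection()
        Dim pattern As String = "<input[^>]*type=""hidden""[^>]*name=""([^""]*)""[^>]*value=""([^""]*)"""
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            fields.Add(m.Groups(1).Value, m.Groups(2).Value)
        Next
        Return fields
    End Function

    ' Downloads the "continue" page, then posts its hidden fields back to the same URL.
    Function GetPageAfterContinue(ByVal url As String) As String
        Using client As New WebClient()
            Dim firstPage As String = client.DownloadString(url)
            Dim fields As NameValueCollection = ReadHiddenFields(firstPage)
            ' UploadValues sends the fields as an application/x-www-form-urlencoded POST.
            Dim responseBytes As Byte() = client.UploadValues(url, "POST", fields)
            ' Assumption: the response is UTF-8 encoded.
            Return Encoding.UTF8.GetString(responseBytes)
        End Using
    End Function
End Module
```

If the site relies on cookies, a submit-button field, or JavaScript to build the request, this will not be enough, which is exactly why each site ends up needing its own routine.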

Dimitri
  • Could I load the video in a custom web browser using the WebBrowser control, then somehow get the source code of the currently loaded page and go from there? That might be more effective than using the URL itself to get the source code. – daniel11 Apr 28 '11 at 19:18
  • I was talking about parsing the source, not the URL. You said the video is not linked directly in the source; instead there's a link to another page where you have to click a button to load the video. I doubt you actually need any kind of browser to read the source of a web page. string getPageSource(string URL) { System.Net.WebClient webClient = new System.Net.WebClient(); string strSource = webClient.DownloadString(URL); webClient.Dispose(); return strSource; } – Dimitri Apr 28 '11 at 23:46
  • Sorry for the formatting, or rather the lack of it. In short, what my response means is: is it really worth spending hours and hours writing HTML source parsers to get the videos? It will be a pain to make it fully automated. – Dimitri Apr 28 '11 at 23:47