I'm writing a program in VB.NET that consists of three main steps:

STEP 1: Display the source code of a web page that is streaming a movie in textbox1.

STEP 2: Highlight the URL of that movie in the source code, then display just the URL in textbox3.

STEP 3: Download that movie to a user-defined directory using HttpWebRequest and HttpWebResponse.

The problem is that I don't know how to extract the URL from the source code effectively. I could search the source code for the string ".mp4", ".avi", or other video extensions, but that would only find the end of the link. How would I highlight the whole link?

For example, if I searched the source code for ".mp4" and there was a URL such as

"http://megavideo.com/g7987bfd0fg.mp4"

then I would only get the ".mp4" part highlighted:

"http://megavideo.com/g7987bfd0fg .mp4"

I know there is a way to start at a certain character in a document and move forward or backward a few characters, but the problem is that I don't know how many characters to go back, because URLs vary in length. Is there some way to search for "http://", then search for ".mp4", and then highlight everything in between them?

EDIT: I also need to be able to feed this URL into another routine that will download the file using HttpWebRequest and HttpWebResponse, so it would be ideal if I could do something like:

textbox3.text = extracted link

Thanks in advance!

daniel11

2 Answers

What I would do is use a regular expression match to find the string I was looking for.

Here's an example of a begins-with match: "Regex pattern for checking if a string starts with a certain substring?"
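
As a rough illustration of the regular expression approach for the case in the question, here is a minimal sketch. It assumes the URL sits inside double quotes in the page source, starts with "http://", and ends in ".mp4". The module name `UrlExtractor` and the function name `ExtractFirstMp4Url` are made up for this example.

```vbnet
Imports System.Text.RegularExpressions

Module UrlExtractor
    ' Returns the first double-quoted http:// URL ending in .mp4,
    ' or an empty string if none is found.
    Function ExtractFirstMp4Url(pageSource As String) As String
        ' "" inside a VB string literal is one literal double quote.
        ' [^""\s]+? lazily matches characters that are neither quotes nor
        ' whitespace, so the match cannot run past the closing quote.
        Dim pattern As String = """(http://[^""\s]+?\.mp4)"""
        Dim m As Match = Regex.Match(pageSource, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            ' Group 1 is the URL without the surrounding quotes.
            Return m.Groups(1).Value
        End If
        Return String.Empty
    End Function
End Module
```

With the page source in textbox1, something like `textbox3.Text = ExtractFirstMp4Url(textbox1.Text)` would then put the extracted link in textbox3. If the page uses single quotes or "https://", the pattern would need adjusting.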

Avitus

Your best bet is regular expressions. Get the app called RegexBuddy; it will help you write the regular expression you need.

Try this code:

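' At the top of the file:
Imports System.Text.RegularExpressions

Dim input As String = "Your initial page source that you want to search through"
' http:// followed by characters that are not quotes or whitespace, up to the first .mp4
Dim pattern As String = "http://[^""\s]+?\.mp4"

Dim rgx As New Regex(pattern, RegexOptions.IgnoreCase)
' Matches returns an empty collection when nothing is found, so no Count check is needed
For Each match As Match In rgx.Matches(input)
   DownloadVideo(match.Value) ' DownloadVideo stands for your own download routine
Next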
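
The question also asks about finding ".mp4" and walking back to "http://" without knowing the URL length in advance. That can be sketched without a regex using `IndexOf` and `LastIndexOf`. This is only an illustration: the names `UrlLocator` and `TryLocateMp4Url` are hypothetical, it looks only at the first ".mp4" in the text, and it does not check that the characters in between form a valid URL.

```vbnet
Imports System

Module UrlLocator
    ' Finds the first ".mp4" and searches backward for the nearest "http://".
    ' On success, returns True and sets start/length to the span of the URL.
    Function TryLocateMp4Url(pageSource As String, ByRef start As Integer, ByRef length As Integer) As Boolean
        Const Extension As String = ".mp4"
        Const Scheme As String = "http://"
        start = -1
        length = 0

        Dim extIndex As Integer = pageSource.IndexOf(Extension, StringComparison.OrdinalIgnoreCase)
        If extIndex < 0 Then Return False

        ' LastIndexOf searches backward from extIndex toward the start of the string.
        Dim schemeIndex As Integer = pageSource.LastIndexOf(Scheme, extIndex, StringComparison.OrdinalIgnoreCase)
        If schemeIndex < 0 Then Return False

        start = schemeIndex
        length = extIndex + Extension.Length - schemeIndex
        Return True
    End Function
End Module
```

Because the result is a start position and a length, the same pair can be used both for highlighting, e.g. `textbox1.Select(start, length)`, and for copying the link with `textbox3.Text = textbox1.Text.Substring(start, length)`.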
Dimitri
  • OK, I have RegexBuddy and I made a regular expression, but it finds all URLs in quotes, and there's another one there that finds a URL path... In summary, I don't really know how to use it. I'm looking for a regular expression that will isolate a URL that starts with "http://", ends in ".mp4", and is enclosed in double quotes. Could you come up with one and tell me how to use it? – daniel11 Apr 27 '11 at 01:52
  • From the top of my head try this: "http\:\/\/"[.]*"\.mp4" – Dimitri Apr 27 '11 at 01:55
  • If the source code is in textbox1, will this highlight the URL, or omit everything but the URL, or...? How will it display the search result? And if I want to click a button to extract the URL from the source code, would I just use exactly what you have in quotes above, or do I need to use a certain class first? – daniel11 Apr 27 '11 at 01:59
  • A regex search goes through the input string and collects all matches in a collection. Highlighting is then implemented as a separate routine; it's not built into the framework. So I'm not sure how you'd want to handle multiple matches: do you want to show them in a text box? You could use a ListBox and add all the matches to it, then on button click send the selected ListBox item to the download method. – Dimitri Apr 27 '11 at 02:04
  • No problem. You might want to test the pattern in RegexBuddy first to make sure it matches your search criteria. – Dimitri Apr 27 '11 at 02:09
  • Can you confirm that what you said above was the right regular expression? – daniel11 Apr 27 '11 at 02:10
  • I cannot at the moment. I'm not fluent enough in regex to say it's 100% correct without RegexBuddy. You said you have it: copy in the original text and match it against the pattern above. – Dimitri Apr 27 '11 at 02:23
  • Well, I tried that and no results came up, yet if I manually open the source code for that web page, hit Ctrl+F and type ".mp4", one result comes up (the result that I want). Maybe I'm overthinking this... All I want is to search the source code (i.e. textbox1.Text) for the keyword ".mp4", copy the rest of the URL back to "http", and paste it in textbox3. Can you download RegexBuddy? It only takes like 2 seconds. – daniel11 Apr 27 '11 at 02:28
  • Try this pattern instead: http\:\/\/.*\.mp4. And also check this link out to understand regex better: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx – Dimitri Apr 27 '11 at 12:41
  • OK, using your code above I've tried to search for something simple, using just ".mp4" as the regular expression, because none of the patterns you gave me have worked. I want the search results to be added to a ListBox using the code below: – daniel11 Apr 28 '11 at 11:19
  • It seems I can't post code snippets in a comment, but basically all I did was change "DownloadVideo(match.Value)" to "Listbox1.Items.Add(match.Value)", and it still didn't work... – daniel11 Apr 28 '11 at 11:21
  • OK, I found my problem, but it only raises more questions... My problem isn't the regex pattern, it's the code that gets the source code of a URL. It's not giving me the right source, so no results are coming up. – daniel11 Apr 28 '11 at 11:29
  • Regex was one of the hardest concepts for me to understand when I was just starting out in programming, so I'd recommend spending some time reading RegexBuddy's website and other great regex resources on the net. You'll be using regex heavily if you're staying in this field. – Dimitri Apr 28 '11 at 12:19
  • I will, thanks for the information. I also have another problem, for which I have asked another question on this site. It pertains to the same program I am working on: http://stackoverflow.com/questions/5818116/how-to-get-the-source-code-of-a-html-page-using-vb-net – daniel11 Apr 28 '11 at 12:34