I have various HTML documents from which I'm trying to extract the links to: (1) other HTML documents, and (2) image files such as .jpg, .png and .bmp. I need a regular expression to do this and cannot seem to figure it out.
Each of the HTML pages will have code similar to the following:
<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="images/sample001.jpg">
<IMG style="MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px" align=right src="images/sample002.png">
<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="images/sample003.bmp">
href="javascript:parent.POPUP({url:'testDoc001.htm',type:'shared',width:600,height:645})">
href="javascript:parent.POPUP({url:'testDoc002.html',type:'shared',width:700,height:712})">
As an example, the regular expression would operate on the above HTML and produce the following array:
images/sample001.jpg
images/sample002.png
images/sample003.bmp
testDoc001.htm
testDoc002.html
Can someone help me out? Thanks so much.
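One possible approach, as a minimal sketch: the question does not name a language, so Python is an assumption here, and the names SAMPLE_HTML, LINK_PATTERN and extract_links are made up for illustration. The idea is to match any quoted value (single or double quotes) that ends in one of the wanted extensions, which covers both the src="..." attributes and the url:'...' values inside the javascript: links.

```python
import re

# Sample markup based on the question. The pattern below only looks at
# quoted values, so it does not depend on the surrounding tag names.
SAMPLE_HTML = """\
<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="images/sample001.jpg">
<IMG style="MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px" align=right src="images/sample002.png">
<IMG style="MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px" align=right src="images/sample003.bmp">
href="javascript:parent.POPUP({url:'testDoc001.htm',type:'shared',width:600,height:645})">
href="javascript:parent.POPUP({url:'testDoc002.html',type:'shared',width:700,height:712})">
"""

# A quote, then a run of characters containing no quotes, angle brackets
# or whitespace that ends in a wanted extension, then a closing quote.
# The extension list (jpg/jpeg, png, bmp, htm/html) is an assumption
# taken from the examples in the question; extend it as needed.
LINK_PATTERN = re.compile(
    r"""["']([^"'<>\s]+\.(?:jpe?g|png|bmp|html?))["']""",
    re.IGNORECASE,
)


def extract_links(html):
    """Return every quoted value in html that ends in a wanted extension."""
    return LINK_PATTERN.findall(html)


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

Run against the sample above, this prints the five paths listed in the question, in document order. Note that it is a pattern match, not an HTML parser: it will miss unquoted attribute values (such as src=foo.jpg) and links with a query string after the extension, and it will also pick up any other quoted string that happens to end in one of these extensions.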