I need to get the images from a web page's source.

I can use cfhttp with method="get" and htmleditformat() to read the HTML from that page. Now I need to loop through the content to get all the image URLs (the src attributes).

Can I use REMatch() or REFind() etc., and if yes, how?

Please help!

If I'm not clear, I can try to clarify.

Peter Boughton
loo

3 Answers

It can be very difficult to parse HTML reliably with regex.

Community
Antony

Here's a function that will probably trip up on a lot of bad cases, but might work if you just need something quick and dirty.

<cffunction name="getSrcAttributes" access="public" returntype="array" output="No">
    <cfargument name="pageContents" required="Yes" type="string" />

    <cfset var continueSearch = true />
    <cfset var cursor = "" />
    <cfset var startPos = 1 />
    <cfset var finalPos = 0 />
    <cfset var imgSrc = "" />
    <cfset var images = ArrayNew(1) />

    <cfloop condition="continueSearch">
        <!--- Find the next src= followed by an opening quote --->
        <cfset cursor = REFindNoCase("src\s*=\s*[""']", arguments.pageContents, startPos, true) />

        <cfif cursor.pos[1] neq 0>
            <!--- The value starts right after the opening quote and ends at the next quote or whitespace --->
            <cfset startPos = (cursor.pos[1] + cursor.len[1]) />
            <cfset finalPos = REFindNoCase("[""'\s]", arguments.pageContents, startPos) />

            <cfif finalPos gt startPos>
                <cfset imgSrc = Mid(arguments.pageContents, startPos, finalPos - startPos) />
                <cfset ArrayAppend(images, imgSrc) />
            <cfelseif finalPos eq 0>
                <!--- Unterminated value: stop rather than pass Mid() a negative count --->
                <cfset continueSearch = false />
            </cfif>
        <cfelse>
            <cfset continueSearch = false />
        </cfif>
    </cfloop>

    <cfreturn images />
</cffunction>

Note: I can't verify at the moment that this code works.
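On CF8 or later, where REMatchNoCase() is available, a shorter variant is possible. The following is only a sketch under stated assumptions, and like the function above it has not been run: the sample markup and the variable names (pageContents, imgTags, imageUrls, srcMatch) are made up for illustration, and in practice pageContents would come from cfhttp.fileContent. It first collects whole img tags, then pulls the quoted src value out of each one, so a src= that sits outside an img tag is ignored.

```cfml
<!--- Hypothetical sample markup; in real use this would be cfhttp.fileContent --->
<cfsavecontent variable="pageContents">
<p><img src="/a.png" alt="A"> <IMG alt="B" SRC='http://example.com/b.jpg'> <a href="x">no image</a></p>
</cfsavecontent>

<!--- Step 1: collect every complete <img ...> tag, ignoring case --->
<cfset imgTags = REMatchNoCase("<img\b[^>]*>", pageContents) />
<cfset imageUrls = ArrayNew(1) />

<cfloop array="#imgTags#" index="imgTag">
    <!--- Step 2: capture the quoted src value as subexpression 1 (pos[2]/len[2]) --->
    <cfset srcMatch = REFindNoCase("\ssrc\s*=\s*[""']([^""']+)[""']", imgTag, 1, true) />

    <!--- Skip tags with no src attribute or an empty one --->
    <cfif ArrayLen(srcMatch.pos) gte 2 and srcMatch.pos[2] gt 0>
        <cfset ArrayAppend(imageUrls, Mid(imgTag, srcMatch.pos[2], srcMatch.len[2])) />
    </cfif>
</cfloop>

<cfdump var="#imageUrls#" />
```

This still shares the usual regex weaknesses: unquoted src values, a ">" inside an attribute value, or img tags inside comments and scripts will all trip it up.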

Soldarnal
  • Huh? *If* you're going the regex route (see Antony's answer for why you shouldn't), you just want: ` ` – Peter Boughton May 14 '10 at 14:19
  • I had written this function a while back (before CF8, hence no REMatch) for, like I mention above, something quick and dirty. I make no pretense that it is production code - obviously it doesn't check if src= is even in an img tag (or in a tag at all!) - but not all code has to be. – Soldarnal May 14 '10 at 16:22
  • Peter Boughton: thanks for the code. It seems to pick up only one src attribute. If you can modify it to list all the src values, I would appreciate that. I added #SrcMatches[i]# in the loop, assuming it would list every src found. – loo May 14 '10 at 17:22
Using a browser and jQuery to 'query' all the img tags out of the DOM might be easier...

Henry