
I am pulling in an RSS feed from my blog, but I want to remove the query string that WordPress appends to my image URLs.

I have tried a number of regular expressions, but so far none of them removes the ?w=400&h=222 from the content block in the CDATA section of the RSS feed.

Any ideas guys?

Thanks

[EDIT]

The CDATA section of the feed looks like this:

<![CDATA[
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis nec ullamcorper massa. Fusce in nibh nulla, id viverra mi. Aliquam consectetur, nisl eget mattis porta, lorem felis lacinia orci, non malesuada lacus nibh sed dui. Praesent blandit erat id tortor fringilla commodo suscipit urna ultricies. Proin facilisis rutrum ligula ac venenatis.</p>
<div id="attachment_2255" class="wp-caption alignnone"><img src="http://myBlog.files.wordpress.com/2011/10/image.jpg?w=400&#038;h=222" alt="Image" class="size-full wp-image-2255" /><p class="wp-caption-text">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p></div>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis nec ullamcorper massa. Fusce in nibh nulla, id viverra mi. Aliquam consectetur, nisl eget mattis porta, lorem felis lacinia orci, non malesuada lacus nibh sed dui. Praesent blandit erat id tortor fringilla commodo suscipit urna ultricies. Proin facilisis rutrum ligula ac venenatis.</p>]]>
ScampDoodle
  • Please provide input so we can help you. I have no idea what the CDATA section of the RSS looks like. – FailedDev Oct 10 '11 at 15:01
  • You are going down a very dangerous path. Parsing XML with regular expressions is, in general, impossible. You can parse *specific* things but as you've seen with even this simple example, there are odd encoding rules you have to take into account. You would be much better off using an HTML parser to do this. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. – Jim Mischel Oct 10 '11 at 15:41

1 Answer


Not tested, but this would be my first attempt...

\?w=[0-9]+&h=[0-9]+

EDIT: After your edit, I see the input data has changed. My answer was based on finding a match for ?w=400&h=222.

&#038; is an escape sequence for an ampersand. Try the following if the first does not work...

\?w=[0-9]+&#038;h=[0-9]+
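
For illustration, here is a minimal Python sketch that folds both patterns above into one expression. The language choice, the function name `strip_resize_query`, and the extra `&amp;` alternative are my assumptions and not something from the question; it also only handles the exact `?w=...&h=...` shape shown in the sample feed.

```python
import re

# Matches the resize query string with the ampersand either raw (&)
# or escaped (&#038; as in the sample feed, or &amp; as an assumed variant).
RESIZE_QUERY = re.compile(r"\?w=[0-9]+&(?:#038;|amp;)?h=[0-9]+")


def strip_resize_query(content: str) -> str:
    """Return content with every ?w=<digits>&h=<digits> suffix removed."""
    return RESIZE_QUERY.sub("", content)


sample = (
    '<img src="http://myBlog.files.wordpress.com/2011/10/image.jpg'
    '?w=400&#038;h=222" alt="Image" />'
)
print(strip_resize_query(sample))
# <img src="http://myBlog.files.wordpress.com/2011/10/image.jpg" alt="Image" />
```

If the feed ever carries other parameters or a different order, this will leave those URLs untouched, which is where the parser-based approach from the comments becomes the safer route.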
musefan