I am working on a Gradle script that goes through a large CSS file and scrapes out the image URLs. So far:

def temp = ".post-format background:url(image/goes/here.jpg); {background:  .post-format {background: url(../img/post //formats.png);display:;display:.woocommerce-info:before {background: url()center no-repeat #18919c    }"
def list = temp.findAll(/background:[\s]?url\([^\)]*\)/){ match ->
   match
}

This works, but it also picks up the 'data:image' URLs that we don't need. Here the temp variable contains both: the good 'image/goes/here.jpg' URL and the one we don't need, 'data:image/png[..]'. How should we update the regular expression to exclude it? If you could also share the rationale behind the correct regular expression, to help us learn regular expressions better, I would much appreciate it. Thank you very much.

latvian

2 Answers

You can use a negative lookahead to accomplish what you want. Immediately after the escaped left parenthesis, insert (?!data:image), which asserts that the text at that point must not begin with data:image. The lookahead consumes no characters; it only rejects the match when that text is present. So your regex becomes:

/background:[\s]?url\((?!data:image)[^\)]*\)/
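As a rough sketch of how this behaves in Groovy: the sample CSS string and the variable names below are made up for illustration, and the extra capture group around the URL is an addition that is not part of the regex above. Under those assumptions, the lookahead skips the data URI while the two file URLs still match:

```groovy
// Hypothetical sample CSS with two file URLs and one inline data URI.
def css = '''
.post-format { background: url(../img/post-formats.png); }
.icon { background:url(data:image/png;base64,iVBORw0KGgo=); }
.hero { background:url(image/goes/here.jpg) center no-repeat; }
'''

// (?!data:image) is zero-width: it consumes nothing, but the match fails
// if the text right after "url(" starts with "data:image".
// The parentheses around [^)]* capture just the URL (added for illustration).
def pattern = /background:\s?url\((?!data:image)([^)]*)\)/

// When the regex has capture groups, findAll passes the full match
// followed by each group to the closure.
def urls = css.findAll(pattern) { full, url -> url }

println urls
```

Without the capture group, findAll returns the whole matched text (background:url(...)) for each hit, as in the original snippet.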

You can see the approach illustrated in this rubular. See also How can I find everything BUT certain phrases with a regular expression?

Peter Alfvin
  • Fantastic! It works like a charm... I wasn't looking ahead enough :) Thank you also for pointing to Rubular. It looks like a very neat tool that will come in handy in the future. – latvian Sep 16 '13 at 01:26

You didn't specify what language you're using, but if the URL you want is always the first one, just don't do a global match (which is what findAll does, whatever language that is). Most likely, changing temp.findAll to temp.match and assigning the results to a scalar string variable will do it. But please tell us which language.
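If the language turns out to be Groovy (as the Gradle context suggests), the non-global counterpart of findAll is find, which returns only the first match or null when nothing matches. A minimal sketch, assuming a made-up sample string in which the wanted URL happens to come first:

```groovy
// Hypothetical sample: the file URL appears before the data URI.
def temp = '.a { background:url(image/goes/here.jpg); } .b { background:url(data:image/png;base64,AAAA); }'

// find stops at the first match instead of collecting all of them.
def first = temp.find(/background:\s?url\([^)]*\)/)

println first
```

This only helps when the wanted URL is reliably the first one; if a data URI comes first, find returns that instead.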

Adi Inbar