
Scenario

Extracting URLs from multiple CSS url() functional notations. Given the following CSS value from the Bootstrap project:

src: url('../fonts/glyphicons-halflings-regular.eot?#iefix') format('embedded-opentype'), url('../fonts/glyphicons-halflings-regular.woff2') format('woff2'), url('../fonts/glyphicons-halflings-regular.woff') format('woff'), url('../fonts/glyphicons-halflings-regular.ttf') format('truetype'), url('../fonts/glyphicons-halflings-regular.svg#glyphicons_halflingsregular') format('svg');

I need an array of URL strings like ["../fonts/glyphicons-halflings-regular.eot?#iefix", "../fonts/glyphicons-halflings-regular.woff2", ...].

Solution

Right now I use a while loop and a regex to extract URL strings from the declaration value produced by a CSS parser:

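var decValue = "url(foo), url(bar)";
// no data URI
// Note: only the first ^ negates the class; the later | and ^ are literal,
// so this excludes ( ) | ^ " and ' (quoted URLs like url('...') do not match).
var regexp = /url\(([^\(|^\)|^\"|^\']+)\)/g,
    matches = [],
    match;

while ((match = regexp.exec(decValue)) !== null) {
    // skip data: URIs
    if (match[1].indexOf("data:") === -1) {
        matches.push(match[1]);
    }
}

// should print ["foo", "bar"]
console.log(matches);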
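For reference, the loop above finds nothing in the quoted Bootstrap declaration, because its character class rejects the quote that follows `url(`. A minimal check, wrapping the same loop in a hypothetical `extractUrlsWhileLoop` function and using a shortened excerpt of the Bootstrap value:

```javascript
// Hypothetical helper: the question's loop, unchanged apart from being wrapped.
function extractUrlsWhileLoop(decValue) {
    var regexp = /url\(([^\(|^\)|^\"|^\']+)\)/g,
        matches = [],
        match;
    while ((match = regexp.exec(decValue)) !== null) {
        if (match[1].indexOf("data:") === -1) { // skip data: URIs
            matches.push(match[1]);
        }
    }
    return matches;
}

// Excerpt of the Bootstrap declaration (quoted URLs).
var bootstrapExcerpt =
    "url('../fonts/glyphicons-halflings-regular.woff2') format('woff2'), " +
    "url('../fonts/glyphicons-halflings-regular.woff') format('woff')";

console.log(extractUrlsWhileLoop("url(foo), url(bar)")); // foo and bar
console.log(extractUrlsWhileLoop(bootstrapExcerpt));      // empty: the quote after "url(" is excluded
```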

Question

Is there a way to do this without using a while loop but keeping group matching?

Nick Bartlett
shawnzhu

2 Answers


To avoid a while loop, you could move the grouping logic out into a map function and use String.prototype.match to grab all matches globally:

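var decValue = "url(foo), url(bar)";

// no data URI
// urlExp grabs each whole url(...) token; uriExp pulls the address out of
// a single token, skipping optional single or double quotes.
var urlExp = /url\([^)]+\)/g,
    uriExp = /\(\s*["']?([^"')]+)["']?\s*\)/;

// match() returns null when nothing matches, hence the || []
var matches = (decValue.match(urlExp) || []).map(function (url) {
    return uriExp.exec(url)[1];
});

// should print ["foo", "bar"]
console.log(matches);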

Unfortunately, this requires you to split your regex in two, but it's more or less just group extraction via an iterator pattern.
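A sketch of the same match + map approach run against the full Bootstrap declaration from the question. It assumes the quote-skipping second regex shown above, adds the question's `data:` filter, and wraps everything in a hypothetical `extractUrlsMatchMap` function; URLs that themselves contain quotes or `)`, and empty `url()` values, are out of scope.

```javascript
// Full Bootstrap declaration from the question (every URL is quoted).
var bootstrapDecl =
    "src: url('../fonts/glyphicons-halflings-regular.eot?#iefix') format('embedded-opentype'), " +
    "url('../fonts/glyphicons-halflings-regular.woff2') format('woff2'), " +
    "url('../fonts/glyphicons-halflings-regular.woff') format('woff'), " +
    "url('../fonts/glyphicons-halflings-regular.ttf') format('truetype'), " +
    "url('../fonts/glyphicons-halflings-regular.svg#glyphicons_halflingsregular') format('svg');";

// Hypothetical helper: match() collects whole url(...) tokens, map() extracts the group.
function extractUrlsMatchMap(value) {
    var urlExp = /url\([^)]+\)/g,                  // whole url(...) token
        uriExp = /\(\s*["']?([^"')]+)["']?\s*\)/; // address inside, optional quotes skipped

    return (value.match(urlExp) || [])             // match() gives null when nothing matches
        .map(function (token) { return uriExp.exec(token)[1]; })
        .filter(function (url) { return url.indexOf("data:") === -1; }); // drop data: URIs
}

console.log(extractUrlsMatchMap(bootstrapDecl));
```

Splitting the work this way keeps the global regex simple; the non-global `uriExp` has no `lastIndex` state, so calling `exec` once per token is safe.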

Nick Bartlett
  • Your first regex is pretty broken. I think you meant `/url\([^()"']+\)/g`. +1 for showing how to use `match` + `map`, but -1 until that regex is fixed. – Phrogz Apr 28 '15 at 20:31
  • @Phrogz thanks for the comment, I was too quick to copy the regex from the question. I believe `/url\([^)]+\)/g` works here, and then you can strip optional quotes in the second regex. For some reason the regex you posted [doesn't seem to work](https://regex101.com/r/tH2vY3/1). Here's this one: https://regex101.com/r/bV6yR0/1 – Nick Bartlett Apr 28 '15 at 21:14
  • You're right that my translation of the intention of the regex does not work for quoted urls, and that if you do not mind including the quote characters then what you have now works just fine. +1 for showing that `match` with a `g`lobal flag will gather an array of results. – Phrogz Apr 29 '15 at 00:01

Because JavaScript did not support lookbehinds when this was written (lookbehind assertions were added in ES2018), you have to get creative when the text behind a position matters but you don't want to capture it. This regex accounts for single-quoted, double-quoted, and unquoted URLs.
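In runtimes that do support lookbehind (ES2018 and later), the `url(` prefix can be required without being captured, so a plain global `match()` returns the addresses directly. A sketch, assuming such a runtime; `extractUrlsLookbehind` is a hypothetical name, and URLs containing quotes or `)` are not handled:

```javascript
// Hypothetical helper; needs lookbehind support (ES2018+).
function extractUrlsLookbehind(value) {
    // (?<=url\(\s*["']?) : must be preceded by "url(" plus an optional quote,
    //                      but that prefix is not part of the match
    // [^"')]+            : the URL itself, up to a quote or ")"
    return value.match(/(?<=url\(\s*["']?)[^"')]+/g) || [];
}

console.log(extractUrlsLookbehind("url('../fonts/a.woff2') format('woff2'), url(b.png)"));
```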

Here's one method that uses replace() with a callback to push each URL to an array; it doesn't modify the original string.

var css = "src: url('../fonts/glyphicons-halflings-regular.eot?#iefix') format('embedded-opentype'), url('../fonts/glyphicons-halflings-regular.woff2') format('woff2'), url(\"../fonts/glyphicons-halflings-regular.woff\") format('woff'), url(../fonts/glyphicons-halflings-regular.ttf) format('truetype'), url('../fonts/glyphicons-halflings-regular.svg#glyphicons_halflingsregular') format('svg');";

var garnerURLs = [];

// replace() is called only for its callback; strings are immutable,
// so `css` is left unchanged and the returned string is discarded.
css.replace(/url\(('[^']*'|"[^"]*"|[^\)]*)\)/g, function (match, p1) {
    var first = p1.charAt(0);
    if (first === "'" || first === "\"") {
        // quoted URL: strip the surrounding quotes
        garnerURLs.push(p1.slice(1, -1));
    } else {
        // unquoted URL: keep as-is
        garnerURLs.push(p1);
    }
    return match;
});

console.log(garnerURLs);

console.log(css);

url\(('[^']*'|"[^"]*"|[^\)]*)\)

Explanation:

 url                # Literal url
 \(                 # Literal (
 (                  # Opens CG1
     '              # Literal '
     [^']*          # Negated Character class (excludes the characters within)
     '              # Literal '
 |                  # Alternation (CG1)
     "              # Literal "
     [^"]*          # Negated Character class (excludes the characters within)
     "              # Literal "
 |                  # Alternation (CG1)
     [^\)]*         # Negated Character class (excludes the characters within)
 )                  # Closes CG1
 \)                 # Literal )
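The breakdown above can also live in code: building the pattern from commented pieces with the `RegExp` constructor keeps each part documented next to its meaning. A sketch; `pieces` and `urlRegex` are hypothetical names, and backslashes are doubled because the pieces are string literals:

```javascript
// Each piece mirrors one line of the breakdown above.
var pieces = [
    "url",         // literal url
    "\\(",         // literal (
    "(",           // opens capture group 1
    "'[^']*'",     //   single-quoted URL
    "|",           //   or
    "\"[^\"]*\"",  //   double-quoted URL
    "|",           //   or
    "[^\\)]*",     //   unquoted URL: anything up to the next )
    ")",           // closes capture group 1
    "\\)"          // literal )
];

var urlRegex = new RegExp(pieces.join(""), "g");

console.log(urlRegex.source); // url\(('[^']*'|"[^"]*"|[^\)]*)\)
```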

All of the answers here, along with your current method, essentially involve a loop. Even with lookbehind support, match() still iterates over the string internally while it finds further matches. In the end, any of these solutions is fine.
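Newer JavaScript answers the original question (no explicit while loop, capture groups kept) more directly: `String.prototype.matchAll` (ES2020) yields one match object per hit, groups included, and throws if the regex lacks the `g` flag. A sketch, assuming an ES2020 runtime; `extractUrlsMatchAll` is a hypothetical name, and the backreference `\1` makes the closing quote match the opening one (or be absent):

```javascript
// Hypothetical helper; String.prototype.matchAll needs ES2020.
function extractUrlsMatchAll(value) {
    // group 1: optional opening quote; group 2: the URL; \1: same quote again
    var re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
    return Array.from(value.matchAll(re), function (m) { return m[2]; })
        .filter(function (url) { return url.indexOf("data:") === -1; }); // drop data: URIs
}

console.log(extractUrlsMatchAll(
    "src: url('a.eot?#iefix') format('embedded-opentype'), url(\"b.woff\") format('woff'), url(c.ttf)"
));
```

The iteration still happens, but inside the engine rather than in your code.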

Regular Jo
  • Note: [CSS URLs do not have to be quoted](http://stackoverflow.com/questions/2168855/is-quoting-the-value-of-url-really-necessary). – Phrogz Apr 28 '15 at 20:11
  • Indeed, fixed. Thanks. – Regular Jo Apr 28 '15 at 20:22
  • a string replace with a [function as the second parameter](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#Specifying_a_function_as_a_parameter) is an alternative to a while loop. Thanks! – shawnzhu Apr 28 '15 at 20:49