
Let's say I have a CSS file like this:

span {
    //whatever
}

.block {
    //whatever
}

.block, .something {
    //whatever
}

.more,
h1,
h2 {
    //whatever
}

I want to extract all class names and put them into an array, but I want to keep the structure, so the array will look like this:

["span", ".block", ".block, .something", ".more, h1, h2"]

So there are four items.

This is my attempt...

$homepage = file_get_contents("style.css");

// remove everything between the curly braces (this works)
$pattern_one = '/(?<=\{)(.*?)(?=\})/s';

//this regex does not work properly
$pattern_two = "/\.([\w]*)\s*{/";

$stripped = preg_replace($pattern_one, '', $homepage);
$selectors = array();
$matches = preg_match_all($pattern_two, $stripped, $selectors);

What is the proper regex to use for pattern 2?
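Part of the problem is that `/\.([\w]*)\s*{/` only matches a literal dot followed by word characters directly in front of `{`. On the stripped text it therefore skips `span` (no dot), returns only `something` for `.block, .something`, and finds nothing in `.more, h1, h2`. A minimal sketch of an alternative, assuming flat CSS (no nested `@media` blocks, no braces inside comments or strings), captures everything in front of each `{` that is not itself a brace. `extractSelectors` is a hypothetical helper name, and the function works on either the raw file or `$stripped`:

```php
<?php
// Sketch: capture the text before every "{" and normalise its whitespace.
// Assumes flat CSS: no nested blocks, no braces in comments or strings.
function extractSelectors(string $css): array
{
    // [^{}]+ cannot cross a brace, so each match begins after the previous "}"
    preg_match_all('/([^{}]+)\{/', $css, $matches);

    $selectors = [];
    foreach ($matches[1] as $raw) {
        // collapse runs of whitespace (including new lines) into one space
        $selectors[] = preg_replace('/\s+/', ' ', trim($raw));
    }
    return $selectors;
}

$css = "span {\n    //whatever\n}\n\n.block {\n    //whatever\n}\n\n"
     . ".block, .something {\n    //whatever\n}\n\n.more,\nh1,\nh2 {\n    //whatever\n}";

print_r(extractSelectors($css));
```

With the sample from the question this returns `["span", ".block", ".block, .something", ".more, h1, h2"]`. Collapsing whitespace to a single space (rather than removing it) keeps descendant selectors such as `.block .something` intact.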

buydadip
  • `span` is not a class though. Do you want all identifiers? – chris85 Oct 05 '16 at 18:35
  • you might be looking for something like phpquery: https://github.com/punkave/phpQuery – online Thomas Oct 05 '16 at 18:35
  • @Thomas, phpQuery doesn't parse css files and extract the identifiers... you should double check it. – Dekel Oct 05 '16 at 18:36
  • @chris85 yes all identifiers – buydadip Oct 05 '16 at 18:37
  • Are they always contained on one line? – chris85 Oct 05 '16 at 18:37
  • @chris85 No, they are not. There is at least one case where they are separated by commas and span multiple lines; otherwise, yes. I'll provide an example – buydadip Oct 05 '16 at 18:38
  • Maybe https://regex101.com/r/uoqJKK/3? – chris85 Oct 05 '16 at 18:40
  • Several options in the linked duplicate, including several true parsers, which you should strongly consider. – ceejayoz Oct 05 '16 at 18:48
  • This captures the beginning of all other blocks as well, like a media query definition. Otherwise, I think it would be better to change it to `^\s*([^{}]+){` @chris85 – revo Oct 05 '16 at 18:50
  • This question obviously is not a duplicate of what you have chosen @ceejayoz – revo Oct 05 '16 at 18:55
  • I vote to re-open this question because the answers on that other question mainly suggest regex even though regex is discouraged for this type of work. – MonkeyZeus Oct 05 '16 at 18:56
  • @revo Care to enlighten me? The linked question shows both regex and parser approaches to parsing a CSS file. – ceejayoz Oct 05 '16 at 19:45
  • @MonkeyZeus Yes, hence my "strongly consider" the parser approach in my comment when closing. https://github.com/sabberworm/PHP-CSS-Parser from the dupe looks like a great option. – ceejayoz Oct 05 '16 at 19:45
  • @ceejayoz Yes. That question asks a *very specific* case of extracting class names where class names should have a particular string in them. It doesn't ask about extracting all class names in general and it doesn't ask about extracting multiple class names following a CSS block. Therefore while it is similar, it's not a duplicate. Requirements are very different. – revo Oct 05 '16 at 20:03
  • @revo That's a really weird interpretation of our duplicate rules. The underlying task is the same - look through a CSS file for classes, and the answers in it address this question's specific question (especially the parser ones). Would you oppose closing a "how do I add 1+1 in PHP" with a "how do I add 2+2 in PHP" question? – ceejayoz Oct 05 '16 at 20:07
  • @ceejayoz Since this question is tagged with `regex` and we are talking about it, comparing a *specific case* with a generic solution differs a lot. Your example should be: *Would you oppose closing a "how do I add 1+1 in PHP" with a "how do I add 2 to square root of square in PHP" question?* and I'd say Yes. – revo Oct 05 '16 at 20:19

1 Answer


Like this?

<?php
$css = "span {
    //whatever
}

.block {
    //whatever
}

.block, .something {
    //whatever
}

.more,
h1,
h2 {
    //whatever
}";

$rules = [];

$css = str_replace("\r", "", $css);  // get rid of carriage returns
$css = str_replace("\n", " ", $css); // turn new lines into spaces so ".more,\nh1" becomes ".more, h1"

// explode() on close curly braces
// We should be left with stuff like:
//   span {    //whatever
//   .block {    //whatever
$first = explode('}', $css);

// If a } didn't exist then we probably don't have a valid CSS file
if(strpos($css, '}') !== false)
{
    // Loop each item
    foreach($first as $v)
    {
        // explode() on the opening curly brace; index 0 should be the selector
        $second = explode('{', $v);

        // Trim first: the final item in $first is empty (or only whitespace), so it is skipped
        $selector = trim($second[0]);
        if($selector !== '')
        {
            $rules[] = $selector;
        }
    }
}

// Enjoy the fruit of PHP's labor :-)
print_r($rules);
MonkeyZeus
  • Shortcut `$css = str_replace(array("\r", "\n", "\t", " ") , "", $css);` , or even `$css = preg_replace('/\s+/', '', $css);` – chris85 Oct 05 '16 at 18:53
  • @chris85 Excellent suggestion but I wanted to keep it verbose for newbie Googlers :-) – MonkeyZeus Oct 05 '16 at 18:54
  • @MonkeyZeus this basically works, but just one small issue, the spaces are removed so the css identifier `.block .something` becomes `.block.something` for example. I don't mean to be anal but this is sort of an issue – buydadip Oct 05 '16 at 19:03
  • @Bolboa Another thing to consider is that this does not handle comments outside of the curly braces, so something can be done in `$rules[] = trim($second[0]);` but I am not going to figure it out right now. Try my code against Bootstrap's CSS file and you will see what I am talking about. – MonkeyZeus Oct 05 '16 at 19:05
  • @MonkeyZeus yeah it does not consider `keyframes` and such but I altered your code to fix it. Thanks – buydadip Oct 05 '16 at 19:06
  • @Bolboa I am not sure how those should be handled or presented. I would imagine the same issue exists with `@media` queries. – MonkeyZeus Oct 05 '16 at 19:07
  • @Bolboa If you figure out these exceptions then please feel free to update my answer and let me know. I am sure this will help future visitors. – MonkeyZeus Oct 05 '16 at 19:11
  • @MonkeyZeus Sure, but I do not know how valid an answer it is. It works for my case but might not work for all cases, if you know what I mean. I just knew that all `keyframes` start with an `@` symbol and that keyframes contain things such as `0% { }` or `from {}`, so I removed them based on these patterns – buydadip Oct 05 '16 at 19:31
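The refinements discussed in these comments (keeping single spaces so `.block .something` survives, dropping comments outside the braces, and skipping `@media`/`@keyframes` headers plus keyframe stops like `from` or `0%`) could be combined roughly as below. This is a sketch under stated assumptions, not the OP's altered code, which was not posted. `extractRuleSelectors` is a hypothetical name. It swaps the answer's `explode()` calls for a `[^{}]+\{` regex, because with `explode()` only the `@media` header survives at index 0 and the selectors nested inside that block are lost. It assumes plain `/* ... */` comments, no braces inside strings or `url()`, and keyframe stops of the form `from`, `to` or `50%`:

```php
<?php
// Sketch only: comments are plain /* ... */, braces never appear inside
// strings or url(), and keyframe stops look like "from", "to" or "50%".
function extractRuleSelectors(string $css): array
{
    // 1. Drop /* ... */ comments, including ones outside the curly braces
    $css = preg_replace('#/\*.*?\*/#s', '', $css);

    // 2. Capture the text in front of every "{" (nested blocks included)
    preg_match_all('/([^{}]+)\{/', $css, $matches);

    $selectors = [];
    foreach ($matches[1] as $raw) {
        // 3. Collapse whitespace to ONE space so a selector split over
        //    two lines stays ".block .something", not ".block.something"
        $selector = preg_replace('/\s+/', ' ', trim($raw));

        // 4. Skip empty matches and at-rule headers such as @media, @keyframes
        if ($selector === '' || $selector[0] === '@') {
            continue;
        }

        // 5. Skip keyframe stops: "from", "to", "0%", "50%, 100%"
        $stop = '(from|to|\d+(\.\d+)?%)';
        if (preg_match('/^' . $stop . '(\s*,\s*' . $stop . ')*$/i', $selector)) {
            continue;
        }

        $selectors[] = $selector;
    }
    return $selectors;
}

$css = "/* header */\n.block\n.something { color: red; }\n"
     . "@media (max-width: 600px) {\n  h1,\n  h2 { margin: 0; }\n}\n"
     . "@keyframes fade {\n  from { opacity: 0; }\n  50% { opacity: .5; }\n  to { opacity: 1; }\n}";

print_r(extractRuleSelectors($css));
```

For this sample string the function returns `['.block .something', 'h1, h2']`. A real CSS parser, such as the PHP-CSS-Parser linked in the comments on the question, remains the sturdier option for arbitrary stylesheets.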