How can I split code-blocks into a list?

Question

I want to split the contents of a CSS file into code blocks and push each block of code into a list using Python 3.5.

So, given this CSS:

h1 {color: #333, background-color: transparent}
h2 {
  font-weight:300
}
h3
{
  font-weight: 200
}

It clearly mixes several formatting and indentation styles, so the CSS has to be tidied first to get this:

h1 {
  color: #333,background-color: transparent;
}

h2 {
  font-weight: 300;
}

h3 {
  font-weight: 200;
}

How can I use Python to read a tidied string of CSS and push every block of code inside it into a Python list like this:

styles = [
  "h1 {\n  color: #333,background-color: transparent;\n}",
  "h2 {\n  font-weight: 300;\n}",
  "h3 {\n  font-weight: 200;\n}"
]

Regular expressions are not really my forte and I'm not sure which pattern to use, but I was thinking that I could combine a RegExp with a split to achieve this.

It might even be possible to use a RegExp to eliminate the need to tidy the stylesheet before splitting it into code blocks.

NOTE: I've checked this question out, but unfortunately it didn't help either.

Possible duplicate of [What is the pythonic way to implement a css parser/replacer](http://stackoverflow.com/questions/11592347/what-is-the-pythonic-way-to-implement-a-css-parser-replacer) — pvg, Jun 25 '16 at 14:32
@Mango No need to implement a parser yourself, you can use a small library. I've outlined it in my answer below. — oxalorg, Jun 25 '16 at 14:46
@Mango it actually does, the way you want to solve your problem is akin to this infamous SO answer http://stackoverflow.com/a/1732454/5087125 Don't do it, use a parser, there are small efficient ones that do this simply and properly. — pvg, Jun 25 '16 at 15:12

oxalorg · Accepted Answer · 2016-06-25T18:05:24.797

3

This implementation uses tinycss, a simple pure-Python CSS parser.

It works on untidied CSS, as long as the CSS is valid.

import tinycss
from collections import defaultdict

parser = tinycss.make_parser('page3')
# Use parse_stylesheet_file to read from a file instead.
stylesheet = parser.parse_stylesheet("""h1 {color: #333; background-color: transparent}
        h2 {
              font-weight:300
        }
        h3
        {
              font-weight: 200
        }
        h1{
        padding: 0px;}
        """)

# defaultdict(list) creates an empty list the first time a key is accessed.
# This lets us group multiple blocks that share the same selector.
temp = defaultdict(list)

for rule in stylesheet.rules:
    for dec in rule.declarations:
        temp[rule.selector.as_css()].append((dec.name, dec.value.as_css()))

print(temp)

Output:

defaultdict(<class 'list'>,
            {'h1': [('color', '#333'),
                    ('background-color', 'transparent'),
                    ('padding', '0px')],
             'h2': [('font-weight', '300')],
             'h3': [('font-weight', '200')]})

Notice how the two separate h1 blocks were merged into a single list. I'm not familiar with all the intricacies of CSS, but if merging is undesirable, it's easy to prevent.

This is much more flexible than a solution with regular expressions: it works with selectors, CSS2, and CSS3, and it copes with edge cases such as comments and irregular whitespace.
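
To illustrate the kind of edge case meant here, the following is a minimal sketch of a naive regex splitter tripping over a comment that contains braces. The pattern and the sample stylesheet are made up for illustration; they are not taken from the question or from any comment.

```python
import re

# Hypothetical stylesheet: a commented-out rule followed by two real rules.
css = """/* old rule: h9 { color: red } */
h1 { color: #333 }
h2 { font-weight: 300 }
"""

# Naive pattern (an assumption for this demo): "some text, then { ... }".
naive_pattern = re.compile(r"[^{}]+\{[^{}]*\}")

# Every match is treated as one block, whether or not it sits inside a comment.
naive_blocks = [match.strip() for match in naive_pattern.findall(css)]

for block in naive_blocks:
    print(repr(block))
```

The commented-out rule comes back as a block of its own, and the real h1 block picks up the stray `*/` from the end of the comment. Nothing raises an error, so the damage is silent.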

Please note: I've pushed everything into a dictionary, but you can just as easily push it into a list. Let me know if you want a version with pure lists; it should be relatively trivial once you understand what the code does.
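
As a rough sketch of the list version, the grouped declarations can be rendered back into tidy block strings. To keep the example runnable without tinycss, `grouped` is a hand-written stand-in with the same shape as `temp` above, and `format_block` is a hypothetical helper, not part of tinycss.

```python
from collections import OrderedDict

# Stand-in for `temp`: selector -> list of (property, value) pairs.
grouped = OrderedDict([
    ("h1", [("color", "#333"),
            ("background-color", "transparent"),
            ("padding", "0px")]),
    ("h2", [("font-weight", "300")]),
    ("h3", [("font-weight", "200")]),
])


def format_block(selector, declarations, indent="  "):
    """Render one rule as a tidy multi-line CSS block (hypothetical helper)."""
    body = "\n".join(
        "{}{}: {};".format(indent, name, value) for name, value in declarations
    )
    return "{} {{\n{}\n}}".format(selector, body)


# One string per selector, in the order the selectors were first seen.
styles = [format_block(selector, decls) for selector, decls in grouped.items()]

for style in styles:
    print(repr(style))
```

To keep the two h1 blocks separate instead of merged, one option is to append a `(selector, declarations)` pair to a plain list for each rule and format each pair the same way.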

edited Jun 25 '16 at 18:05

answered Jun 25 '16 at 14:41

What cases will a regex not cover? Assuming that the CSS is always formatted correctly, it should always work https://repl.it/C5ws/7 – Jacob G Jun 25 '16 at 14:47
@JacobGray That's assuming it's correctly formatted. And if it isn't, you need a parser anyway, so you might as well solve it in a way that doesn't require tidying the CSS. – oxalorg Jun 25 '16 at 14:52
The OP is asking how to split a *formatted* piece of CSS, so I think it is safe to assume that the input is always formatted. Even if the input isn't always formatted, you can still split it regardless of formatting by `re.compile("(})").split(css)`. I just don't see any point in using a library to parse the entire stylesheet when all you want to do is split each rule – Jacob G Jun 25 '16 at 14:58
What if there is a SINGLE extra whitespace character present? Your solution breaks apart completely, and silently at that. Comments, brackets inside comments, complete blocks inside comments, tab characters: there are always edge cases, and it's definitely a bad idea to use RegEx for something like this. – oxalorg Jun 25 '16 at 15:06
This could be made shorter and clearer with a defaultdict making the comment and if unnecessary. Also, is the b prefix on the string really needed? – pvg Jun 25 '16 at 16:05
I actually used a `defaultdict` in my local copy, but I chose to avoid it to explain clearly what I'm trying to do. I'll add it to the solution anyway. Yes, `tinycss` only works on bytes-like objects. – oxalorg Jun 25 '16 at 16:06
`tinycss` works just fine on unicode strings, just call `parse_stylesheet` instead of the function you're using. It's python 3 so you know you have a unicode literal and none of this extra stuff is needed. – pvg Jun 25 '16 at 17:42
@pvg How could I have been so stupid lol. I missed the documentation right in front of my eyes! Sorry. Fixed the answer. – oxalorg Jun 25 '16 at 18:06
@MiteshNinja thank you for the answer, it works brilliantly. – Jul 01 '16 at 18:27

th3an0maly · Answer 2 · 2016-06-26T10:01:49.750

1

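You can achieve this by reading the file line by line and starting a new block at each blank line (this assumes the CSS is already tidied, with a blank line between blocks):

styles = []
with open('file.css') as file:
    style = []
    for line in file:
        if not line.strip():
            # A blank line ends the current block, if there is one
            if style:
                styles.append("".join(style))
                style = []
        else:
            # Add the line to the current block
            style.append(line)
    # Flush the last block; skip it if the file ended with a blank line
    if style:
        styles.append("".join(style))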

Output:

>>> for s in styles: s
'h1 {\n  color: #333,background-color: transparent;\n}\n'
'h2 {\n  font-weight: 300;\n}\n'
'h3 {\n  font-weight: 200;\n}\n'
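
The loop above depends on blank lines to mark where a block ends. A different technique, sketched below, tracks brace depth instead, so blank lines inside or between blocks no longer matter. It assumes balanced braces and no braces inside comments or strings; `split_blocks` is a hypothetical name, and a real parser remains the safer choice.

```python
def split_blocks(css):
    """Split CSS text into top-level blocks by tracking brace depth."""
    blocks = []
    current = []
    depth = 0
    for char in css:
        current.append(char)
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            # A closing brace that returns to depth 0 ends a top-level block.
            if depth == 0:
                blocks.append("".join(current).strip())
                current = []
    return blocks


# Sample input with a blank line inside a block and extra blank lines between blocks.
css = "h1 {\n  color: #333;\n\n  padding: 0px;\n}\n\n\nh2 {\n  font-weight: 300;\n}\n"
styles = split_blocks(css)

for style in styles:
    print(repr(style))
```

Because nested braces only close a block once the depth returns to zero, an at-rule such as `@media` with rules inside it comes back as a single block.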

edited Jun 26 '16 at 10:01

answered Jun 25 '16 at 15:16

This code will horribly break even if there's a single extra blank line literally ANYWHERE in the entire css style sheet. – oxalorg Jun 25 '16 at 16:17
@MiteshNinja By "horribly break", I assume you meant there will be empty lines in `styles` (if you meant something else, please clarify). Thanks for pointing that out. Fixed it. – th3an0maly Jun 26 '16 at 10:02
I don't think you understand. Your code will now 'horribly' break if there's a single extra blank line present ANYWHERE except if the newline is between 2 blocks. If there's an extra line present inside a block, it will assume that the block is finished and push it onto `styles`. Please re-check your code. – oxalorg Jun 26 '16 at 11:07
