This is the best description I have found of how the browser deals with HTML and CSS:
The rendering engine will start parsing the HTML document and turn the tags to DOM nodes in a tree called the "content tree". It will parse the style data, both in external CSS files and in style elements. The styling information together with visual instructions in the HTML will be used to create another tree - the render tree.
In general, the rendering engine's parsing jobs are:
- Tokenizing the input (breaking it into tokens, which is the job of the lexer)
- Constructing the parse tree by analyzing the document structure according to the language's syntax rules
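To make the two jobs concrete, here is a minimal sketch in Python. It is not how any real browser does it: the token set, the `tokenize` and `parse_rule` names, and the dictionary-shaped tree are all assumptions made for illustration, and it only handles a single rule of the form `selector { property: value; ... }`.

```python
import re

# Hypothetical token set for a tiny CSS subset (not the real CSS lexical grammar).
TOKEN_RE = re.compile(r"""
    (?P<IDENT>[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<PUNCT>[{}:;])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Lexer: break the input into (kind, value) tokens, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_rule(tokens):
    """Parser: build a small parse tree from the token list."""
    pos = 0

    def take(kind, value=None):
        # Consume one token of the expected kind (and value, if given).
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    selector = take("IDENT")
    take("PUNCT", "{")
    declarations = []
    while pos < len(tokens) and tokens[pos] != ("PUNCT", "}"):
        prop = take("IDENT")
        take("PUNCT", ":")
        value = take("IDENT")
        declarations.append({"property": prop, "value": value})
        # A ';' separates declarations; the last one may omit it.
        if pos < len(tokens) and tokens[pos] == ("PUNCT", ";"):
            pos += 1
    take("PUNCT", "}")
    return {"type": "rule", "selector": selector, "declarations": declarations}
```

Calling `parse_rule(tokenize("h1 { color: red; margin: auto }"))` runs the lexer first and then hands the flat token list to the parser, which is the same split of responsibilities the list above describes.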
CSS parser
Unlike HTML, CSS has a context-free (and deterministic) grammar.
The CSS specification therefore defines both a lexical grammar and a syntax grammar,
which the parser applies as it goes through the stylesheet.
The lexical grammar (vocabulary) is defined by regular expressions for each token:
comment \/\*[^*]*\*+([^/*][^*]*\*+)*\/
num [0-9]+|[0-9]*"."[0-9]+
nonascii [\200-\377]
nmstart [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
name {nmchar}+
ident {nmstart}{nmchar}*
"ident" is short for identifier, such as a class name. "name" is the part of an element id selector that follows the "#".
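The macros above are written in Flex notation, but they translate almost directly to ordinary regular expressions. The following Python sketch shows one way to do that. Some parts are assumptions: the `escape` macro is not shown in the listing above, so a simplified version is used here, `nonascii` is approximated as "any character outside ASCII" because Python strings are Unicode rather than bytes, and the `is_ident`, `is_name` and `is_num` helpers are made up for illustration.

```python
import re

# Assumption: "nonascii" is approximated as any character outside ASCII.
NONASCII = r"[^\x00-\x7f]"
# Assumption: simplified escape (hex escape or a backslash-escaped character).
ESCAPE = r"\\[0-9a-f]{1,6}\s?|\\[^\r\n\f0-9a-f]"

# Each macro is wrapped in a non-capturing group so alternation composes safely.
NMSTART = rf"(?:[_a-z]|{NONASCII}|{ESCAPE})"
NMCHAR = rf"(?:[_a-z0-9-]|{NONASCII}|{ESCAPE})"
NAME = rf"{NMCHAR}+"
IDENT = rf"{NMSTART}{NMCHAR}*"
NUM = r"[0-9]+|[0-9]*\.[0-9]+"


def is_ident(text):
    """True if the whole string matches the 'ident' macro."""
    return re.fullmatch(IDENT, text, re.IGNORECASE) is not None


def is_name(text):
    """True if the whole string matches the 'name' macro."""
    return re.fullmatch(NAME, text, re.IGNORECASE) is not None


def is_num(text):
    """True if the whole string matches the 'num' macro."""
    return re.fullmatch(NUM, text) is not None
```

The difference between the two macros shows up with a leading digit: `is_name("9lives")` is true because every character is an `nmchar`, while `is_ident("9lives")` is false because a digit is not a valid `nmstart`.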
The syntax grammar is described in BNF.
ruleset
: selector [ ',' S* selector ]*
'{' S* declaration [ ';' S* declaration ]* '}' S*
;
selector
: simple_selector [ combinator selector | S+ [ combinator? selector ]? ]?
;
simple_selector
: element_name [ HASH | class | attrib | pseudo ]*
| [ HASH | class | attrib | pseudo ]+
;
class
: '.' IDENT
;
element_name
: IDENT | '*'
;
attrib
: '[' S* IDENT S* [ [ '=' | INCLUDES | DASHMATCH ] S*
[ IDENT | STRING ] S* ]? ']'
;
pseudo
: ':' [ IDENT | FUNCTION S* [IDENT S*]? ')' ]
;
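A grammar like this maps naturally onto a hand-written recursive-descent parser, with one method per production. The Python sketch below covers only the `simple_selector` production and the productions it uses. It is a rough illustration under several assumptions: `IDENT`, `NAME` and `STRING` are simplified (no escapes or non-ASCII characters), the lexer and parser are merged into one class for brevity, and the `SelectorParser` class, the `parse_simple_selector` function and the tuple-shaped output are all invented names, not anything a browser exposes.

```python
import re

# Simplified token patterns (assumption: no escapes, no non-ASCII characters).
IDENT = r"[_a-z][_a-z0-9-]*"
NAME = r"[_a-z0-9-]+"
STRING = r'"[^"]*"' + r"|'[^']*'"


class SelectorParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _match(self, pattern):
        # Try to match the pattern at the current position; advance on success.
        m = re.compile(pattern, re.IGNORECASE).match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    def _expect(self, pattern, what):
        value = self._match(pattern)
        if value is None:
            raise SyntaxError(f"expected {what} at position {self.pos}")
        return value

    def simple_selector(self):
        # simple_selector : element_name [ HASH | class | attrib | pseudo ]*
        #                 | [ HASH | class | attrib | pseudo ]+
        node = {"element": self._match(rf"{IDENT}|\*"), "parts": []}
        while True:
            part = (self._hash() or self._class()
                    or self._attrib() or self._pseudo())
            if part is None:
                break
            node["parts"].append(part)
        if node["element"] is None and not node["parts"]:
            raise SyntaxError(f"expected a selector at position {self.pos}")
        return node

    def _hash(self):
        # HASH is a single token: '#' immediately followed by a name.
        token = self._match(rf"#{NAME}")
        return ("id", token[1:]) if token else None

    def _class(self):
        # class : '.' IDENT
        if self._match(r"\.") is None:
            return None
        return ("class", self._expect(IDENT, "IDENT"))

    def _attrib(self):
        # attrib : '[' S* IDENT S* [ op S* [ IDENT | STRING ] S* ]? ']'
        if self._match(r"\[\s*") is None:
            return None
        name = self._expect(IDENT, "IDENT")
        self._match(r"\s*")
        op = self._match(r"=|~=|\|=")  # '=', INCLUDES or DASHMATCH
        value = None
        if op is not None:
            self._match(r"\s*")
            value = self._expect(rf"{IDENT}|{STRING}", "IDENT or STRING")
            self._match(r"\s*")
        self._expect(r"\]", "']'")
        return ("attrib", name, op, value)

    def _pseudo(self):
        # pseudo : ':' [ IDENT | FUNCTION S* [IDENT S*]? ')' ]
        if self._match(":") is None:
            return None
        name = self._expect(IDENT, "IDENT")
        if self._match(r"\(\s*") is None:
            return ("pseudo", name, None)
        arg = self._match(IDENT)
        self._match(r"\s*")
        self._expect(r"\)", "')'")
        return ("pseudo", name, arg)


def parse_simple_selector(text):
    parser = SelectorParser(text)
    node = parser.simple_selector()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected input at position {parser.pos}")
    return node
```

For example, `parse_simple_selector('a#top.external[rel~="nofollow"]:hover')` yields an `a` element with four parts: an id, a class, an attribute test and a pseudo-class. Combinators and full rulesets are left out on purpose to keep the sketch short.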
For an extensive description of the browser workflow, take a look at this article.