Is semantic markup too open-ended?

Question

I am taking a peek at Dive Into HTML5. It seems nice and interesting, but I am puzzled.

In the 1990s, when Netscape was the browser and HTML was HTML 2 or HTML 3, there were a lot of tags: address, cite, code... Most of them are unused today, probably even obsolete.

HTML5 introduces tags that express "semantic meaning" through the tag itself. This is all fun and games, but I see something very strange in this approach. Technically, semantics can be very open-ended. HTML5 has tags for article, time, navigation bars, footer. Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident <rant> and <nsfw> would be very important tags)? I thought XML was the strategy for assigning semantics to things. Nothing stops you from putting an XML chunk inside an XHTML div element and assigning a stylesheet to style it properly, or from delegating the handling of that namespace to the proper viewer (for example, when handling RSS or SVG).
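
The XML alternative described in the question can be sketched with Python's standard xml.etree.ElementTree: a chunk from a custom namespace sits inside an XHTML div, and a namespace-aware consumer picks it out without any new HTML tags. The http://example.org/ns/blog namespace and its author/name/surname elements are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
BLOG = "http://example.org/ns/blog"  # hypothetical vocabulary, not a real standard

doc = f"""<html xmlns="{XHTML}" xmlns:blog="{BLOG}">
  <body>
    <div>
      <blog:author>
        <blog:name>Jane</blog:name>
        <blog:surname>Doe</blog:surname>
      </blog:author>
      <p>An ordinary XHTML paragraph.</p>
    </div>
  </body>
</html>"""

root = ET.fromstring(doc)
ns = {"h": XHTML, "blog": BLOG}

# A consumer that understands the custom namespace can pick out its elements...
author = root.find(".//blog:author", ns)
full_name = f'{author.find("blog:name", ns).text} {author.find("blog:surname", ns).text}'
print(full_name)

# ...while the XHTML parts remain ordinary, separately addressable elements.
paragraphs = [p.text for p in root.iterfind(".//h:p", ns)]
print(paragraphs)
```

The catch, which the answers below circle around, is that only a consumer that already knows the blog namespace can do anything useful with it.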

In conclusion, I don't understand the reason behind these semantics-focused extensions, when it's clear that semantics is a very broad topic that is guaranteed to require a potentially infinite number of semantic tags. Since I am pretty sure there are clever people at the W3C, I think I'm wrong, but I'd like to know why.

address, cite and code are all fully valid HTML5 elements, although cite's meaning has been modified. — Alohci, Nov 01 '09 at 00:24
To loosely summarize the answers given: It depends on your definition of 'semantics'. — Boldewyn, Nov 18 '09 at 16:07

score 26 · Accepted Answer · edited Oct 16 '10 at 15:42

Why are tags for article, time, navigation bars, footer useful?

Because they facilitate parsing for text-processing tools like Google.

It has nothing to do with semantics (at least in the broad sense). Instead, they just say: here is the body of the page (the most important text), and there is the navigation bar full of links. With such an approach you can easily extract just what you need.
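
As a rough illustration of that kind of extraction, here is a minimal sketch using Python's standard html.parser module: it keeps the text inside <article> and skips anything inside <nav>. The class name ArticleExtractor and the sample page are hypothetical; a real search engine's heuristics are certainly far more involved.

```python
from html.parser import HTMLParser

class ArticleExtractor(HTMLParser):
    """Collects text inside <article>, ignoring anything inside <nav>."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside an <article>
        self.nav_depth = 0      # > 0 while inside a <nav>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "nav":
            self.nav_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag == "nav" and self.nav_depth:
            self.nav_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is in an article and not in a nav bar.
        if self.article_depth and not self.nav_depth and data.strip():
            self.chunks.append(data.strip())

page = """
<body>
  <nav><a href="/">Home</a> <a href="/about">About</a></nav>
  <article><h1>Title</h1><p>Main text.</p></article>
  <footer>Copyright</footer>
</body>
"""

parser = ArticleExtractor()
parser.feed(page)
print(parser.chunks)
```

With <div id="..."> markup, the same tool would have to guess which ids mean "main content"; the element names remove that guesswork.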

edited Oct 16 '10 at 15:42 by Peter Mortensen · answered Nov 01 '09 at 23:01 by Ilya Khaprov

score 8 · Answer 2 · answered Nov 18 '09 at 16:03

I too hate the way the W3C is going with their specs. There are many things that I don't like, and this "semantics" fad is one of them. (Others include taking forever to complete their specs and leaving too many important details for browsers to implement as they choose.)

Most of all I don't like it because it makes my work as a web developer more difficult. I often have to make a choice whether to make the webpage "semantically correct" or "visually/aesthetically pleasing". The latter wins of course, because that is what the users want, but as a result validations start failing and the whole thing gets quite non-semantic (tables for layout and other things).

Another issue I frown at is that they have officially declared that the "class" attribute is for semantics, but then they used it for visual presentation selectors in CSS.

Bottom line: DON'T MIX SEMANTICS AND VISUAL REPRESENTATION. If you use some mechanism for describing semantics (like tag names, attribute values, or whatever else), then don't use it for functional/visual purposes, and vice versa.

If I were designing HTML, I would simply add a "semantic" attribute which could (like the "class" attribute) be added to any tag. Then there would be a number of predefined values, like all those headers/footers/articles/quotes/etc.

Tags would define functionality. Basically you could reduce HTML tags to just a handful, like "div", "table/tr/td", "a", "img", "form", "input" and "select". I probably missed a few but this is the bulk. Visual styling would be accomplished through CSS.

This way the three areas - semantics, visual representation, and functionality - would be completely independent and wouldn't clash in real life solutions.
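
A minimal sketch of how a machine processor might read the attribute proposed here, written with Python's standard html.parser. Both the semantic attribute and the RECOMMENDED value list are hypothetical (neither exists in any HTML spec); the list is in the spirit of the "recommended values" idea discussed in the comments below.

```python
from html.parser import HTMLParser

# Hypothetical: neither the "semantic" attribute nor this value list exists in HTML.
RECOMMENDED = {"header", "footer", "article", "nav", "quote"}

class SemanticIndexer(HTMLParser):
    """Records (tag, semantic value, is_recommended) for every element
    carrying the hypothetical semantic attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("semantic")
        if value:
            # The tag name says what the element does; the attribute says what it means.
            self.found.append((tag, value, value in RECOMMENDED))

markup = (
    '<div semantic="header">Site title</div>'
    '<div semantic="article"><p>Post body</p></div>'
    '<span semantic="rant">Unrecognized, but still valid</span>'
)
indexer = SemanticIndexer()
indexer.feed(markup)
print(indexer.found)
```

Unknown values such as rant are recorded rather than rejected, so under this scheme new values could be added over time without invalidating existing documents.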

Of course, I don't think W3C is interested in practical solutions...

answered Nov 18 '09 at 16:03 by Vilx-

By adding a `semantic` attribute you're essentially validating the usefulness of the semantics effort, but putting it in a different place in the code. This is fine, but the notion that tags should correspond to *functionality* does dismiss the actual purpose of the HTML language, which is to describe *documents*, which are inherently *semantic*, not *functional*. I think this really gives rise to the need for a separate environment for *documents* from *applications* (the latter of which could contain instances of the former). (cont'd) – eyelidlessness Dec 17 '09 at 21:25
If you're going to go that route, and represent *applications* with HTML syntax but only a subset of HTML elements, I'd say that it's high time to introduce a whole slew of new widgets, particularly form type widgets. I think this is where the flaw in the idea becomes most evident. HTML is woefully lacking in functional elements, leaving them to be implemented primarily in a client-side script. All of that is fine because HTML today serves as a hybrid between documents and applications. Were this idea pursued, we'd end up with two wildly divergent HTML languages, and broken sites as a result. – eyelidlessness Dec 17 '09 at 21:28
Anyway, where your idea misses the mark completely is that it dismisses the value of *predictability* of HTML semantics. Unless your idea is coupled with a defined set of acceptable `semantic` attribute values, the semantics may as well be completely meaningless except to the markup author. And if it *is* coupled with such a set of values, it's pretty much an exact duplication of the HTML5 semantics efforts. In my opinion, the real problem with the HTML5 semantics efforts is that the defined set of semantics is still way too small to accommodate actual real-world use. – eyelidlessness Dec 17 '09 at 21:30
Regarding making choices between semantics and presentation, there is no such choice. All of the semantics, coupled with the CSS standard, are capable of all of the functionality of the semantically erroneous easy-way-out alternatives. Even table-based layouts (whose appeal is lost on me, but to each their own) can be accomplished without table semantics (and to an almost perfect extent, can be worked around to be backwards compatible all the way back to at least IE 6): http://bit.ly/MIrzS. There is no excuse not to use semantic HTML except laziness. – eyelidlessness Dec 17 '09 at 21:35
+1 to your first comment about a new environment for applications. HTML/CSS/JS was never meant for what it is (ab)used for today. But alas, we cannot change it, or at least it seems that this idea is completely lost on the W3C. All they think about is documents, while real-world users demand applications. And the web developers are the ones who are in the middle and have to suffer for this. – Vilx- Dec 18 '09 at 09:42
About the values of the semantic attribute - yes, I was thinking of a list of predefined values. But it doesn't have to be a fixed non-changing list, because, as you correctly note, that list will probably be too small anyway. Instead W3C could have a list of **recommended** values, which search engines and other machine processors would honor, and therefore encourage developers to use them. The list however could get new elements every now and then, and it wouldn't invalidate existing documents, make them semantically incorrect, or require a lengthy reworking of the whole standard. – Vilx- Dec 18 '09 at 09:46
And I'd like to disagree about table-based layouts. Unfortunately there ARE things that require a table layout. Most of them have to do with automatic resizing. When the sizes are fixed, there is indeed no problem representing everything with DIVs and SPANs, and it is even easier. If you want an example of such a problem, ask me. – Vilx- Dec 18 '09 at 09:48
Regarding the predefined-but-open-ended list of semantics, I think this rather calls out more for an evolution of the HTML language (and all of its interpreters) to allow for custom tag semantics (without the namespace mess of XHTML). I don't think moving the semantics out of the tag name into an attribute has any real benefit, but I think there is truly a need for an evolving set of semantics, and some kind of a way to determine de-facto standards as they evolve. – eyelidlessness Dec 18 '09 at 18:23
Regarding table layouts, I'm curious if you looked at the link I posted. Table layouts without table semantics can be done, today. That said, I have little problem with flexible-size grid-type layouts without table layout. I'd like to see such an example, and offer to try to reproduce without table layout, but I will be honest and say I won't have the time available to devote to it for some weeks. Good discussion though! – eyelidlessness Dec 18 '09 at 18:27
HTML5 is from WHATWG, independent from the W3C. – Nicolás Dec 23 '09 at 17:53
WHATever. Doesn't change what I said. – Vilx- Dec 28 '09 at 10:10
Oh, I just remembered one tiny simple thing which is available only with tables - vertical centering of contents. AFAIK there is no other way. – Vilx- Dec 28 '09 at 10:19
You definitely make some good points. I think that, theoretically, you should be able to style your semantic elements, so you can give a semantic class of headerLinks and it will always look the same. The problem is that any decently complex layout in CSS/HTML makes this impossible, since you often need to wrap elements in other elements just to get the right layout, and suddenly the semantics are broken. – Matthew Manela Sep 27 '10 at 02:40

score 4 · Answer 3 · answered Oct 30 '09 at 15:25

There is already a lot of semantics in HTML markup in the form of classes and IDs, which offer a (near) infinite number of possibilities, and everyone has their own way of handling these semantics. One of the goals of HTML5 is to bring some structure to this. You will still be able to extend the semantics of tags with classes and IDs. It will also most likely make things easier for search engines.

answered Oct 30 '09 at 15:25 by GSto

That's the thing: you should not assign or infer semantics from classes or ids (at least according to the Dive Into HTML5 section about semantics). – Stefano Borini Oct 30 '09 at 15:29
Why not? Where else are you going to have any kind of semantic markup (in current HTML)? – GSto Oct 30 '09 at 15:40
Well, I think it's because the id is, well, an identifier, and the class is an attribute for visual presentation. – Stefano Borini Oct 30 '09 at 15:52
Semantics in the form of machine-readable markup can be added through classes and ids; see how it's done at http://en.wikipedia.org/wiki/HCard. This is markup that a search engine, browser, or browser plugin can consume. – artlung Oct 30 '09 at 19:08
Technically, there's no such thing as near infinite. – hasen Nov 18 '09 at 16:07
>Technically, there's no such thing as near infinite Where's the semantic tag when we need it? – Code Silverback Dec 16 '09 at 14:58
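
The hCard approach linked in the comments above relies only on agreed class names (vcard, fn, org, tel), so any consumer that knows the convention can read it. Below is a minimal Python sketch using the standard html.parser module; it handles only a few hCard properties and ignores nesting subtleties, so treat it as an illustration rather than a conforming microformats parser. The sample card data is made up.

```python
from html.parser import HTMLParser

class HCardReader(HTMLParser):
    """Minimal hCard sketch: collects fn/org/tel text inside class="vcard"."""

    FIELDS = ("fn", "org", "tel")
    VOID = {"br", "img", "hr", "input", "meta", "link"}  # no end tag, so not tracked

    def __init__(self):
        super().__init__()
        self.open = []   # class-name sets for the currently open elements
        self.cards = []  # one dict per vcard found

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.open.append(classes)
        if "vcard" in classes:
            self.cards.append({})

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.open:
            self.open.pop()

    def handle_data(self, data):
        if not any("vcard" in classes for classes in self.open):
            return
        # Attribute the text to the nearest enclosing element that names a field.
        for classes in reversed(self.open):
            for field in self.FIELDS:
                if field in classes:
                    card = self.cards[-1]
                    card[field] = card.get(field, "") + data.strip()
                    return

page = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>
  <span class="tel">+1-555-0100</span>
</div>
"""
reader = HCardReader()
reader.feed(page)
print(reader.cards)
```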

score 3 · Answer 4 · answered Nov 18 '09 at 15:46

Look at it from the angle of trying to make statements either about the page or about objects referenced from the page. If you see a <footer> tag, all you can say is "stuff in here is a footer" and move on. As such, adding custom tags is not as generic a solution as adding attributes and letting people use URIs of their own choosing to specify predicates and, optionally, values. RDFa wins hands down because you can express any RDF triple you like in a page, one way or another.

I agree 100% with your belief in RDFa! It is the semantic web we were promised: RDFa + solid vocabulary = improved searching experience — J. M. Becker, Sep 21 '12 at 01:45
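
To make the triple idea concrete, here is a deliberately tiny subset of RDFa in Python (standard html.parser only): about sets the subject, property gives the predicate, and the object is the content attribute if present, otherwise the element's text. This is a sketch under strong assumptions, not a conforming RDFa processor; it ignores prefixes, vocab, typeof, resource and everything else the spec defines. The Dublin Core term URIs are real; the example.org URIs and values are placeholders.

```python
from html.parser import HTMLParser

class TinyRDFa(HTMLParser):
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, base):
        super().__init__()
        self.base = base   # default subject: the page itself
        self.stack = []    # per open element: (subject, [subject, predicate, text] or None)
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about") or (self.stack[-1][0] if self.stack else self.base)
        literal = None
        if a.get("property"):
            if a.get("content") is not None:
                self.triples.append((subject, a["property"], a["content"]))
            elif tag not in self.VOID:
                literal = [subject, a["property"], ""]  # filled from the element's text
        if tag not in self.VOID:  # void elements never get an end tag
            self.stack.append((subject, literal))

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.stack:
            return
        _, literal = self.stack.pop()
        if literal is not None:
            self.triples.append((literal[0], literal[1], literal[2].strip()))

    def handle_data(self, data):
        for _, literal in self.stack:
            if literal is not None:
                literal[2] += data

DC = "http://purl.org/dc/terms/"
page = f"""
<div about="http://example.org/post/1">
  <h1 property="{DC}title">Example title</h1>
  <span property="{DC}creator" content="Jane Doe">J. D.</span>
</div>
"""
parser = TinyRDFa(base="http://example.org/")
parser.feed(page)
for triple in parser.triples:
    print(triple)
```

Unlike a fixed <footer> element, the predicates here are arbitrary URIs, which is the generality this answer is arguing for.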

score 2 · Answer 5 · answered Oct 30 '09 at 18:59

I just want to address one part of your question. You say:

In the nineties, at the time when Netscape was the browser and html was HTML2 or HTML3, there were a lot of tags: address, cite, code... Most of them are unused as of today, probably even obsolete.

There are a great many tags to choose from in HTML, but lack of usage does not imply that they are obsolete. In particular, the heading tags (<h1> etc.) and <ul> and <ol>, which group items into lists, are used in a way I consider semantic. Many people may not use tags semantically, but the effort to create microformats is an ongoing continuation of the idea you consider an artifact of the 1990s. Efforts to make the semantic web a winner keep going, despite full-text search and link analysis (in the form of Google) being the winner as far as how to find and understand the web.

It would be great to see an updated version of Google's Web Stats which show "html as she is spoke." But you are right that many tags are underused.

Whether html5 will be successful is an open and interesting question, but the tags you describe as obsolete didn't go anywhere, they were there in HTML 4.01 and xhtml. HTML5 seems to be an effort to solidify what is useful in tags. In the end if html5 gets support in browsers and makes the job of web developers easier, it will succeed. xhtml2 failed because it roundly failed to gain adoption in browsers and did nothing to make the job of web page makers easier. The forces working on html5 seem keenly aware of the failure of xhtml2, and I think are avoiding having html5 suffer a similar fate.

xhtml2 failure had nothing to do with browser support. It failed because it was neither forward nor backward compatible and the W3C did not want to have a radically different competing format to the more popular HTML 5 regardless of any weaknesses or strengths inherent to HTML5. — , Nov 01 '09 at 13:09
Bah, xhtml2 was finally -cancelled- because of html5, but it had failed long before that. Failed to be adopted, failed to be effective. — Kzqai, Jul 29 '12 at 16:59

score 2 · Answer 6 · answered Sep 27 '10 at 02:21

"Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident <rant> and <nsfw> would be very important tags)?"

You use <dialog> to describe conversations or comments. Rant and NSFW are subjective terms, so it makes sense not to have tags for them.

From what I understand, a group of experienced web developers did research and looked for what most websites have in common in their HTML. They noticed that most websites have id="header", id="footer", id="section" and id="nav", so they decided that we need HTML tags to replace those ids. In other words, don't expect them to give you a HUGE HTML vocabulary. The idea is to keep it as simple as possible while addressing the MOST commonly needed tags.

The NAV tag is VERY important for accessibility as well. You want assistive technologies (and their users) to know where the navigation is, rather than forcing them to work out whether links are for navigation or not.
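
The kind of survey this answer describes (count which id and class values pages actually use, then consider promoting the most common ones to elements) can be sketched in a few lines of Python with html.parser and collections.Counter. The three sample pages are invented for illustration; a real survey would run over a large crawl, not a handful of strings.

```python
from collections import Counter
from html.parser import HTMLParser

class IdClassCounter(HTMLParser):
    """Adds every id value and every class name it sees to a shared Counter."""

    def __init__(self, counter):
        super().__init__()
        self.counter = counter

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.counter[value.lower()] += 1
            elif name == "class" and value:
                self.counter.update(v.lower() for v in value.split())

pages = [  # invented sample pages
    '<div id="header">...</div><div id="nav">...</div><div id="footer">...</div>',
    '<div id="header">...</div><div class="nav menu">...</div><div id="content">...</div>',
    '<div id="Footer">...</div><div id="sidebar">...</div>',
]

counts = Counter()
for page in pages:
    IdClassCounter(counts).feed(page)

print(counts.most_common(3))
```

Lower-casing merges Footer and footer; a real survey would also have to decide how to group variants such as ft or site-footer.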

Slight correction: it was Ian Hickson who used Google’s index of the web to see what the common patterns in markup were. — Paul D. Waite, Apr 25 '11 at 10:31

score 1 · Answer 7 · answered Nov 02 '09 at 04:23

I disagree with adding extra tags. If a detailed vocabulary were actually important, there could be a different tag name for every word in the dictionary. Additional tag names are not helpful: they may communicate additional meaning to humans, but they do nothing to facilitate machine parsing of the language. This is why I don't like the "semantic" tags in HTML5: I believe they are a slippery slope toward a vocabulary that is too complex, while providing only a weak solution to a problem that is not fully addressed.

In my opinion, markup languages structure data, in the form of a tree, as much as they describe it. Through parsing of the structure and proper use of semantic conventions such as RDFa, context can be leveraged to give specific meaning to otherwise generic tag names. In such a case an excessive vocabulary need not exist, and structurally redundant tag names such as footer and aside could be eliminated. The final objective is to make content faster and more accurate to interpret by both humans and machines simultaneously, while using as little code as possible to achieve that result. How that solution is achieved is less important, except to HTML5.

score 1 · Answer 8 · edited Aug 05 '12 at 02:25

I thought XML was the strategy to assign semantics to stuff.

As far as I know, no, it wasn’t. XML allows new languages to be defined which are all parsed in the same way, because they all use the XML syntax.

It doesn’t, of itself, provide any way to add meaning (“semantic” just means “meaningful”) to those languages. And until computers get artificial intelligence, they don’t actually understand meaning, so meaning is just what is agreed between human beings. HTML is the most commonly-used language with agreed meaning of its tags.

As HTML is so common, it’s helpful to add a few meaningful tags to it that are quite general in their application. The new HTML5 tags are aimed at that. The HTML5 spec’s authors could indeed carry on down this route, creating tags for every specific bit of meaning possible, but as they’re not robots, they probably won’t.

<section> is useful, and general enough to be meaningfully applicable in lots of documents. <author-last-name> isn’t. Distinguishing between the two is a judgment call, which is why humans, and not computers, write the spec.

For custom semantics that are too specific to be added to HTML as tags, HTML5 defines microdata.
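
For comparison, HTML5 microdata expresses such custom semantics with the itemscope, itemtype and itemprop attributes. Below is a rough Python sketch (standard html.parser only) that reads text-valued properties. It deliberately ignores nested items, itemref, and the special value rules for elements such as a, img and meta, so it is far from a conforming microdata parser. The schema.org Person type is just an example vocabulary and the data is made up.

```python
from html.parser import HTMLParser

class MicrodataSketch(HTMLParser):
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.items = []  # one dict per itemscope element
        self.stack = []  # per open element: [item or None, itemprop or None, text]

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        a = dict(attrs)
        item = self.stack[-1][0] if self.stack else None
        if "itemscope" in a:  # boolean attribute: present means true
            item = {"@type": a.get("itemtype")}
            self.items.append(item)  # nested items become separate entries in this sketch
            self.stack.append([item, None, ""])
        else:
            self.stack.append([item, a.get("itemprop"), ""])

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.stack:
            return
        item, prop, text = self.stack.pop()
        if item is not None and prop:
            item[prop] = text.strip()  # property value = element's text

    def handle_data(self, data):
        for entry in self.stack:
            if entry[1]:
                entry[2] += data

page = """
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Editor</span>
</div>
"""
reader = MicrodataSketch()
reader.feed(page)
print(reader.items)
```

The element names stay generic (div, span); the meaning lives in an agreed, external vocabulary, which is how microdata avoids adding a tag per concept.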

score 0 · Answer 9 · answered Sep 14 '12 at 16:57

In a word, AJAX. The new tags are meant to support what real-world developers are doing by replacing some of the <div class="sidebar-wrap"><div class="styling-hook"><div><ul class="nav"> type of divitis many websites suffer from. The only <div> left in HTML5 is the styling hook.

The semantics that get promoted to tags from classes are those that developers have freely adopted en masse as best practices over an extended XHTML/CSS adoption period. Check out the sections page of the WHATWG developer's edition of the spec. The document itself is a pleasure, but I won't spoil it if you haven't seen it yet.

One of the less obvious reasons for some decisions made by the W3C is the importance of WebKit. If you look, you can see that WebKit was better than some at taking the current work of the HTML5 Working Group and implementing its ideas, and it has historically been far ahead in compliance. The W3C placed a high priority on WebKit-based software (i.e. Android, iPhone, the Googlebot, Chrome, Safari, Dreamweaver, etc.). Google, framework users, WordPress/Movable Type/Joomla! users and others wanted self-contained building blocks, so this is the style we get.

Facebook is modular. Responsive design's grids are modular. WordPress is modular. Ajax works best with modular page structures. Widgets are modules. Plug-ins are modules. It would seem that we should be trying to figure out stuff like how to apply these tags to make it easier to hook the appropriate elements and activate them in our document/application/info-network hybrid Web 2.0.

In closing, HTML5 can also be written as XML (the XHTML syntax; again, see the spec), which ensures that tools and machines making Ajax requests for a portion of a document get a well-formed, useful response. How awesome that is in combination with things like media queries for devices such as feed readers, braille printers and annotators! I see a (near) future where anything with good semantic content is its own newsfeed, automagically. This only happens if developers adopt and write compliant documents.
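
A minimal sketch of why the XML syntax helps here, using Python's standard xml.etree.ElementTree: a page written in HTML5's XML serialization parses as XML, so a tool can pull out one self-contained section, whereas HTML-syntax markup with implied end tags is rejected by an XML parser. The page content and the is_well_formed helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}

def is_well_formed(text):
    """True if text parses as XML; HTML-syntax tag soup usually will not."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A document in the XML serialization (XHTML syntax) of HTML5.
doc = f"""<html xmlns="{XHTML}">
  <body>
    <article id="post-1"><h1>Title</h1><p>Body text.</p></article>
    <aside><p>Related links</p></aside>
  </body>
</html>"""

root = ET.fromstring(doc)
# Any XML tool can address one self-contained module of the page.
article = root.find(".//h:article", NS)
print(article.get("id"), article.find("h:h1", NS).text)

# Markup that an HTML parser tolerates but an XML parser rejects.
print(is_well_formed(doc), is_well_formed("<p>Unclosed<p>paragraphs"))
```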

score 0 · Answer 10 · edited Apr 25 '11 at 10:42

I've been reading Andy Clarke's book Transcending CSS (page 33).

...,it is now widely accepted that presentational names such as header, left, or red that describe an element's look or position are poor choices.

After reading these lines I asked myself: hey, aren't there elements such as header and footer in the HTML5 spec? Why is footer more semantic? In his book, Andy advocates using site-info as the ID of the footer div, and this makes more sense IMHO. Footer is a presentational name (describes the element's position).

edited Apr 25 '11 at 10:42 by Paul D. Waite · answered Jan 14 '11 at 18:05 by Attila Szabo

“Footer is a presentational name (describes the element's position).” — According to the HTML5 spec, it describes an element that [“typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.”](http://dev.w3.org/html5/spec-author-view/sections.html#the-footer-element) It also says that “Footers don't necessarily have to appear at the end of a section, though they usually do.” – Paul D. Waite Apr 25 '11 at 10:44
