HTML was never intended to convey media, and therefore never intended
for any kind of marketing or merchandising. HTML was only intended to
convey text and to provide some sort of descriptive structure upon the
text it was describing. The people originally using HTML were professors
and scientists who needed the ability to describe their communications
in more depth than line breaks and quotes would allow. In other words,
HTML was only intended to be a document storage mechanism. Keep in mind
there were no web browsers at this time.
HTML was first made popular with the release of web browsers.
Initially web browsers were just text parsers that provided a handy GUI
for navigating the hyperlinking between documents, but this changed
almost immediately. At this time there was still no actual standard for
HTML. There was the list of tags and a description of mechanisms
initially created for HTML, which, along with an understanding of SGML,
was all that was required to create an HTML parser.
With web browsers came the immediate demand to extend HTML in ways it
was never intended to be used. It was at this point that the inventors
and original
users completely lost control of the web. Tags were added, such as
center and font, and tables became the primary mechanism for laying
things out on a page instead of describing data. Web browsers supplied
a media demand completely orthogonal to the intentions of HTML.
Marketing people, being what they are, care very much for the appearance
and expressive nature of communications and don't give a crap for the
technology that makes such communication possible. As a result, parsers
became more lax to accommodate the incompetent. You have to understand
that HTML was already lax because there were no standard parsing rules
and SGML, due to being so very obtuse, encourages a lax nature outside
of parsing instruction tags.
It's not that these early technology pioneers were stupid, although
it's easy to argue the contrary; they simply had other priorities. When the
web went mainstream there was an immediate obsession to conquer specific
business niches in this new medium. All costs were driven towards
marketing, market share, traffic acquisition, and brand awareness. Many
web businesses operate today with similar agendas, but today's web is
not a fair comparison. In the 90s, marketing was all that mattered and
technology costs were absolutely ignored. The problem was so widespread
and the surge of investment so grand that it completely defied all
rational rules of economics. This is why there was an implosion. The
only web businesses that survived this crash were those that confronted
their technology costs up front or those that channeled investment money
into technology expenses as opposed to additional marketing expenses.
http://en.wikipedia.org/wiki/Dot-com_bubble
After the crash things changed. Consider the crash good timing, because
although it was entirely driven by bad business decisions, foolish
investments, and irrational economics, there were positive technology
developments going on behind the scenes. The founders of the web were
completely aware that they had lost all control of their technology.
They sought to solve this problem and set things straight by creating
the World Wide Web Consortium (W3C). They invited experts and software
companies to participate. Although solving many of the technology
problems introduced to the web by marketing-driven motivations was a
lost cause, many future problems could be avoided if the language were
implemented in accordance with an agreed-upon standard. It was during
this time that HTML 2 (the first standard form of HTML), HTML 3, and
HTML 4 were written.
At the same time the W3C also began work on XML, which was never
intended to be an HTML replacement. XML was created because SGML was too
complex. A
simple syntax based upon similar rules was needed. XML was immediately
written off by marketing people and was immediately praised by data
evangelists at Microsoft and IBM. Because the holy wars around XML were
trivial, insignificant, and short lived compared to the problems
plaguing HTML, XML's development occurred at rocket speed. Almost
immediately after XML was finished, the first version of XML Schema
followed.
XML Schema was an extraordinary work that most people either choose to
ignore or take for granted. An abstraction model for accessing the
structure of HTML was also standardized, known as
the Document Object Model (DOM). It is important to note that the DOM
was initially developed by browser vendors to provide an API for
JavaScript to access HTML, but the standard DOM released by the W3C had
nothing to do with JavaScript directly. It quickly became obvious that
many of the technology problems plaguing HTML could be solved by creating
an XML-compliant form of HTML. This is called XHTML. Unfortunately, the
path of adoption from HTML to XHTML was introduced in a confused manner
that is still not widely understood years after clarification finally
occurred.
So, there was a crash, and leading up to this period of economic collapse
there were some fantastic technology developments. The ultimate source
of technology corruption, the web browsers, were finally just starting
to innovate around adoption of the many fantastic technology solutions
dreamed up at the W3C, but with the crash came an almost complete loss
of development motivation from the browser vendors. At this time there
were really only Netscape, IE, and Opera. Opera was not free software,
so it was never widely adopted, and Netscape went under. This
essentially left only IE, and Microsoft pulled all their developers off
IE. Years later development on IE would be revived when competition
arose from Firefox and when Opera adopted free licensing.
About the same time that browsers were coming back to life the W3C was
moving forward with development of XHTML2. XHTML2 was an ambitious
project and was not related to XHTML1, which created much confusion.
The W3C was attempting to solve technology problems associated with HTML
that had been allowed to fester for too long and their intentions were valid
and solid. Unfortunately, there was some contention in the XHTML2
working group. Failed communication on how and why to transition from
HTML to XHTML, combined with the unrelated nature of XHTML2 and its
infighting, made people worry.
The marketing interference that allowed the web to crash regressed with
the web crash, but it did not die. It was reviving during this period
as well. Let's not forget that marketing motivations don't give a dick about
technology concerns. Marketing motivations are about instant
gratification. All flavors of XHTML, especially XHTML2, were an
abomination to instant gratification. XHTML2 would eventually be killed
before it was ever finalized. This fear and disgust led to the
establishment of a separate standards body whose interests were aligned
with moving HTML forward in the nature of instant gratification
silliness. This new group would call itself WHATWG and would carry the
marketing torch forward.
The WHATWG was united because its motivations were simple, even if its
visions of the technology were ambitious: essentially, to make it easier
for developers to make things pretty and interactive, and to reduce
complexity around media integration. The WHATWG was also successful
because the web had been contracting since the crash. There were fewer
major players around and each had a specific set of priorities that were
more and more in alignment.
The web is a media channel and its primary business is advertising.
Web businesses that make money from advertising tend to be significantly
larger than web businesses that make money from goods or services. As a
result, the priorities of the web would eventually become the priorities
of media and advertising distribution. For instance, why did JavaScript
become much faster in the browser? The answer is that Google, an
advertising company, made it a priority to release a web browser that
was significantly faster at processing JavaScript. To compete, other
browsers would need to become 20 to 30 times faster. This is
important because JavaScript is the primary means by which advertisement
metrics are measured, which is the basis of Google's revenue.
Since HTML5 is a marketing-friendly specification, it allows a lax
syntax. Browser vendors are economically justified in spending more
money writing more complex parsing mechanisms against sloppy markup,
because it allows more rapid development by which media is published so
as to allow deeper penetration of advertising. This is economically
justified because all five of the major web browsers available now are
primarily funded from advertising revenue. Unfortunately, this is
nothing but cost for anybody else who wants to write a parser and is
limiting or harmful to any later interpretation of structured data. The
result is
a lack of regard for the technology and the rise of hidden costs with
limits upon technology innovation within the given medium.
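
To make the cost of lax syntax concrete, here is a rough sketch in Python that feeds the same sloppy markup to a lenient tokenizer and to a strict XML parser. The sample markup and the names SLOPPY, TagCollector, lenient_events, and strict_parse_ok are made up for illustration. Note also that Python's html.parser only tokenizes; it does not implement the full HTML5 error-recovery rules a browser needs, which is exactly where the extra cost for a parser writer lives.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sloppy markup: unclosed <p>, and <b> closed by the wrong tag.
SLOPPY = "<p>First<p>Second <b>bold</p>"


class TagCollector(HTMLParser):
    """Records start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def lenient_events(markup):
    """Tokenize markup leniently; malformed nesting is reported, not rejected."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(lenient_events(SLOPPY))
    print(strict_parse_ok(SLOPPY))
```

The lenient side accepts the input and leaves the consumer to guess the intended structure from an unbalanced stream of tags. The strict side refuses it outright, which is what makes the result reliable as structured data.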
This is why HTML syntax continues to be shit. The only solution is to
propose an alternative and technologically superior communication medium
that emphasizes a decentralization of contracting market
concerns.