
I would like to create some kind of API that lets people embed hidden information in a website so that a bot can read it.

I know this is possible with meta tags, but I am considering using a custom tag instead, because then I can use the DOM, which is more comfortable to work with, and the markup is easier for humans to read.

Example:

<html>
...
<body>
...
<mytag id="123" foo="bar" bar="foo"></mytag>
...
<mytag id="345" foo="bar" bar="foo"></mytag>
...
</body>
</html>

My question is whether it is possible to make this custom tag conform to the standards, perhaps by creating some kind of DTD.

I would like to support HTML 4.01, XHTML and HTML 5, if possible.

Daniel Marschall
  • I will look at this question; however, my question goes in a different direction, as I am looking for a way to support HTML 4 and XHTML too. So an HTML5-only solution is not what I am looking for. – Daniel Marschall Jul 07 '18 at 11:39
  • There are many different answers, depending on who you ask. _Technically_, the answer is no, the tag name `mytag` breaks the standards. – Mr Lister Jul 07 '18 at 11:41
  • What do you want to achieve exactly that can't be done with a div with a specific class, data blocks in JavaScript sections in the file, or by using a non-HTML xml file? – Mr Lister Jul 07 '18 at 11:46
  • Thank you for the ideas. I would like people to be able to output some visual information along with invisible machine-readable information, so a pure XML file is not a solution. JavaScript data blocks are a bit tricky because I would like to read the machine-readable part with PHP, so I would need a JS parser. A span or div with data attributes might be a solution, although it is not 100% valid HTML 4. (Can I add some kind of DTD at the head of the HTML 4 document to declare that the data attributes are valid?) – Daniel Marschall Jul 07 '18 at 12:18

1 Answer


Having to support both HTML 4.01 and HTML5 makes this hard. You can’t use meta elements with custom name values (fine in HTML 4.01, but the names have to be registered for HTML5), you can’t use custom data-* attributes (not allowed in HTML 4.01), you can’t use Microdata (only defined for HTML5+), and you can’t use custom elements (only defined for HTML5+).

I can think of two ways.

script element as data block

In HTML5, the script element can also be used for data blocks by giving it a type that is not a JavaScript MIME type. Examples: text/html, text/plain.

The HTML 4.01 spec doesn’t define it like that, but it should still be possible/valid (it’ll understand it as "script", but user agents are not expected to try to run it if they don’t recognize the content type as possible for scripts).

Drawback: The content is not parsed into the document’s DOM; it is only available as the text of the script element.
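To illustrate how a bot could read such a data block on the server side, here is a minimal sketch. It assumes the block holds JSON and is marked with type="application/json"; the names DataBlockParser, read_data_blocks and SAMPLE are made up for this example, and Python's standard library stands in for whatever language the reader actually uses.

```python
import json
from html.parser import HTMLParser


class DataBlockParser(HTMLParser):
    """Collects the raw text of every <script> with the wanted type."""

    def __init__(self, wanted_type="application/json"):
        super().__init__()
        self.wanted_type = wanted_type
        self._in_block = False
        self._chunks = []
        self.blocks = []  # raw text of each matching data block

    def handle_starttag(self, tag, attrs):
        # A valueless attribute yields None, so fall back to "".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == self.wanted_type:
            self._in_block = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_block:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_block:
            self.blocks.append("".join(self._chunks))
            self._in_block = False


def read_data_blocks(html, wanted_type="application/json"):
    """Return the decoded JSON value of every matching data block."""
    parser = DataBlockParser(wanted_type)
    parser.feed(html)
    parser.close()
    return [json.loads(text) for text in parser.blocks]


# Hypothetical page: visible content plus one hidden JSON data block.
SAMPLE = """
<html><body>
<p>Visible text</p>
<script type="application/json">
{"items": [{"id": "123", "foo": "bar"}, {"id": "345", "foo": "bar"}]}
</script>
</body></html>
"""

blocks = read_data_blocks(SAMPLE)
```

The script element carries only a type attribute here, because HTML 4.01 does not allow id or class on script; if several kinds of data blocks share a page, a key inside the JSON itself could tell them apart.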

RDFa

It’s allowed in HTML 4.01 and HTML5 (you might have to adapt the DOCTYPE for the older HTML versions, e.g., for XHTML).

You can’t use custom elements, but you can add property and content attributes (for name-value pairs), and you could use typeof for "items" (e.g., what you would use the element name for), and you can make use of meta and link elements (visually hidden by default) in the body.

<div vocab="https://api.example.com/voc#" class="the-hidden-information">

  <div typeof="Item-123">
    <meta property="foo1" content="bar1" />
    <meta property="foo2" content="bar2" />
  </div>

  <div typeof="Item-345">
    <meta property="foo1" content="bar1" />
    <link property="foo5" href="/some-url" />
  </div>

</div>

(when using RDFa 1.0 instead of 1.1, you’d have to use xmlns instead of vocab)
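If a full RDFa processor feels too heavy, the markup above is regular enough to be read with a plain HTML parser. The following is only a sketch under that assumption: it is not a conforming RDFa processor (it ignores vocab, prefixes and nested items), it groups property values by the nearest enclosing div with a typeof attribute, and the names SimpleItemParser and SAMPLE are invented for this example.

```python
from html.parser import HTMLParser


class SimpleItemParser(HTMLParser):
    """Groups property/content and property/href pairs by enclosing typeof."""

    def __init__(self):
        super().__init__()
        self.items = {}   # typeof value -> {property: value}
        self._open = []   # one entry per open <div>: its typeof or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            typeof = attrs.get("typeof")
            self._open.append(typeof)
            if typeof:
                self.items.setdefault(typeof, {})
        elif tag in ("meta", "link") and attrs.get("property"):
            # Nearest enclosing div that declared a typeof, if any.
            current = next((t for t in reversed(self._open) if t), None)
            if current is not None:
                key = "content" if tag == "meta" else "href"
                self.items[current][attrs["property"]] = attrs.get(key)

    def handle_endtag(self, tag):
        # meta/link are void elements, so only div affects nesting.
        if tag == "div" and self._open:
            self._open.pop()


# The markup from the answer, used as test input.
SAMPLE = """
<div vocab="https://api.example.com/voc#" class="the-hidden-information">
  <div typeof="Item-123">
    <meta property="foo1" content="bar1" />
    <meta property="foo2" content="bar2" />
  </div>
  <div typeof="Item-345">
    <meta property="foo1" content="bar1" />
    <link property="foo5" href="/some-url" />
  </div>
</div>
"""

parser = SimpleItemParser()
parser.feed(SAMPLE)
parser.close()
items = parser.items
```

For this input, items maps "Item-123" and "Item-345" to their property dictionaries. Anything beyond this flat layout (prefixes, resource attributes, nested typeof) would call for a real RDFa library.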

unor
  • Thank you very much for your answer and the two suggestions! RDFa sounds very interesting, but it also looks a bit complex (and usually requires a framework for reading it). I think I will go with a JSON data block. This way, I can easily read and write a hierarchical machine-readable data structure into the file (I'll locate the ` – Daniel Marschall Jul 07 '18 at 17:49