DTDs provide a mechanism for referencing external entities of arbitrary formats, which lets SGML and XML documents link to any file that has a URI without having to invent a custom mechanism for it. For example, one could specify in a DTD:
```xml
<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
<!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
<!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
<!ENTITY myimg2 SYSTEM "img2.gif" NDATA gif>
<!ENTITY myimg3 SYSTEM "img3.gif" NDATA gif>
```
When creating an `img` element, one could then use a value like `myimg1` for its `src` attribute, and the application working with the document would be informed that the file `img1.gif` is referenced and that it is in the `gif` notation.
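To make the "application is informed" part concrete, here is a minimal sketch using Python's standard `xml.parsers.expat` bindings, which report notation and entity declarations through callbacks. It assumes the DTD above is moved into an internal subset and wrapped in a hypothetical `doc` root element so the example is self-contained. Expat is non-validating, so it reports the declarations without checking that every reference resolves.

```python
import xml.parsers.expat

# The DTD from above, moved into an internal subset; the <doc> root element
# is a hypothetical wrapper added only so the document parses.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (img*)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img src ENTITY #REQUIRED>
  <!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
  <!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
  <!ENTITY myimg2 SYSTEM "img2.gif" NDATA gif>
  <!ENTITY myimg3 SYSTEM "img3.gif" NDATA gif>
]>
<doc>
  <img src="myimg1"/>
</doc>
"""


def collect_declarations(text):
    """Return the notations and unparsed entities declared in the DTD."""
    notations = {}  # notation name -> (public ID, system ID)
    entities = {}   # entity name -> (system ID, notation name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = (public_id, system_id)

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation is not None:
            entities[name] = (system_id, notation)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(text, True)
    return notations, entities


notations, entities = collect_declarations(DOCUMENT)
for name, (system_id, notation) in entities.items():
    public_id, _ = notations[notation]
    print(f"{name}: {system_id} ({notation}, {public_id})")
```

Note that the notation's system ID (`image/gif` here) is just an opaque string to the parser; interpreting it is left entirely to the application.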
The way I understand it, there are three advantages to this:
- Standardization. Regardless of any actual schema in use, an application could be made to find out everything the document links to, even though it may not understand it. This might be useful for security, searching, filtering etc.
- Avoiding repetition. The entity URI is defined only once, but it can be referred to many times.
- Specifying the format (notation) alongside the entity. In case the system doesn't provide or know the format, or there are multiple formats or displaying methods to choose from (show or download for example), there is no need to clutter the document with this information.
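The standardization point can be sketched the same way. Assuming only a DTD-aware parser (expat again) and no built-in knowledge of the vocabulary, a tool can record which attributes are declared as `ENTITY` or `ENTITIES`, record every unparsed entity, and then resolve attribute values as elements occur. The `gallery`/`photo` vocabulary below is hypothetical, and `find_links` never refers to it by name.

```python
import xml.parsers.expat

# A hypothetical vocabulary that the link finder knows nothing about.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (photo*)>
  <!ELEMENT photo EMPTY>
  <!ATTLIST photo file ENTITY #REQUIRED caption CDATA #IMPLIED>
  <!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
  <!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
  <!ENTITY myimg2 SYSTEM "img2.gif" NDATA gif>
]>
<gallery>
  <photo file="myimg1" caption="first"/>
  <photo file="myimg2"/>
  <photo file="myimg1"/>
</gallery>
"""


def find_links(text):
    """List (element, attribute, system ID, notation) for every unparsed
    entity referenced from an ENTITY or ENTITIES attribute."""
    entity_attrs = set()  # (element, attribute) pairs typed ENTITY/ENTITIES
    unparsed = {}         # entity name -> (system ID, notation name)
    links = []

    def on_attlist(elname, attname, att_type, default, required):
        if att_type in ("ENTITY", "ENTITIES"):
            entity_attrs.add((elname, attname))

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        if notation is not None:  # unparsed (NDATA) entities only
            unparsed[name] = (system_id, notation)

    def on_start(name, attrs):
        for attname, value in attrs.items():
            if (name, attname) in entity_attrs:
                # ENTITIES values are whitespace-separated lists of names.
                for ref in value.split():
                    # expat does not validate, so tolerate undeclared names.
                    system_id, notation = unparsed.get(ref, (None, None))
                    links.append((name, attname, system_id, notation))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return links


for link in find_links(DOCUMENT):
    print(link)
```

The result has one tuple per reference, so `myimg1` appears twice, while `caption` is skipped because it is declared as `CDATA`. Because DTD declarations precede the root element, both lookup tables are complete by the time the first start tag is reported.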
Yet so far I haven't been able to find any dataset or application that predominantly uses this mechanism. In practice, all of these points are defeated:
- The vast majority of resources are still linked to in a schema-specific way, as in XHTML. XLink provides a standardized, XML-native way to link to resources, and XML Schema defines `anyURI`, so links can still be found automatically (though there is a difference between embedding a resource and linking to it).
- Internal parsed entities already provide a way to reuse a URI anywhere in the document. Compression further reduces the concern about document size in datasets.
- The most widely used protocol, HTTP, provides means for specifying or negotiating the format of the target resource. This has the advantage that the server is not locked into storing the file in one specific format; it could, for example, upgrade to a better image format (e.g. PNG over GIF) without modifying any document that refers to it.
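For comparison with the point about internal parsed entities, this is roughly what URI reuse through one looks like: the URI is declared once and the parser expands it wherever `&logo;` appears, with no notation or unparsed entity involved. The entity name `logo` and the `example.com` URL are placeholders, and the sketch relies on Python's `xml.etree.ElementTree`, whose underlying expat parser expands entities declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the URI is declared once as an internal parsed
# entity and referenced from ordinary (untyped) attribute values.
DOCUMENT = """<!DOCTYPE page [
  <!ENTITY logo "https://example.com/images/logo.gif">
]>
<page>
  <img src="&logo;"/>
  <img src="&logo;"/>
</page>
"""

root = ET.fromstring(DOCUMENT)
# Each src attribute arrives already expanded to the full URI string.
sources = [img.get("src") for img in root.iter("img")]
print(sources)
```

Unlike the unparsed-entity version, the application only receives a plain string here, so nothing in the parse result marks `src` as a link; that knowledge stays schema-specific.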
All the tutorials on this mechanism that I've found simply state what it can be used for (mostly copying paragraphs from other documents), with examples of custom DTDs like the one above. Additionally, since an entity like this can only be referenced from an attribute, it can never actually be part of the content of any element, and its processing always depends on the application.
Is there any system that uses or relies on external entities and notations? Are there applications that recognize entities used this way and are able to understand notations? What kinds of public IDs can I reasonably use for notations, and what are some real-world examples of system IDs? And are there common public IDs for entities or notations?