What is the difference between #PCDATA and #CDATA in DTD?
-
possible duplicate of [what actually is PCDATA and CDATA?](http://stackoverflow.com/questions/857876/what-actually-is-pcdata-and-cdata) – Joshua Drake Jun 26 '15 at 13:16
-
The names of the keywords used in XML DTDs are `#PCDATA` and `CDATA`. There is no `PCDATA` keyword and no `#CDATA`. – mzjn Dec 29 '16 at 19:28
-
In addition to the accepted answer you should read https://stackoverflow.com/a/918462/2013911 because it explains the difference between CDATA attribute type and <![CDATA[]]> marked sections. – Niklas Peter Oct 09 '17 at 11:36
7 Answers
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
CDATA is text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.
By default, everything is PCDATA. In the following example, ignoring the root, <bar> will be parsed, and it will have no text content of its own, only one child element.
<?xml version="1.0"?>
<foo>
<bar><test>content!</test></bar>
</foo>
When we want to specify that an element will only contain text, and no child elements, we use the keyword #PCDATA, because this keyword specifies that the element must contain parsable character data: that is, any text in which the less-than (<) and ampersand (&) characters are written as entity references (&lt; and &amp;). The greater-than (>), quote (') and double quote (") characters may appear literally, although the sequence ]]> must not appear literally in the text.
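To check which characters actually need escaping in parsed character data, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The choice of Python, the sample strings and the helper name `parses` are assumptions made for illustration; they are not part of the original answer.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Quotes and '>' are accepted as literal text in element content.
ok = ET.fromstring("""<bar>say "hi" and 'bye' > done</bar>""")
print(ok.text)

# '<' and '&' are written as entity references; the parser expands them.
escaped = ET.fromstring("<bar>a &lt; b &amp; c</bar>")
print(escaped.text)  # a < b & c

# A raw '&' or '<' in element text is rejected as not well-formed.
print(parses("<bar>a & b</bar>"))  # False
print(parses("<bar>a < b</bar>"))  # False
```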
In the next example, <bar> contains CDATA. Its content will not be parsed and is thus <test>content!</test>.
<?xml version="1.0"?>
<foo>
<bar><![CDATA[<test>content!</test>]]></bar>
</foo>
There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part of it means that markup (including PIs, comments and SGML directives) in it is parsed instead of displayed as raw text. It also means that entity references are replaced.
Another type of content model allowing plain text contents is CDATA. In XML, the element content model may not implicitly be set to CDATA, but in SGML, it means that markup and entity references are ignored in the contents of the element. In attributes of CDATA type, however, entity references are replaced.
In XML, #PCDATA is the only plain text content model. You use it if you want to allow text contents in the element at all. The CDATA content model may be used explicitly through the CDATA block markup in #PCDATA, but element contents may not be defined as CDATA by default.
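As a rough illustration of where entity references are and are not replaced in XML, the sketch below puts the same characters in an attribute value, in ordinary #PCDATA content, and in a CDATA section. The element and attribute names (`foo`, `bar`, `baz`, `label`) and the use of Python's `xml.etree.ElementTree` are assumptions for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<foo label="Tom &amp; Jerry">'
    "<bar>Tom &amp; Jerry</bar>"
    "<baz><![CDATA[Tom &amp; Jerry]]></baz>"
    "</foo>"
)

attr = doc.get("label")    # attribute value: reference replaced
pc = doc.find("bar").text  # #PCDATA content: reference replaced
cd = doc.find("baz").text  # CDATA section: characters kept verbatim

print(attr)  # Tom & Jerry
print(pc)    # Tom & Jerry
print(cd)    # Tom &amp; Jerry
```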
In a DTD, the type of an attribute that contains text must be CDATA. The CDATA keyword in an attribute declaration has a different meaning than the CDATA section in an XML document. In a CDATA section all characters are legal (including the <, >, &, ' and " characters), except the ]]> sequence, which ends the section.
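Because `]]>` always ends a CDATA section, text that contains that sequence cannot go into a single section as-is. One common workaround, sketched here under the assumption that the text is being wrapped programmatically, is to split the sequence across two adjacent sections. The helper name `to_cdata` and the sample payload are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' across two sections
    (hypothetical helper, not from the original answer)."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# The payload contains '<', '&', '"' and the forbidden ']]>' sequence.
payload = 'if (a[b[0]]>1 && c < "d") { }'
xml_text = "<code>" + to_cdata(payload) + "</code>"

# The parser joins the adjacent sections back into one text value.
roundtrip = ET.fromstring(xml_text).text
print(roundtrip == payload)  # True
```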
#PCDATA is not appropriate for the type of an attribute. It is used for the type of "leaf" text.
#PCDATA is prepended by a hash in the content model to distinguish this keyword from an element named PCDATA (which would be perfectly legal).
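The two keywords can be seen side by side in a small internal DTD subset. The sketch below is an illustration only: the DTD, the attribute name `note` and the handler names are made up for the example, and it uses the declaration callbacks of Python's standard `xml.parsers.expat` to report what the parser read from each declaration.

```python
import xml.parsers.expat as expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ELEMENT foo (PCDATA)>
  <!ELEMENT PCDATA (#PCDATA)>
  <!ATTLIST PCDATA note CDATA #IMPLIED>
]>
<foo><PCDATA note="free-form text">leaf text</PCDATA></foo>
"""

element_models = {}
attribute_types = {}

def on_element_decl(name, model):
    # model is a tuple: (type, quantifier, name, children)
    element_models[name] = model

def on_attlist_decl(elname, attname, type_, default, required):
    attribute_types[(elname, attname)] = type_

parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

# (#PCDATA) is reported as a mixed (text) content model ...
print(element_models["PCDATA"][0] == expat.model.XML_CTYPE_MIXED)
# ... while (PCDATA) without the hash names a child element called PCDATA.
print(element_models["foo"])
# The attribute is declared with the CDATA type.
print(attribute_types[("PCDATA", "note")])
```

Here `foo` is declared to contain an element that happens to be named `PCDATA`, which is why the keyword needs the hash to be told apart from it.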

-
Great answer, except for the last sentence. `#` is not a hashtag. Only a tag preceded by this symbol is a hashtag. The symbol itself has [many names](https://en.wikipedia.org/wiki/Number_sign#Other_names_in_English), including "number sign", "pound sign" (mostly Canada & US), or just "hash" (hence the name 'hashtag'). – Sep 19 '13 at 17:42
-
I do not agree that the # in front of `#PCDATA` is there for historical reasons. It is there because in a DTD, an element could also contain an element named `PCDATA`, which must be possible, and which would look like `<!ELEMENT foo (PCDATA)>`. – Mathias Müller Feb 23 '16 at 16:46
-
Quote and double-quote are perfectly legal in PCDATA content. And ampersand may appear, but (in XML) only as an entity introducer. – Toby Speight Nov 28 '19 at 16:42
PCDATA - Parsed Character Data
XML parsers normally parse all the text in an XML document.
CDATA - (Unparsed) Character Data
The term CDATA is used for text data that should not be parsed by the XML parser.
Characters like "<" and "&" are illegal as literal text in XML elements unless they are escaped or placed inside a CDATA section.

PCDATA – parsed character data. The parser parses all of this data in an XML document.
Example:
<family>
<mother>mom</mother>
<father>dad</father>
</family>
Here, the <family> element contains 2 more elements: <mother> and <father>. So the parser parses further to get the text of mother and father, giving the text value of family as "mom dad".
CDATA – unparsed character data. This is the data that should not be parsed further in an XML document.
<family>
<![CDATA[
<mother>mom</mother>
<father>dad</father>
]]>
</family>
Here, the text value of family will be <mother>mom</mother><father>dad</father>.
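The "text value" of the two `<family>` documents can be checked with a short sketch. It assumes Python's standard `xml.etree.ElementTree`, writes the documents on one line to keep whitespace out of the picture, and joins the text pieces with a space to mirror the "mom dad" value described above.

```python
import xml.etree.ElementTree as ET

parsed = ET.fromstring("<family><mother>mom</mother><father>dad</father></family>")
unparsed = ET.fromstring(
    "<family><![CDATA[<mother>mom</mother><father>dad</father>]]></family>"
)

# Parsed content: two child elements, whose text is collected separately.
child_tags = [child.tag for child in parsed]
parsed_text = " ".join(parsed.itertext())

# CDATA section: no child elements, the tags are just characters.
unparsed_text = "".join(unparsed.itertext())

print(child_tags, parsed_text)
print(len(unparsed), unparsed_text)
```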

From here (Google is your friend):
In a DTD, PCDATA and CDATA are used to assert something about the allowable content of elements and attributes, respectively. In an element's content model, #PCDATA says that the element contains (may contain) "any old text." (With exceptions as noted below.) In an attribute's declaration, CDATA is one sort of constraint you can put on the attribute's allowable values (other sorts, all mutually exclusive, include ID, IDREF, and NMTOKEN). An attribute whose allowable values are CDATA can (like PCDATA in an element) contain "any old text."
A potentially really confusing issue is that there's another "CDATA," also referred to as marked sections. A marked section is a portion of element (#PCDATA) content delimited with special strings: <![CDATA[ to open the section, and ]]> to close it. If you remember that PCDATA is "parsed character data," a CDATA section is literally the same thing, without the "parsed." Parsers transmit the content of a marked section to downstream applications without hiccupping every time they encounter special characters like < and &. This is useful when you're coding a document that contains lots of those special characters (like scripts and code fragments); it's easier on data entry, and easier on reading, than the corresponding entity reference.
So you can infer that the exception to the "any old text" rule is that PCDATA cannot include any of these unescaped special characters, UNLESS they fall within the scope of a CDATA marked section.
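A minimal sketch of that equivalence, assuming Python's standard `xml.etree.ElementTree` and a made-up `<script>` payload: the same characters reach the application whether they are escaped with entity references or wrapped in a CDATA marked section.

```python
import xml.etree.ElementTree as ET

# Special characters escaped with entity references in #PCDATA content.
escaped = ET.fromstring("<script>if (a &lt; b &amp;&amp; c) run();</script>")

# The same text inside a CDATA marked section, with no escaping.
section = ET.fromstring("<script><![CDATA[if (a < b && c) run();]]></script>")

print(escaped.text)
print(escaped.text == section.text)  # True
```

The CDATA form is only a convenience for the author; downstream code sees identical text either way.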

The main difference between PCDATA and CDATA is:
PCDATA - used in element declarations (ELEMENT), while
CDATA - used for attributes of XML, i.e. in ATTLIST declarations

CDATA (Character DATA): It is similar to a comment, but it is part of the document. That is, CDATA is data that belongs to the document, but it is not parsed as markup.
Note: An XML comment is omitted while parsing XML, but CDATA is passed through as it is.
PCDATA (Parsed Character DATA): By default, everything is PCDATA. PCDATA is data that is parsed as XML.
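The contrast between a comment and a CDATA section can be sketched as follows. This assumes Python's standard `xml.etree.ElementTree` with its default parser settings, which discard comments; the `<note>` element and its contents are made up for the example.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring("<note><!-- dropped --><![CDATA[<kept> & shown]]></note>")

# The comment does not reach the resulting tree; the CDATA text does, verbatim.
print(len(note))   # 0
print(note.text)   # <kept> & shown
```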

PCDATA
PCDATA (Parsed Character Data): XML parsers normally parse all the text in an XML document. PCDATA is the text that will be parsed by a parser. Tags inside the PCDATA will be treated as markup and entities will be expanded.
In other words, parsed character data means that the XML parser examines the data and, if it contains entity references, replaces them.
Let's take an example:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
In the above example, the employee element contains 3 more elements 'firstname', 'lastname', and 'email', so it parses further to get the data/text of firstname, lastname and email to give the value of employee as:
vimal jaiswal vimal@javatpoint.com
CDATA
CDATA: (Unparsed Character data): CDATA contains the text which is not parsed further in an XML document. Tags inside the CDATA text are not treated as markup and entities will not be expanded.
Let's take an example for CDATA:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
]]>
</employee>
In the above CDATA example, a CDATA section is used directly inside the employee element to make the data/text unparsed, so it will give the value of employee:
<firstname>vimal</firstname><lastname>jaiswal</lastname><email>vimal@javatpoint.com</email>
