Wednesday, May 22, 2013

Internal DTD subset: Add character entity definitions to your well-formed XML instance

If you use named character entities such as &nbsp; or &copy; in a well-formed XML instance (i.e., one with no DTD or schema), the document will no longer be well-formed, because the parser has no declaration to resolve them against. XML itself predefines only five entities (&lt;, &gt;, &amp;, &apos;, and &quot;), so you must add declarations for any others to the instance before it will parse again.

You can do this by adding an internal DTD subset to the prolog of your document instance, before the root element, like so:

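<?xml version="1.0"?>

<!DOCTYPE mydoc [
<!ENTITY ldquo    "“">
<!ENTITY rdquo    "”">
<!-- The no-break space is written as a numeric reference so it survives copy and paste -->
<!ENTITY nbsp     "&#160;">
<!ENTITY copy     "©">
]>

<!-- Placeholder content showing the declared entities in use -->
<mydoc>&ldquo;Quoted text&rdquo;&nbsp;&copy; 2013</mydoc>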

That is, you create a DOCTYPE declaration at the beginning of your document and add an entity declaration for each named entity the document uses. The document is then well-formed again and, in the case above, XML-aware browsers should display the corresponding Unicode characters correctly.
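
If you want to check the effect outside a browser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree parser. The two sample strings and the helper name parse_text are made up for illustration, and the entity values are written as numeric character references rather than literal characters. The first string uses the entities without declaring them; the second adds the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: named entities used with no declarations.
WITHOUT_SUBSET = (
    '<?xml version="1.0"?>\n'
    '<mydoc>&ldquo;Hello&rdquo;&nbsp;&copy;</mydoc>'
)

# Same content, with an internal DTD subset declaring each entity.
WITH_SUBSET = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE mydoc [\n'
    '<!ENTITY ldquo "&#8220;">\n'
    '<!ENTITY rdquo "&#8221;">\n'
    '<!ENTITY nbsp  "&#160;">\n'
    '<!ENTITY copy  "&#169;">\n'
    ']>\n'
    '<mydoc>&ldquo;Hello&rdquo;&nbsp;&copy;</mydoc>'
)


def parse_text(xml_source):
    """Return the root element's text, or None if the source does not parse."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        # Raised, among other cases, when an entity has no declaration.
        return None


if __name__ == "__main__":
    print(repr(parse_text(WITHOUT_SUBSET)))
    print(repr(parse_text(WITH_SUBSET)))
```

With this parser, the first call should fail on the undeclared entities, while the second should return the text with the entities replaced by their Unicode characters. Other parsers may report the error differently.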


