Friday, July 8, 2011

Internal DTD subset: Add character entity definitions to your well-formed XML instance

If you use named character entities (such as &nbsp; or &copy;) in an XML instance that is well-formed but has no DTD or schema, the document will no longer be well-formed, because the parser won’t know how to resolve the entity. Only the five predefined entities (&amp;, &lt;, &gt;, &apos;, and &quot;) work without a declaration. You must add entity declarations to the instance for the data to parse again.

You can do this by adding an internal DTD subset to the prolog of your document instance, before the root element, like so:

<?xml version="1.0"?>

<!DOCTYPE mydoc [
<!ENTITY  ldquo "&#x201C;">
<!ENTITY  rdquo "&#x201D;">
<!ENTITY  nbsp  "&#x00A0;">
<!ENTITY  copy  "&#x00A9;">
]>

That is, you add a DOCTYPE declaration at the beginning of your document and declare an entity for each named character you use in the document. The name after <!DOCTYPE (mydoc here) should match the name of your root element. Your document will again be well-formed and, in the case above, XML-aware browsers should correctly display your Unicode characters.
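To see the difference for yourself, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The sample documents, the root element content, and the helper name parses are made up for illustration; only the entity declarations come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: uses named entities but declares none of them.
WITHOUT_SUBSET = (
    '<?xml version="1.0"?>\n'
    '<mydoc>&ldquo;Hi&rdquo;&nbsp;&copy;</mydoc>'
)

# Same content, with the internal DTD subset from the post added to the prolog.
WITH_SUBSET = """<?xml version="1.0"?>
<!DOCTYPE mydoc [
<!ENTITY  ldquo "&#x201C;">
<!ENTITY  rdquo "&#x201D;">
<!ENTITY  nbsp  "&#x00A0;">
<!ENTITY  copy  "&#x00A9;">
]>
<mydoc>&ldquo;Hi&rdquo;&nbsp;&copy;</mydoc>"""


def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The undeclared entities make the first document fail to parse.
print(parses(WITHOUT_SUBSET))

# With the internal subset, the entities expand to the declared characters.
root = ET.fromstring(WITH_SUBSET)
print(repr(root.text))
```

The first document is rejected because the parser hits an entity it has no declaration for; the second parses, and the text of the root element contains the actual Unicode characters rather than the entity names.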
