Unicode.org states that a Byte Order Mark (BOM) "can be used as a signature defining the byte order and encoding form" of text files. Many Unicode-aware applications, such as Notepad, insert a BOM at the start of XML files they save in a Unicode encoding.
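To make the "signature" idea concrete, here is a minimal sketch that serializes the same XML text in three encoding forms and prints the leading bytes of each. It uses Python's codecs as a stand-in for an editor such as Notepad; it does not reproduce any particular editor's exact behavior. Note how the BOM differs for each form and how the byte order of the following "<" character matches the BOM.

```python
import codecs

# The same XML text, serialized three ways. Python's codecs stand in for
# whatever editor actually writes the file (an assumption for illustration).
xml_text = '<?xml version="1.0"?><root/>'

encoded = {
    # "utf-8-sig" is Python's name for UTF-8 with a leading BOM.
    "UTF-8": xml_text.encode("utf-8-sig"),
    # The explicit-endian UTF-16 codecs never write a BOM, so add it by hand.
    "UTF-16, big-endian": codecs.BOM_UTF16_BE + xml_text.encode("utf-16-be"),
    "UTF-16, little-endian": codecs.BOM_UTF16_LE + xml_text.encode("utf-16-le"),
}

for name, data in encoded.items():
    # First four bytes: the BOM, followed by the encoded "<" character.
    print(f"{name:<22} {data[:4].hex(' ').upper()}")
```

Run as-is, this prints EF BB BF 3C for UTF-8, FE FF 00 3C for big-endian UTF-16, and FF FE 3C 00 for little-endian UTF-16.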
However, not all XML tools properly support Unicode. For example, the popular XML editor Cooktop behaves unexpectedly when editing XML files saved in different character encodings. Older parsers, such as Apache's Crimson, throw errors when an XML file begins with a BOM, and the resulting messages are often misleading, ranging from "Document root element is missing" to "Content not allowed in prolog."
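If upgrading a BOM-intolerant parser is not immediately possible, one stopgap is to strip a UTF-8 BOM from the raw bytes before handing them over. The sketch below assumes the file is UTF-8; `strip_utf8_bom` and `read_xml_for_legacy_parser` are hypothetical helper names. It deliberately leaves UTF-16 BOMs alone, because the XML 1.0 specification requires entities encoded in UTF-16 to begin with a BOM.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if there is one.

    Only the UTF-8 BOM is removed: XML 1.0 requires UTF-16 entities to
    begin with a BOM, so stripping it there would produce a broken file.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_xml_for_legacy_parser(path: str) -> bytes:
    """Hypothetical helper: read an XML file as bytes, minus any UTF-8 BOM."""
    with open(path, "rb") as f:
        return strip_utf8_bom(f.read())
```

For example, `strip_utf8_bom(codecs.BOM_UTF8 + b"<root/>")` returns `b"<root/>"`, while input without a BOM comes back unchanged. This is a workaround, not a substitute for a parser that handles BOMs correctly.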
One quick way to identify the character encoding is to open the XML file in a hexadecimal editor, such as UltraEdit, and examine the first few bytes of the file:
Byte order mark    Encoding
FE FF              UTF-16, big-endian
FF FE              UTF-16, little-endian
EF BB BF           UTF-8
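The same check can be scripted. This is a minimal sketch that reads the first three bytes of a file and matches them against the signatures in the table; the function names and the `detect_bom.py` script name are hypothetical. Like the table, it ignores UTF-32, whose little-endian BOM (FF FE 00 00) begins with the same two bytes as UTF-16 little-endian. A result of None does not mean the file isn't Unicode: many UTF-8 files carry no BOM at all.

```python
import sys
from typing import Optional

# BOM signatures from the table above, mapped to the encoding they indicate.
# UTF-32 is not covered, just as in the table.
BOM_SIGNATURES = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16, big-endian"),
    (b"\xff\xfe", "UTF-16, little-endian"),
]


def detect_bom_in_bytes(head: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM in `head`, or None."""
    for signature, encoding in BOM_SIGNATURES:
        if head.startswith(signature):
            return encoding
    return None


def detect_bom(path: str) -> Optional[str]:
    """Read the first three bytes of a file and look for a BOM."""
    with open(path, "rb") as f:
        return detect_bom_in_bytes(f.read(3))


if __name__ == "__main__":
    # Usage: python detect_bom.py file1.xml file2.xml ...
    for name in sys.argv[1:]:
        print(name, detect_bom(name) or "no BOM")
```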
We suggest upgrading your parsers and switching to applications that fully support Unicode. For more information about XML and character encodings, see the Unicode.org BOM FAQ and the XML specification's appendix on autodetecting character encodings:
www.unicode.org/faq/utf_bom.html
www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing