Tuesday, November 30, 2010

Recently I came across an issue loading XML into an XElement.

XElement xml = XElement.Parse(input);

Even though the input string looked like valid UTF-8 based XML, parsing failed with the following error.


System.Xml.XmlException was unhandled
  Message='.', hexadecimal value 0x00, is an invalid character. Line 1, position 125349.
  Source=System.Xml
  SourceUri=""
  StackTrace:
       at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
       at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
       at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
       at System.Xml.Linq.XElement.Load(XmlReader reader, LoadOptions options)
       at System.Xml.Linq.XElement.Parse(String text, LoadOptions options)
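
To see where the offending characters sit before deciding how to clean them, a small diagnostic can help. The ParseRootLevelWhitespace frame in the trace suggests the reader had already left the root element when it hit the 0x00, which would be consistent with NUL padding at the end of the string, though that is only an inference from the trace. The sketch below assumes .NET 4.0 or later (for XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair); the class and method names are made up for illustration.

```csharp
using System.Xml;

// Hypothetical helper, not part of the original post.
public static class XmlCharDiagnostics
{
    // Returns the zero-based index of the first UTF-16 code unit that
    // XML 1.0 does not allow, or -1 if every character is allowed.
    public static int IndexOfFirstInvalidXmlChar(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (XmlConvert.IsXmlChar(c))
            {
                continue;
            }
            // A valid surrogate pair is two code units; IsXmlSurrogatePair
            // takes the low surrogate first, then the high surrogate.
            if (i + 1 < input.Length && XmlConvert.IsXmlSurrogatePair(input[i + 1], c))
            {
                i++;
                continue;
            }
            return i;
        }
        return -1;
    }
}
```

Keep in mind that the index returned here is zero-based, while the position in the exception message is one-based.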

However, the same string works fine when it is first loaded into an XmlDocument and then passed to XElement (as shown below).

var doc = new XmlDocument();
doc.LoadXml(input);
XElement xml = XElement.Parse(doc.InnerXml);

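Funny! Hah..
After some googling, I found a few workarounds.
The one I settled on is to strip every character in the Unicode "Other" category (\p{C}: control, format, surrogate, private-use and unassigned characters), which includes the offending 0x00 (shown below). Note that this also removes tabs and line breaks, which are legal in XML.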

input = Regex.Replace(input, @"\p{C}+", "");
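
A narrower alternative is to drop only the characters that XML 1.0 itself rejects, so that tabs, line breaks and valid surrogate pairs survive. This is a minimal sketch, assuming .NET 4.0 or later (for XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair); XmlSanitizer and StripInvalidXmlChars are hypothetical names, and I have not measured it against the regex.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original post.
public static class XmlSanitizer
{
    // Returns a copy of input without characters that XML 1.0 disallows
    // (for example 0x00). Tab, CR and LF are kept.
    public static string StripInvalidXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < input.Length && XmlConvert.IsXmlSurrogatePair(input[i + 1], c))
            {
                // Keep both halves of a valid surrogate pair
                // (low surrogate is the first argument, high the second).
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            // Anything else (NUL, other control characters, lone surrogates) is skipped.
        }
        return sb.ToString();
    }
}
```

It would be used in place of the regex: XElement xml = XElement.Parse(XmlSanitizer.StripInvalidXmlChars(input)); If the bad characters turn out to be only trailing NULs, input.TrimEnd('\0') may be enough on its own.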

I hope to come back and find a better solution. Meanwhile, let me know if there is a better (more performant) way or a proper fix.
