Validate an XML Document Against a Schema

An XML schema defines the rules that a given type of XML document must follow. The schema includes rules that define:

  • the elements and attributes that can appear in a document.

  • the data types for elements and attributes.

  • the structure of a document, including what elements are children of other elements.

  • the order and number of child elements that appear in a document.

  • whether elements are empty, can include text, or require fixed values.

XML schema documents are beyond the scope of this chapter, but much can be learned from a simple example. This example uses the product catalog first presented in post1.

At its most basic level, XML Schema Definition (XSD) is used to define the elements that can occur in an XML document. XSD documents are themselves written in XML, and you use a separate predefined element (named <element>) in the XSD document to indicate each element that's required in the target document. The type attribute indicates the data type. Here's an example for a product name:

<xsd:element name="productName" type="xsd:string" />

And here's an example for the product price:

<xsd:element name="productPrice" type="xsd:decimal" />

The basic schema data types are defined at http://www.w3.org/TR/xmlschema-2. They map closely to .NET data types and include string, int, long, decimal, float, dateTime, boolean, and base64Binary, to name a few of the most frequently used types.
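The mapping between schema types and .NET types can be explored with the XmlConvert class, which parses the lexical forms that XML uses. The following is only a rough sketch: the SchemaTypeDemo class and its IsValid helper are hypothetical names invented for illustration, and the "yyyy-MM-dd" format is a simplification of xsd:date (which also permits a time zone suffix).

```csharp
using System;
using System.Xml;

public class SchemaTypeDemo {

    // Hypothetical helper: returns true if the text can be parsed
    // as the named schema type using XmlConvert.
    public static bool IsValid(string schemaType, string text) {
        try {
            switch (schemaType) {
                case "decimal": XmlConvert.ToDecimal(text); break;
                case "boolean": XmlConvert.ToBoolean(text); break;
                case "int":     XmlConvert.ToInt32(text); break;
                // Simplified: a real xsd:date may also carry a time zone.
                case "date":    XmlConvert.ToDateTime(text, "yyyy-MM-dd"); break;
                default:
                    throw new ArgumentException("Unsupported type: " + schemaType);
            }
            return true;
        } catch (FormatException) {
            return false;
        } catch (OverflowException) {
            return false;
        }
    }

    public static void Main() {
        Console.WriteLine(IsValid("decimal", "982.99"));     // True
        Console.WriteLine(IsValid("decimal", "$342.10"));    // False
        Console.WriteLine(IsValid("boolean", "true"));       // True
        Console.WriteLine(IsValid("boolean", "Yes"));        // False
        Console.WriteLine(IsValid("date", "2004-01-01"));    // True
        Console.WriteLine(IsValid("date", "Jan 1, 2004"));   // False
    }
}
```

Values such as a price with a currency symbol or a date written in a local style are not valid lexical forms for these types, so a schema that declares them as xsd:decimal or xsd:date will reject them.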

Both the productName and productPrice are simple types because they contain only character data. Elements that contain nested elements are called complex types. You can nest them together using a <sequence> tag, if order is important, or an <all> tag if it's not. Here's how you might model the <product> element in the product catalog. Notice that attributes are always declared after elements, and they aren't grouped with a <sequence> or <all> tag because order is never important.

<xsd:complexType name="product">
  <xsd:sequence>
    <xsd:element name="productName" type="xsd:string"/>
    <xsd:element name="productPrice" type="xsd:decimal"/>
    <xsd:element name="inStock" type="xsd:boolean"/>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:integer"/>
</xsd:complexType>

By default, a listed element can occur exactly one time in a document. You can configure this behavior by specifying the maxOccurs and minOccurs attributes. Here's an example that allows an unlimited number of products in the catalog:

<xsd:element name="product" type="product" maxOccurs="unbounded" />
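To see the effect of these defaults, here is a minimal sketch that validates three small in-memory documents. It assumes a cut-down, hypothetical schema (a products element holding product strings, not the full catalog schema) and uses the XmlReaderSettings API available from .NET Framework 2.0 onward. The OccursDemo class and CountErrors method are illustrative names only.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class OccursDemo {

    // Hypothetical cut-down schema: minOccurs keeps its default of 1,
    // while maxOccurs is raised to unbounded.
    private const string Xsd = @"<?xml version=""1.0""?>
<xsd:schema xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
  <xsd:element name=""products"">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name=""product"" type=""xsd:string""
         maxOccurs=""unbounded"" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>";

    // Returns the number of validation errors found in the XML text.
    public static int CountErrors(string xml) {
        int errors = 0;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += (sender, args) => {
            if (args.Severity == XmlSeverityType.Error) errors++;
        };

        using (XmlReader reader =
          XmlReader.Create(new StringReader(xml), settings)) {
            while (reader.Read()) {}
        }
        return errors;
    }

    public static void Main() {
        // One product and three products are both allowed.
        Console.WriteLine(CountErrors(
          "<products><product>A</product></products>") == 0);
        Console.WriteLine(CountErrors(
          "<products><product>A</product><product>B</product>" +
          "<product>C</product></products>") == 0);

        // No products at all violates the default minOccurs of 1.
        Console.WriteLine(CountErrors("<products/>") > 0);
    }
}
```

Adding minOccurs="0" to the product declaration in this sketch would make the empty products element valid as well.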

Here's the complete schema for the product catalog XML:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

   <!-- Define the complex type product. -->
   <xsd:complexType name="product">
      <xsd:sequence>
         <xsd:element name="productName" type="xsd:string"/>
         <xsd:element name="productPrice" type="xsd:decimal"/>
         <xsd:element name="inStock" type="xsd:boolean"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:integer"/>
   </xsd:complexType>

   <!-- This is the structure the document must match.
        It begins with a productCatalog element that nests other elements. -->
   <xsd:element name="productCatalog">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="catalogName" type="xsd:string"/>
            <xsd:element name="expiryDate" type="xsd:date"/>

            <xsd:element name="products">
               <xsd:complexType>
                  <xsd:sequence>
                     <xsd:element name="product" type="product"
                      maxOccurs="unbounded" />
                  </xsd:sequence>
               </xsd:complexType>
            </xsd:element>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>

</xsd:schema>

The XmlValidatingReader class enforces all these schema rules, ensuring that the document is valid, and it also checks that the XML document is well formed (which means there are no illegal characters, all opening tags have a corresponding closing tag, and so on). To check a document, you read through it one node at a time by calling the XmlValidatingReader.Read method. If an error is found, XmlValidatingReader raises a ValidationEventHandler event with information about the error. If you wish, you can handle this event and continue processing the document to find more errors. If you don't handle this event, an XmlSchemaException will be thrown when the first validation error is encountered and processing will be aborted. (A document that isn't well formed causes an XmlException whether or not you handle the event.) To test only whether a document is well formed, you can use the XmlValidatingReader without a schema.

The next example shows a utility class that displays all errors in an XML document when the ValidateXml method is called. Errors are displayed in a console window, and a Boolean value is returned to indicate the success or failure of the entire validation operation.

using System;
using System.Xml;
using System.Xml.Schema;

public class ConsoleValidator {

    // Set to true if at least one error exists.
    private bool failed;

    public bool Failed {
        get {return failed;}
    }

    public bool ValidateXml(string xmlFilename, string schemaFilename) {

        // Create the validator.
        XmlTextReader r = new XmlTextReader(xmlFilename);
        XmlValidatingReader validator = new XmlValidatingReader(r);
        validator.ValidationType = ValidationType.Schema;

        // Load the schema file into the validator.
        XmlSchemaCollection schemas = new XmlSchemaCollection();
        schemas.Add(null, schemaFilename);
        validator.Schemas.Add(schemas);

        // Set the validation event handler.
        validator.ValidationEventHandler += 
          new ValidationEventHandler(ValidationEventHandler);
            
        failed = false;
        try {

            // Read all XML data.
            while (validator.Read())
            {}
        }catch (XmlException err) {

            // This happens if the XML document includes illegal characters
            // or tags that aren't properly nested or closed.
            Console.WriteLine("A critical XML error has occurred.");
            Console.WriteLine(err.Message);
            failed = true;
        }finally {
            validator.Close();
        }

        return !failed;
    }

    private void ValidationEventHandler(object sender, 
      ValidationEventArgs args) {

        failed = true;

        // Display the validation error.
        Console.WriteLine("Validation error: " + args.Message);
        Console.WriteLine();
    }
}

Here's how you would use the class to validate the product catalog:

using System;

public class ValidateXml {

    private static void Main() {

        ConsoleValidator consoleValidator = new ConsoleValidator();
        Console.WriteLine("Validating ProductCatalog.xml.");

        bool success = consoleValidator.ValidateXml("ProductCatalog.xml",
          "ProductCatalog.xsd");
        if (!success) {
            Console.WriteLine("Validation failed.");
        }else {
            Console.WriteLine("Validation succeeded.");
        }

        Console.ReadLine();
    }
}

If the document is valid, no validation error messages will appear, and the success variable will be set to true. But consider what happens if you use a document that breaks schema rules, such as the ProductCatalog_Invalid.xml file shown here:

<?xml version="1.0" ?>
<productCatalog>
    <catalogName>Acme Fall 2003 Catalog</catalogName>
    <expiryDate>Jan 1, 2004</expiryDate>

    <products>
        <product id="1001">
            <productName>Magic Ring</productName>
            <productPrice>$342.10</productPrice>
            <inStock>true</inStock>
        </product>
        <product id="1002">
            <productName>Flying Carpet</productName>
            <productPrice>982.99</productPrice>
            <inStock>Yes</inStock>
        </product>
    </products>
</productCatalog>

If you attempt to validate this document, the success variable will be set to false and the output will indicate each error:

Validating ProductCatalog_Invalid.xml.

Validation error: The 'expiryDate' element has an invalid value according to
 its data type. [path information truncated] 

Validation error: The 'productPrice' element has an invalid value according to
 its data type. [path information truncated]

Validation error: The 'inStock' element has an invalid value according to its
 data type. [path information truncated]

Validation failed.
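XmlValidatingReader was marked obsolete in .NET Framework 2.0. On later versions, roughly the same check can be written with XmlReaderSettings and XmlReader.Create. The following is a sketch under that assumption, not a drop-in replacement: the SettingsValidator class name is made up for illustration, and the file names are the ones used above, assumed to be in the working directory.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public class SettingsValidator {

    // Returns one message per validation error.
    // An empty list means the document is valid.
    public static List<string> ValidateXml(string xmlFilename,
      string schemaFilename) {

        List<string> errors = new List<string>();

        // Configure a validating reader and load the schema file.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, schemaFilename);

        // Record each error with its position instead of
        // stopping at the first one.
        settings.ValidationEventHandler += (sender, args) => {
            if (args.Severity == XmlSeverityType.Error) {
                errors.Add(String.Format("Line {0}, position {1}: {2}",
                  args.Exception.LineNumber, args.Exception.LinePosition,
                  args.Message));
            }
        };

        // Read all XML data. A document that isn't well formed
        // still throws an XmlException.
        using (XmlReader reader = XmlReader.Create(xmlFilename, settings)) {
            while (reader.Read()) {}
        }
        return errors;
    }

    public static void Main() {
        try {
            List<string> errors = ValidateXml("ProductCatalog_Invalid.xml",
              "ProductCatalog.xsd");
            foreach (string error in errors) {
                Console.WriteLine("Validation error: " + error);
            }
            Console.WriteLine(errors.Count == 0 ?
              "Validation succeeded." : "Validation failed.");
        } catch (XmlException err) {
            Console.WriteLine("A critical XML error has occurred.");
            Console.WriteLine(err.Message);
        }
    }
}
```

Returning the messages in a list rather than writing them to the console is a design choice made for this sketch; it lets the caller decide how to report them.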

Finally, if you want to validate an XML document and then process it, you can use XmlValidatingReader to scan a document as it's read into an in-memory XmlDocument. Here's how it works:

XmlDocument doc = new XmlDocument();
XmlTextReader r = new XmlTextReader("ProductCatalog.xml");
XmlValidatingReader validator = new XmlValidatingReader(r);

// Load the schema into the validator.
validator.ValidationType = ValidationType.Schema;
XmlSchemaCollection schemas = new XmlSchemaCollection();
schemas.Add(null, "ProductCatalog.xsd");
validator.Schemas.Add(schemas);

// Load the document and validate it at the same time.
// Don't handle the ValidationEventHandler event. Instead, allow any errors
// to be thrown as an XmlSchemaException.
try {
    doc.Load(validator);
    // (Validation succeeded if you reach here.)
}catch (XmlSchemaException err) {
    // (Validation failed if you reach here.)
    Console.WriteLine(err.Message);
}finally {
    // Release the underlying file whether or not validation succeeded.
    validator.Close();
}
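On .NET Framework 2.0 and later, where XmlValidatingReader is obsolete, the same load-and-validate step can be sketched with XmlReaderSettings. The DocumentLoader class and LoadValidated method below are hypothetical names, and the sketch assumes the two catalog files sit in the working directory. With no handler attached, the first validation error surfaces as an exception derived from XmlSchemaException.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

public class DocumentLoader {

    // Loads the file into an XmlDocument, validating it on the way in.
    // Throws an XmlSchemaException (or a subclass) on the first error.
    public static XmlDocument LoadValidated(string xmlFilename,
      string schemaFilename) {

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, schemaFilename);

        XmlDocument doc = new XmlDocument();

        // No ValidationEventHandler is attached, so errors are thrown.
        using (XmlReader reader = XmlReader.Create(xmlFilename, settings)) {
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main() {
        try {
            XmlDocument doc = LoadValidated("ProductCatalog.xml",
              "ProductCatalog.xsd");
            // (Validation succeeded if you reach here.)
            Console.WriteLine("Loaded <" + doc.DocumentElement.Name + ">.");
        } catch (XmlSchemaException err) {
            // (Validation failed if you reach here.)
            Console.WriteLine("Validation failed: " + err.Message);
        } catch (System.IO.IOException err) {
            // The assumed files were not found or could not be read.
            Console.WriteLine(err.Message);
        }
    }
}
```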
