'Annotated XML Example' Specification Version 0.5 Document Revision 4 Copyright © 2011 Codalogic Ltd. IntroductionAnnotated XML Example (AXE™) is a simple method for specifying the format of XML data. In its simplest use an example of your XML data can be used to specify the XML format. The XML example can then be annotated with additional characters to more accurately describe your XML data.AXE™'s main goal is to be easy to use. Therefore it does not support some of the more advanced features supported by other XML data specification languages such as W3C XML Schema. Contents3 - Structure of an AXE Specification File 4.1 - Specifying Attributes from the XML Namespace 5 - Specifying How Many Times an Attribute Can Occur 6 - Specifying How Many Times an Element Can Occur 7 - Specifying Complex Element Bodies 8 - Specifying Groups of Elements 9 - Specifying User-Defined Simple Types 10 - Specifying User-Defined Complex Types 11 - How a User-Defined Complex Type Affects a Referencing Element 12 - Making an AXE Definition with User-Defined Types a Valid XML Document 13 - Specifying an XML Namespace 14 - Marking the Content of an Element as 'Mixed' 1 - Quick OverviewThis section briefly introduces the AXE™ format. The concepts presented here are discussed further later in the document so it doesn't matter if you don't fully understand the example.The following is a brief example of an AXE specification: <MyElement a1="12" a2="?int"> <Element1>This is a string</Element1> * <Element2>string</Element2> ? <Element3>AComplexType</Element3> </MyElement> AComplexType = <_ a3="AnInt">MyInt</_> AnInt = int MyInt = int( min=0, max=100 )As you can see, the format follows that of the actual XML data that it represents, but has some additional characters added to it to describe the format in more detail.
For example, an AXE parser will look at the value of attribute The set of built-in types supported by AXE are described below. They are the set of types specified by W3C XML Schema Part 2.
Similar to above, the type of the
The
The value of the 2 - Identifying AXE FilesThe preferred file extension for an AXE file is.axe . If you add AXE annotations to an XML file it is recommended that you rename the file to include the .axe extension.
3 - Structure of an AXE Specification FileAn AXE specification file consists of zero or more example elements followed by zero or more User-Defined Type definitions. The example elements are the set of elements that the document element of an XML instance document is validated against. A simple AXE specification with one example element and no User-Defined Types is:<MyElement a1="12" a2="?int"> <Element1>This is a string</Element1> </MyElement>The above example indicates that a valid XML instance document must have a document element called MyElement with a mandatory attribute called a1 of integer type, an optional attribute called a2 of integer type and a child element called Element1 of string type.
The following is an example containing two example elements, permitting a valid XML instance to match either the <MyElement a1="12" a2="?int"> <Element1>This is a string</Element1> </MyElement> <YourElement a="?int"> <Child1>15.2</Child1> </YourElement>User-Defined Types are described below (See 9 - Specifying User-Defined Simple Types and 10 - Specifying User-Defined Complex Types). An AXE specification file may optionally also have an AXE wrapper as described in 12 - Making an AXE Definition with User-Defined Types a Valid XML Document. 4 - Specifying TypesThe type associated with an attribute or element can be inferred from an example of the type, or be explicitly stated using a built-in type or a user defined type.The type of an attribute is specified in the attribute's value field, for example:
a1="12"
Or:
a2="int"
The type of an element is specified in the body of the element, for example: <Element1>string</Element1> The built-in types are those of W3C XML Schema Part 2. The common types include:
string , long , unsignedLong , int , unsignedInt , short , unsignedShort , byte , unsignedByte , float , double , boolean
date , time , dateTime , gYearMonth , gYear , gMonthDay , gDay , gMonth , duration
normalizedString , token , NMTOKENS , Name , NCName , NMTOKEN
integer , nonPositiveInteger , negativeInteger , nonNegativeInteger , positiveInteger , decimal
hexBinary , base64Binary
anyURI , language , QName , ID , IDREF , IDREFS , ENTITY , ENTITIES , anySimpleType , anyAtomicType , anyType , NOTATION
To specify a built-in type, put the name of the type in the position described above. The following types can be inferred by example: int , long , double , boolean , date , time , dateTime , gYearMonth , gMonthDay , gDay , gMonth , duration
string type.
Note that an AXE parser may infer an incorrect type when the example of the type is ambiguous. In this case you should explicitly specify the type using a built-in type or a user-defined type.
The above built-in types are all simple types. Simple types may have parameters associated with them to further specify the type. The parameters are placed in brackets after the type name. The parameters are specified using a comma separated list of either the parameter name on its own, or
When a parameter value contains spaces it should be placed in quotes, as in:
AlternativeEnumeration parameter (and its alias AltEnum ) allows additional enumerated values to be associated with a type. The set of valid values of the type becomes the union of the specified base type (as modified by the other specified parameters), plus the enumerated values specified by any AlternateEnumeration parameters. For example, to specify that a range type must have an unsigned integer value or the enumerated value unbounded , you could specify:
range = unsignedInt( altEnum=unbounded )
The authorId = unsignedInt( id=author ) authorRef = unsignedInt( idRef=author )Multiple id sets may be specified, with each id set being given a unique name, for example: authorId = unsignedInt( id=author ) authorRef = unsignedInt( idRef=author ) bookId = unsignedInt( id=book ) bookRef = unsignedInt( idRef=book )During parsing of an XML instance document a parser records an implementation specific reference to the parent of an attribute or element whose type has an id parameter and ensures that the value of the attribute or element is unique within the applicable id set.
An attribute value or element body may contain multiple instances of a simple type, for example: <MyElement>10 15 82</MyElement>This is called a simple type array. The number of instances of the simple type in such an array is specified in square brackets after the type name, for example:
* in the place of the second number indicates that the upper limit is unbounded. The array specification min and max values are separated by two dots (.. ).
Simple type parameters and array specifications can also be applied to user-defined simple types. The contents of an element's body can be specified to be empty using an empty element, for example: <MyElement></MyElement>Or:
<MyElement/>
4.1 - Specifying Attributes from the XML NamespaceAn exception to the above is specifying the use of attributes from the XML namespace, such asxml:lang and xml:id . Such definitions use the name of the attribute to indicate the type, and ignore the attribute value. Therefore, to indicate the use of the xml:lang attribute, do:
<MyElement xml:lang="en"> ... </MyElement> 5 - Specifying How Many Times an Attribute Can OccurBy default an attribute is considered to be mandatory. If the attribute is optional place a? character as the first non-whitespace character of the attribute's value, for example:
<MyElement attr="? int"></MyElement> 6 - Specifying How Many Times an Element Can OccurBy default an element is considered to be required once and only once. If the element is optional (0 or 1 times), place a? character in front of the element specification, for example:
?<MyElement attr="int">string</MyElement>If an element can appear 0 or more times, place a * character in front of the element specification, for example:
*<MyElement attr="int">string</MyElement>If an element can appear 1 or more times, place a + character in front of the element specification, for example:
+<MyElement attr="int">string</MyElement>White space may appear between the annotation character and the start of the element specification, for example: + <MyElement attr="int">string</MyElement>A specific range of occurrences can be specified within a pair of braces, for example: {5,10}<MyElement attr="int">string</MyElement>The minimum number of times the element can occur is specified by the first number within the braces. If the maximum number of times the element can appear is the same as the minimum number, then the braces contain no further content. For example, if the element must appear exactly 5 times, the following can be used: {5}<MyElement attr="int">string</MyElement>If the maximum number of times the element can appear is a finite number, then a comma is placed after the first number, and then the maximum number of times the element can appear is specified, for example: {5,10} <MyElement attr="int">string</MyElement>If the maximum number of times the element can appear is unbounded then a * instead of a number for the maximum number of times the element can appear, for example:
{5,*} <MyElement attr="int">string</MyElement> 7 - Specifying Complex Element BodiesIf the body of an element contains multiple child elements, then by default it is assumed that the elements occur in the sequence they are specified in in the specification. For example:<MyElement a1="12" ?a2="int"> <Element1>This is a string</Element1> * <Element2>string</Element2> ? <Element3 a1="MyInt">MyInt</Element3> </MyElement>would mean that MyElement contains 1 instance of Element1 , followed by 0 or more instances of Element2 , optionally followed by an instance of Element3 . (This mirrors W3C XML Schema xs:sequence .)
If only one of the child elements should appear, then include the <MyElement> <Element1>This is a string</Element1> | {1,6} <Element2>date</Element2> | + <Element3>int</Element3> </MyElement>means that the body of MyElement can contain either a single occurrence of Element1 , or between 1 and 6 occurrences of Element2 , or 1 or more occurrences of Element3 . (This mirrors W3C XML Schema xs:choice .)
If multiple child elements can appear, but in any order, then include the <MyElement> <Element1>This is a string</Element1> ^ {1,6} <Element2>date</Element2> ^ + <Element3>int</Element3> </MyElement>(This mirrors W3C XML Schema xs:all or Relax-NG's interleave.)
8 - Specifying Groups of ElementsChild elements can be specified to appear in groups. This is indicated by placing (round) brackets around the elements forming the group. For example:<MyElement a1="12" ?a2="int"> <Element1>This is a string</Element1> ?( + <Element2>string</Element2> + <Element3 a1="MyInt">MyInt</Element3> ) </MyElement>A group may contain body structure characters (i.e. | characters), for example:
<MyElement a1="12" ?a2="int"> <Element1>This is a string</Element1> ?( + <Element2>string</Element2> | + <Element3 a1="MyInt">MyInt</Element3> ) </MyElement>The number of times the group is allowed to appear is indicated using the same method to specify the number of times an element can appear. For example: <MyElement a1="12" ?a2="int"> <Element1>This is a string</Element1> +( + <Element2>string</Element2> | + <Element3 a1="MyInt">MyInt</Element3> ) </MyElement>indicates that the group can appear 1 or more times, and: <MyElement a1="12" ?a2="int"> <Element1>This is a string</Element1> ( + <Element2>string</Element2> | + <Element3 a1="MyInt">MyInt</Element3> ) </MyElement>indicates that the group can appear once and only once. 9 - Specifying User-Defined Simple TypesUser-defined types are defined after the example elements, if present. They follow aName = Type format. The Name must be an XML Name without any colons. The Type of the user-defined simple type follows the format as described in 4 - Specifying Types.
For example, given: <MyElement> <Element1 a2="AnInt" a3="MyInt">MyOtherInt</Element1> </MyElement> AnInt = int MyInt = int( min=0, max=100 ) MyOtherInt = MyInt( min=0, max=50 ) AnInt becomes an alternative name for the built-in int , MyInt defines an integer limited to the range 0 to 100, and MyOtherInt further restricts MyInt to the number range 0 to 50.
To use a user-defined type in an XML document, place the 10 - Specifying User-Defined Complex TypesUser-defined complex types provide a useful way to decompose your XML document description into more manageable chunks in much the same way as you would use methods and functions to decompose a program's structure. Their use is highly recommended.
Specifying a User-Defined Complex Type is similar to defining a User-Defined Simple Type in that it has the form
In this case the MyComplex = <_ attr="?int"> * <Child1 a3="string">time</Child1> | * <Child2>date</Child2> </_>
To use the type, place the name of the type in the body of the element to which the type is being assigned in the same way you would for a user-defined simple type, for example:
<MyElement>MyComplex</MyElement>This effectively gives a definition for MyElement of:
<MyElement attr="?int"> * <Child1 a3="string">time</Child1> | * <Child2>date</Child2> </MyElement> 11 - How a User-Defined Complex Type Affects a Referencing ElementA User-Defined Complex Type can contribute attributes and/or element body content to an element that references the User-Defined Complex Type.If a User-Defined Complex Type includes attribute definitions then these attributes become part of the referencing element's definition. If the content of a User-Defined Complex Type is empty then the User-Defined Complex Type does not affect the content of the referencing element. If the content of a User-Defined Complex Type is a simple type, then this becomes the simple type content of the referencing element. An element that references a User-Defined Complex Type whose content is a simple type must not define content locally, and any other User-Defined Complex Types referenced by the referencing element must have empty content. If the content of a User-Defined Complex Type is one or more child elements then these child elements are conceptually pasted into the referencing elements definition as a group (See 8 - Specifying Groups of Elements) in place of the reference to the User-Defined Complex Type. For example, if the following AXE definition occurs: <MyElement a1="12"> <Element1>This is a string</Element1> * MyType1 ? MyType2 </MyElement> MyType1 = <_ a3="AnInt"> <T11>int</T11> <T12>string</T12> </_> MyType2 = <_ a4="string"> <T21>int</T21> <T22>string</T22> </_>the effective definition of MyElement is:
<MyElement a1="12" a3="AnInt" a4="string"> <Element1>This is a string</Element1> * ( <T11>int</T11> <T12>string</T12> ) ? ( <T21>int</T21> <T22>string</T22> ) </MyElement>The process by which this effective definition is realised is implementation dependent. For example, in an AXE to W3C XML XSD Schema converter it is recommended that if a User-Defined Complex Type in an element definition is the first specified item of an element's content and it may occur once and only once, then the element is modelled as an XML Schema xs:extension of the User-Defined Complex Type. If the referenced User-Defined Complex Type is not the first specified item of content, or is not specified to occur only once, then the User-Defined Complex Type definition should be treated as if it attributes defined an XML Schema attribute group, and it's content defined an XML Schema model group.
12 - Making an AXE Definition with User-Defined Types a Valid XML DocumentBecause the user defined types appear after the main example XML, without special consideration an AXE document may not be a valid XML document. To address this, the AXE specification can be optionally wrapped in anaxe element, which has a namespace prefix associated with it that is associated with the http://codalogic.com/axe namespace. For example:
<axe:axe xmlns:axe="http://codalogic.com/axe"> <MyElement a1="12" a2="?int"> <Element1>This is a string</Element1> * <Element2>string</Element2> ? <Element3>AComplexType</Element3> </MyElement> AComplexType = <_ a3="AnInt">MyInt</_> AnInt = int MyInt = int( min=0, max=100 ) </axe:axe>This more readily allows the AXE specification to be edited in an XML editor. 13 - Specifying an XML NamespaceAn XML document can be associated with an XML namespace by including an XML namespace declaration in the first element. For example, to associate the default namespace with your document, do:<MyElement xmlns="http://mynamespace.com"> ... </MyElement>To associate a namespace prefix with your document, do: <myns:MyElement xmlns:myns="http://mynamespace.com"> ... </myns:MyElement>In the latter case, elements with names including the namespace prefix will be associated with the specified namespace, and elements with names without a prefix will not be associated with a namespace. 14 - Marking the Content of an Element as 'Mixed'There are two ways to mark the content of an element as 'Mixed'. Firstly you can add themixed attribute from the AXE namespace to the element's definition and set its value to true . For example, assuming the namespace prefix 'axe' is mapped to the AXE namespace then you can do:
<MyElement axe:mixed="true" ...> ... </MyElement>Alternatively you can include the string ##mixed in the body of the element definition, for example:
<MyElement> ##mixed ... </MyElement>Any user-defined complex type that has child elements that is referenced as an immediate child in an element marked as 'mixed' must also be marked as 'mixed'. Similarly, if a user-defined complex type is marked as 'mixed' then any element that references it as an immediate child must also be marked as 'mixed'. 15 - Extensibility and VersioningAn AXE specification has two mechanisms for specifying extensibility and versioning.
You can mark the whole AXE definition as 'open' by including the <axe:axe xmlns:axe="http://codalogic.com/axe" axe:open="true"> ... </axe>If an AXE specification is marked as 'open' then all unknown attributes are ignored, and unknown child elements that appear before any element's end tag are ignored.
Alternatively you can specify specific places where unknown elements and attributes are permitted using the
The normal rules for specifying the cardinality of elements and attributes apply to these constructs. Therefore, if the namespace prefix 'axe' is mapped to the AXE namespace, a typical example of using the <MyElement a1="12" axe:any="?"> <Element1>This is a string</Element1> * <axe:any/> </MyElement> 16 - Modelling PolymorphismSome XML documents model the type of polymorphism found in programming languages such as Java. To model this behaviour thepolymorphic , abstract , base , and selector attributes are available in the AXE namespace.
To indicate that a User-Defined Complex Type is polymorphic, set AXE's MyBase = <_ a4="string" axe:polymorphic="true"> <E1>int</E1> </_>If the polymorphic base type can not be used without being extended, then it can be marked as abstract by setting the AXE abstract attribute to true , for example:
MyBase = <_ a4="string" axe:polymorphic="true" axe:abstract="true"> <E1>int</E1> </_>Types that extend a polymorphic base use AXE's base , and selector attributes. The base attribute specifies the base that the type extends. The selector specifies an x=y pair where the x value is the name of an attribute, and the y value specifies the value that the attribute must be set to in order for the type to be selected, for example:
MyExtension1 = <_ aExtra="string" axe:base="MyBase" axe:selector="aExtra=Ext1"> <E2>string</E2> </_>This indicates that the aExtra attribute must be set to the value Ext1 in order for the parsed type to be treated as MyExtension1 . Thus an example of an element that should be validated against the MyExtension1 code type might be:
<MyElement aExtra="Ext1" a4="This is the start"> <E1>18</E1> <E2>Housing plan</E2> </MyElement>
Note that the attribute specified in the
For compatibility with W3C XML Schema, if the attribute name portion of the If during parsing an XML instance document none of the selectors found in the extensions of the base type are matched then an element is parsed assuming its type is that of the base type. 17 - Modularity of User-Defined TypesTo aid modularity and reuse AXE specifications can be split into multiple modules. A module may be described using multiple files. A module is identified using themodule attribute from the AXE namespace. Typically the value of the module attribute is a URI or reverse domain name notation. For example, if the namespace prefix 'axe' is mapped to the AXE namespace, the following specifies that the file is part of the com.codalogic.schemas.libraryTypes module:
<axe:axe xmlns:axe="http://codalogic.com/axe" axe:module="com.codalogic.schemas.libraryTypes"> authorId = unsignedInt( id=author ) </axe:axe>When a reference to a User-Defined Type is made, if it has no prefix then the type is searched for in the current file. If the reference has a prefix then a search is made amongst the known files that are specified to be part of the module that the namespace prefix is associated with. For example, to reference the above authorId from a separate file you can do:
<axe:axe xmlns:axe="http://codalogic.com/axe" xmlns:ltypes="com.codalogic.schemas.libraryTypes"> <MyElement>ltypes:authorId</MyElement> </axe:axe>Note that specifying that a file is in a particular module does not indicate that all the elements and attributes that it defines are in the namespace of the module. Specifying the namespace of a module and the namespace that elements and attributes are associated with is orthogonal in AXE. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|