Open XML standard for industrial and commercial catalogs
-- Draft --
Jean-Marc Vanel
2001-06-26
Introduction
The aim of this report is to define OX-SICC, an open standard for industrial
catalogs built on state-of-the-art XML standards and other open standards,
following these guidelines:
-
leverage XML Schema: use it to specify the descriptors of categories
and the inheritance among categories
-
use URIs for items and categories, and define a URI naming scheme for them
-
define a base category, containing roughly the common attributes of Requisite
and Oracle Exchange, and a few fundamental derived categories
Review of existing DTDs and XML Schemas for catalogs
The hit counts below are the number of results returned by AltaVista for
a link: query on the site.
-
OASIS registry of all XML formats (http://www.xml.org/xmlorg_registry/index.shtml);
link:www.xml.org: 943 hits
-
www.biztalk.org; link:www.biztalk.org: 683 hits
-
Martsoft (www.Martsoft.com): mail sent, seems interesting;
link:www.martsoft.com: 54 hits
-
Requisite ...; link:www.requisite.com: 109 hits;
http://www.ecx-xml.org/ is the site for its XML exchange format
-
Saqqara; link:www.saqqara.com: 357 hits; seems to be a serious competitor
to Requisite
-
ContentEurope (http://www.contenteurope.com/) provides cataloging services;
link:www.contenteurope.com: no hits ("AltaVista found no document matching
your query")
-
Cataloga (http://www.cataloga.com): no software, services only;
link:www.cataloga.com: 1 hit
-
Liaison (http://www.liaison.com)
-
TO BE COMPLETED <<<<<<<<<<<
Requirements for XML formats for industrial catalogs
-
readability
-
supports validation rules
-
allows for easy generation of queries and searches
-
supports internationalization
-
extensibility: allows adding unforeseen content (e.g. tables, HTML or XML
mark-up) to items and catalogs without breaking existing protocols and
software
-
allows for easy mappings within categories and descriptors (e.g. between
original supplier data and market place data), with splitting or merging
of categories
-
supports mappings between different unique identifiers (part numbers),
e.g. a supplier and a buyer having different part numbers for the same product
-
supports different levels of quality for catalogs (e.g. more or less detail,
???)
-
content generation, e.g.:
-
support generation of a detailed description from individual descriptors
-
manage mathematical or logical relations between descriptors (e.g.
power = voltage * current * cos phi), either as validation rules or for
content generation
-
support authoring information
-
compatible, as an extension, with the main current format standards for catalog
data: XSLT transforms from current standards to OX-SICC should be possible with
no loss of data, and XSLT transforms from OX-SICC to current standards with
possible loss of data
-
manage secondary categories for an item (indicating a hybrid product having
two or more uses)
-
manage keywords for items and/or categories
-
manage generic descriptors (e.g. color, weight, power, etc)
-
price lists : manage prices by a delegate software object as a general
dependency or function of : unit price, UOM, quantity, order date, shipping
date, shipping conditions, payment conditions, buyer, supplier, contract
...
-
be able to represent heterogeneous catalogs (different types of items from
different suppliers)
-
possibility to use several URI naming schemes unambiguously for suppliers,
item types, and items
-
possibility for the definition of suppliers, item types and items to use either
URI references or inlined elements
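
The requirement on relations between descriptors (power as the product of voltage, or tension, current, or intensity, and cos phi) can serve both purposes named above. The following is a minimal sketch only: the descriptor names `power`, `voltage`, `current`, `cos_phi` and the 1% tolerance are assumptions made for illustration, not part of OX-SICC.

```python
import math


def generate_power(voltage: float, current: float, cos_phi: float) -> float:
    """Content generation: derive the power descriptor from the other three."""
    return voltage * current * cos_phi


def check_power(item: dict, rel_tol: float = 0.01) -> bool:
    """Validation rule: the declared power must agree with the computed one."""
    expected = generate_power(item["voltage"], item["current"], item["cos_phi"])
    return math.isclose(item["power"], expected, rel_tol=rel_tol)


# Hypothetical item descriptors (volts, amperes, dimensionless, watts).
motor = {"voltage": 230.0, "current": 10.0, "cos_phi": 0.8, "power": 1840.0}
print(check_power(motor))
```

The same relation is written once and used in both directions, which is the reason for treating it as a rule attached to the category rather than to individual items.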
Advantages of using W3C's XML Schema for catalogs
-
flexible and powerful mechanism for specifying simple (base) types (restriction,
facets, regular expressions)
-
inheritance (derivation) between types
-
documentation is part of the standard (unlike DTDs)
-
unlike DTDs, XML Schema uses plain XML syntax, and thus allows controlled
extensibility and XSLT transformations (to create input or query forms, sample
instances, etc.)
-
the validation mechanism is standardized (this concerns the validation of items
with respect to categories)
-
validation tools exist, even now when the standard is not yet finalized
(XSV, XML Spy, Oracle XMLSchema Java package)
-
built-in support of URIs as unique identifiers for the root element
of a Schema
-
modular and decentralized construction of Schemas by import and include
Outline of the solution
A category is simply an XML Schema for the corresponding items. A catalog
is a simple container element (<catalog>) for the items. The hierarchy of
categories is defined by an XML Schema <extension> of the parent category.
Validation of items with respect to the pertinent category is just standard
XML Schema validation. The unique identifier of a category is the URI of the
corresponding schema. The key of an item within the catalog document is given
by its <supplier_part_number> element, while the unique URI of the item is
composed in the standard way from the catalog document URI and the local key,
e.g.:
http://www.IndustrySuppliers.com/catalogs/imperator/2001#xpointer(//item[supplier_part_number='122'])
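
The composition of an item URI from the catalog document URI and the local key can be sketched as follows. The function name `item_uri` is hypothetical, and the rejection of quote characters is an assumption added so that the key cannot break the XPath string literal.

```python
def item_uri(catalog_uri: str, supplier_part_number: str) -> str:
    """Build the item URI: catalog document URI plus an XPointer on the key."""
    # Assumption: keys containing quotes are refused rather than escaped.
    if "'" in supplier_part_number or '"' in supplier_part_number:
        raise ValueError("part number must not contain quote characters")
    return (
        f"{catalog_uri}#xpointer(//item"
        f"[supplier_part_number='{supplier_part_number}'])"
    )


uri = item_uri("http://www.IndustrySuppliers.com/catalogs/imperator/2001", "122")
print(uri)
```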
The examples given in the Annexes are short extracts from the real Schemas.
We give hints below for the following features:
-
choice of single supplier (not shown in Annex) and aggregated versions
of catalogs and items
-
content generation, using embedded fragments of XSLT
-
inter-descriptor and other complex rules and constraints, using embedded
fragments of Schematron
-
internationalization: several <description> and <long_description>
elements inside an <item>, each with a different xml:lang attribute
-
mappings within categories and descriptors can be specified:
-
merging one or more categories A into another category B: using the XML Schema
extension mechanism, declare that A extends B
-
splitting one category A into one or more other categories: can be specified
using an embedded fragment of XSLT
-
manage secondary categories for an item (indicating a hybrid product having
two or more uses): use a URI reference to the secondary categories inside
the <item> element, e.g. :
<item secondary='http://www.IndustrySuppliers.com/catalog/2001/01/gasket'>...</item>
-
manage keywords for items and/or categories: re-use XHTML syntax and semantics
from the namespace http://www.w3.org/1999/xhtml, e.g.:
<xhtml:meta name="keywords" lang="en" content="reliable, high temperature"/>
-
authoring information: Dublin Core is a standard that can be reused (Creator,
etc.) inside <xhtml:meta> elements
-
possibility for the definition of suppliers, item types and items to use either
URI references or inlined elements, for example:
<item><supplier ref='http://www.legrand.fr'/>...</item>
-
manage generic descriptors (e.g. color, weight, power, etc.): a library
of reusable generic descriptors can be defined as an XML Schema containing
top-level element definitions; it can be very useful for search engines.
Here is an example of reusing a generic descriptor in a category definition:
<xsd:element ref="isc:weight" />
-
price lists : manage prices by a delegate software object as a general
dependency or function of : unit price, UOM, quantity, order date, shipping
date, shipping conditions, payment conditions, buyer, supplier, contract
...
This remains to be defined, but a general solution could be to indicate
the URL of a jar file containing an object that does the computation. For
maximum flexibility, the argument to the function will be an XML string
or the corresponding DOM object, whose XML Schema will specify all the above
information. So the function signature would just be:
org.w3c.dom.Document message(org.w3c.dom.Document doc);
which is a general and re-usable architecture for flexible communication.
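
The "XML document in, XML document out" shape of the price delegate can be illustrated with a small sketch, transposed here to Python's DOM for brevity. Everything specific in it is an assumption: the element names `price_request`, `unit_price`, `quantity`, `price_response` and `total`, as well as the discount rule, are hypothetical, since the request schema remains to be defined.

```python
from xml.dom.minidom import Document, parseString


def _text(doc: Document, tag: str) -> str:
    """Return the text content of the first <tag> element in doc."""
    node = doc.getElementsByTagName(tag)[0]
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def message(doc: Document) -> Document:
    """Hypothetical price delegate: takes a request DOM, returns a reply DOM."""
    unit_price = float(_text(doc, "unit_price"))
    quantity = int(_text(doc, "quantity"))
    # Assumed pricing rule, for illustration only: 5% off from 100 units up.
    discount = 0.05 if quantity >= 100 else 0.0
    total = unit_price * quantity * (1.0 - discount)

    reply = Document()
    root = reply.createElement("price_response")
    reply.appendChild(root)
    total_el = reply.createElement("total")
    total_el.appendChild(reply.createTextNode(f"{total:.2f}"))
    root.appendChild(total_el)
    return reply


request = parseString(
    "<price_request><unit_price>2.50</unit_price>"
    "<quantity>200</quantity></price_request>"
)
print(message(request).documentElement.toxml())
```

Because both sides of the call are documents, new pricing inputs (contract, shipping conditions, and so on) can be added to the request schema without changing the function signature.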
URI naming schemes
We recall that these URIs are universally unique identifiers, not necessarily
retrievable on the Web. They can be used to make references to the objects
in many contexts.
This concerns companies and organizations, categories and items.
For companies and organizations, the address of the main Web home page can
be used, without a trailing /, e.g.:
http://www.ibm.com
As we said, the unique identifier of a category is the URI of the corresponding
schema. The naming scheme will be composed of:
-
the URI of company or organization (as above) defining the category
-
a part defined by the company or organization, ending with the English
name of the category in singular form
Example:
http://www.IndustrySuppliers.com/catalog/2001/01/gasket
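
As a minimal sketch of this naming scheme, the helper below joins the three parts. The function name `category_uri` and its parameters are hypothetical; checking that the category name is English and singular is left to the caller.

```python
def category_uri(company_uri: str, local_path: str, category_name: str) -> str:
    """Join company URI, company-defined path and category name into one URI."""
    base = company_uri.rstrip("/")   # company URIs carry no trailing slash
    middle = local_path.strip("/")   # part chosen freely by the company
    parts = [base, middle, category_name] if middle else [base, category_name]
    return "/".join(parts)


print(category_uri("http://www.IndustrySuppliers.com/", "catalog/2001/01", "gasket"))
```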
For items, a URI will be made using :
-
the URI of the supplier (or of the market place)
-
a catalog name or year
-
an XPointer / XPath expression using the primary key(s) for the item
Example:
http://www.IndustrySuppliers.com/catalogs/imperator/2001#xpointer(//item[supplier_part_number='122'])
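
Going the other way, a consumer that receives such an item URI may want to recover the catalog document URI and the local key. The sketch below assumes the exact fragment form shown in the example (a single `supplier_part_number` key in single quotes); the function name `split_item_uri` is hypothetical, and this is not a general XPointer parser.

```python
import re

# Matches only the fragment form used in the example above.
_KEY = re.compile(r"xpointer\(//item\[supplier_part_number='([^']*)'\]\)")


def split_item_uri(uri: str) -> tuple[str, str]:
    """Return (catalog document URI, supplier part number) for an item URI."""
    catalog_uri, _, fragment = uri.partition("#")
    match = _KEY.fullmatch(fragment)
    if match is None:
        raise ValueError(f"not an item URI of the assumed form: {uri}")
    return catalog_uri, match.group(1)


print(split_item_uri(
    "http://www.IndustrySuppliers.com/catalogs/imperator/2001"
    "#xpointer(//item[supplier_part_number='122'])"
))
```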
Issues
-
Synonyms in URIs:
-
several URLs for a single company
-
several URLs for a single item
The URI for an item will be "baptized" by the supplier or the market place,
but who will "baptize" the trading organizations and companies?
-
TO COMPLETE <<<<<<<<<<<<<
Compatibility with existing standards
TO COMPLETE <<<<<<<<<<<<<
Annex 1 - Schema for base category
Note that we use <xsd:unique> to specify the uniqueness of <item> elements,
keyed on their <supplier> and <supplier_part_number> sub-elements.
The actual schema will contain more details; this is a short (but valid)
sample that shows the essentials.
<xsd:schema targetNamespace="http://www.IndustrySuppliers.com/catalog/2001/01"
    xmlns:isc="http://www.IndustrySuppliers.com/catalog/2001/01"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
    elementFormDefault="qualified"
    attributeFormDefault="qualified"
>
  <xsd:complexType name="category">
    <xsd:annotation>
      <xsd:documentation>Template for
        common and mandatory information about a catalog item.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:all>
      <xsd:element name="supplier" type="xsd:string"/>
      <xsd:element name="supplier_part_number" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="long_description" type="xsd:string"/>
    </xsd:all>
  </xsd:complexType>
  <xsd:complexType name="catalog">
    <xsd:sequence>
      <xsd:element name="item" type="isc:category" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="catalog" type="isc:catalog">
    <!-- elementFormDefault is "qualified", so the XPath steps need the isc: prefix -->
    <xsd:unique name="supplier_part_number">
      <xsd:selector xpath="isc:item"/>
      <xsd:field xpath="isc:supplier"/>
      <xsd:field xpath="isc:supplier_part_number"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
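
To make the effect of the <xsd:unique> constraint concrete, here is a minimal sketch of the same check done by hand with Python's standard library. It is not a schema validator; the function name `duplicate_keys` and the sample catalog are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ISC = "http://www.IndustrySuppliers.com/catalog/2001/01"
NS = {"isc": ISC}


def duplicate_keys(catalog: ET.Element) -> list[tuple[str, str]]:
    """Return the (supplier, supplier_part_number) pairs used by several items."""
    keys = Counter()
    for item in catalog.findall("isc:item", NS):
        supplier = item.findtext("isc:supplier", default=None, namespaces=NS)
        part = item.findtext("isc:supplier_part_number", default=None, namespaces=NS)
        # As with xsd:unique, an item lacking one of the fields is not constrained.
        if supplier is not None and part is not None:
            keys[(supplier, part)] += 1
    return [key for key, count in keys.items() if count > 1]


# Hypothetical catalog: the first two items share the same key.
SAMPLE = f"""<catalog xmlns="{ISC}">
  <item><supplier>Imperator</supplier><supplier_part_number>122</supplier_part_number></item>
  <item><supplier>Imperator</supplier><supplier_part_number>122</supplier_part_number></item>
  <item><supplier>Imperator</supplier><supplier_part_number>123</supplier_part_number></item>
</catalog>"""

print(duplicate_keys(ET.fromstring(SAMPLE)))
```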
Annex 2 - Schema for derived category
<xsd:schema targetNamespace="http://www.IndustrySuppliers.com/catalog/2001/01/gasket"
    xmlns="http://www.IndustrySuppliers.com/catalog/2001/01/gasket"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
    xmlns:isc="http://www.IndustrySuppliers.com/catalog/2001/01"
    elementFormDefault="qualified"
>
  <xsd:import namespace="http://www.IndustrySuppliers.com/catalog/2001/01"
      schemaLocation="catalog-base.xsd"/>
  <xsd:element name="gasket" type="gasket">
    <xsd:annotation>
      <xsd:documentation>
        IndustrySuppliers.com's definition of a gasket item
        belonging to an industrial catalog.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:complexType name="gasket">
    <xsd:complexContent>
      <xsd:extension base="isc:category">
        <xsd:all>
          <xsd:element name="diameter" type="xsd:float" minOccurs='1'/>
        </xsd:all>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
Annex 3 - Sample XML instance of industrial catalog
Note that we have here an instance of a base catalog, which can contain
any type of item. We could have derived a special type of catalog (e.g.
gasket:catalog) which would contain only gaskets.
<?xml version='1.0' encoding='ISO-8859-1'?>
<catalog
    xmlns='http://www.IndustrySuppliers.com/catalog/2001/01'
    xmlns:gasket='http://www.IndustrySuppliers.com/catalog/2001/01/gasket'
    xsi:schemaLocation='http://www.IndustrySuppliers.com/catalog/2001/01/gasket gasket.xsd'
    xmlns:xsi='http://www.w3.org/2000/10/XMLSchema-instance'
>
  <item xsi:type='gasket:gasket'>
    <supplier>Imperator</supplier>
    <supplier_part_number>122</supplier_part_number>
    <long_description>Very good indeed!</long_description>
    <description>Very good!</description>
    <gasket:diameter>1.22</gasket:diameter>
  </item>
</catalog>
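
Since a category is identified by the URI of its schema, software reading such an instance can find the category of each item by resolving the prefix of its xsi:type attribute. The sketch below does this with the standard library only. It is a simplification under stated assumptions: prefix declarations are collected without tracking their scope, items without xsi:type are attributed to the base namespace, and the function name `item_categories` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

ISC = "http://www.IndustrySuppliers.com/catalog/2001/01"
XSI_TYPE = "{http://www.w3.org/2000/10/XMLSchema-instance}type"


def item_categories(xml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (supplier_part_number, category namespace URI) for each item."""
    prefixes = {}  # prefix -> namespace URI, scope ignored (simplification)
    found = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "end"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        if payload.tag != f"{{{ISC}}}item":
            continue
        part = payload.findtext(f"{{{ISC}}}supplier_part_number")
        qname = payload.get(XSI_TYPE)
        if qname is None:
            found.append((part, ISC))  # no xsi:type: base category assumed
        else:
            prefix, _, _local = qname.rpartition(":")
            found.append((part, prefixes[prefix]))
    return found


SAMPLE = b"""<?xml version='1.0' encoding='ISO-8859-1'?>
<catalog xmlns='http://www.IndustrySuppliers.com/catalog/2001/01'
    xmlns:gasket='http://www.IndustrySuppliers.com/catalog/2001/01/gasket'
    xmlns:xsi='http://www.w3.org/2000/10/XMLSchema-instance'>
  <item xsi:type='gasket:gasket'>
    <supplier>Imperator</supplier>
    <supplier_part_number>122</supplier_part_number>
    <gasket:diameter>1.22</gasket:diameter>
  </item>
</catalog>"""

print(item_categories(SAMPLE))
```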
Annex 4 - Glossary and acronyms
Word / acronym | Definition | URL
attribute | see descriptor |
descriptor | also called attribute |
DOM | Document Object Model | http://www.w3.org/DOM/
Dublin Core | basic metadata specification; metadata in the sense of "data about data", that is, information such as Author, Title, Date; this is different from the other meaning, the specification of the structure of data, such as a database schema | http://purl.org/DC/
Schematron | XSLT-based technique to check XPath-based rules and report anomalies; its strength is its ability to enforce rules involving the comparison of different elements | http://www.ascc.net/xml/resource/schematron/ ; tutorial: http://www.zvon.org/xxl/SchematronTutorial/General/contents.html
UML | Unified Modeling Language |
URI | Uniform Resource Identifier | http://www.w3.org/Addressing/
W3C | World Wide Web Consortium | http://www.w3.org
XPointer | |
XSL, XSLT | Extensible Stylesheet Language (XSL Transformations) |
XML Schema | World Wide Web Consortium's schema language; manages the "basic" validation: structure of elements and sub-elements, constraints on the content of elements, uniqueness and key references |
Annex 5 - References
See also hyperlinks in Annex 4 - Glossary and acronyms.
Industrial catalogs management - software specification, J.M. Vanel
XML Schema specification:
http://www.w3.org/TR/xmlschema-0/
http://www.w3.org/TR/xmlschema-1/
http://www.w3.org/TR/xmlschema-2/
XML Schema Tutorial
Command-line tool for validating with XML Schema: XSV
Current Status of XSV: Coverage, Known Bugs
Schematron specification:
http://www.ascc.net/xml/resource/schematron/
Schematron tutorial:
http://www.zvon.org/index.php?nav_id=2
TO COMPLETE <<<<<<<<<<<<<
Annex 6 - Expressing validity rules inside the XML Schema
Here is an example showing how one can restrict the supplier_part_number
to follow the pattern: 3 digits, dash, 4 digits. This XML can be put
inside the <xsd:element name="supplier_part_number"> element.
<xsd:simpleType>
  <xsd:restriction base="xsd:string">
    <xsd:length value="8"/>
    <xsd:pattern value="\d{3}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>
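
For readers who want to try the rule outside a schema validator, the same restriction can be approximated in a few lines. This is a sketch, not a substitute for XML Schema validation; the function name `is_valid_part_number` is hypothetical. The whole value must match, as XML Schema patterns are implicitly anchored.

```python
import re

# Same pattern as in the xsd:pattern facet: 3 digits, dash, 4 digits.
PART_NUMBER = re.compile(r"\d{3}-\d{4}")


def is_valid_part_number(value: str) -> bool:
    """Apply both facets: length of 8 and the pattern on the whole value."""
    return len(value) == 8 and PART_NUMBER.fullmatch(value) is not None


for candidate in ("123-4567", "122", "123-45678"):
    print(candidate, is_valid_part_number(candidate))
```

Note that a part number such as 122, used in the sample instance of Annex 3, would not satisfy this particular restriction.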