There are two major classes of XML database:
XML-enabled: these map XML data onto a traditional database model (e.g. the relational model) and store it there.
Native XML: the internal model of these databases is based on XML, and the fundamental unit of storage is an XML document.
Note: "XML-enabled" implies that the database performs the conversion itself, rather than relying on separate middleware.
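To make the XML-enabled approach concrete, here is a minimal Python sketch of "shredding" a sales-order document into relational tables, using only the standard library (`sqlite3` and `xml.etree.ElementTree`). The document layout and the `orders`/`items` schema are hypothetical. The mapping runs in application code here purely for illustration; in a genuinely XML-enabled database the engine itself would perform this conversion.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sales-order document (element and attribute names are illustrative).
ORDER_XML = """<order id="1001" customer="ACME">
  <item sku="A-1" qty="2" price="9.50"/>
  <item sku="B-7" qty="1" price="24.00"/>
</order>"""


def shred_order(conn: sqlite3.Connection, xml_text: str) -> None:
    """Map one <order> document onto two relational tables: orders and items."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order_id, root.get("customer")),
    )
    # Each repeated <item> element becomes one row in the child table.
    for item in root.findall("item"):
        conn.execute(
            "INSERT INTO items (order_id, sku, qty, price) VALUES (?, ?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty")), float(item.get("price"))),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (order_id INTEGER REFERENCES orders(id),
                        sku TEXT, qty INTEGER, price REAL);
    """
)
shred_order(conn, ORDER_XML)

# Once shredded, the data is queried with ordinary SQL rather than XPath.
total = conn.execute(
    "SELECT SUM(qty * price) FROM items WHERE order_id = ?", (1001,)
).fetchone()[0]
print(total)  # 43.0
```

The price of this approach is that the original document (its element order, whitespace and any mixed content) is not retained unless the schema is extended to record it.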
= Why XML in databases =
O'Connell (2005, 9.2) states that one reason is the growing use of XML for data transport: data is routinely extracted from databases into XML documents, and vice versa. Storing the data in XML format in the first place may therefore be easier, and more efficient, because it avoids the cost of these conversions.
= Native XML databases =
These databases store XML either as text or in an internal format designed for faster processing. Most native XML databases also support indexing of XML, which improves query performance.
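Real products use their own, typically persistent, index structures. As a rough illustration only, the sketch below builds a hypothetical in-memory path index that maps each root-to-element tag path to the nodes where it occurs, so that a lookup such as `/book/title` no longer requires scanning every document. The function name `build_path_index` and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

IndexEntry = tuple[str, ET.Element]  # (document id, element)


def build_path_index(docs: dict[str, str]) -> dict[str, list[IndexEntry]]:
    """Map every root-to-element tag path (e.g. '/book/chapter/title')
    to the (document id, element) pairs at which it occurs."""
    index: dict[str, list[IndexEntry]] = defaultdict(list)

    def walk(doc_id: str, elem: ET.Element, path: str) -> None:
        index[path].append((doc_id, elem))
        for child in elem:  # children are visited in document order
            walk(doc_id, child, f"{path}/{child.tag}")

    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        walk(doc_id, root, "/" + root.tag)
    return dict(index)


docs = {
    "d1": "<book><title>XML Basics</title><chapter><title>Intro</title></chapter></book>",
    "d2": "<book><title>Databases</title></book>",
}
index = build_path_index(docs)

# Answering /book/title is now a dictionary lookup instead of a scan of every document.
titles = [(doc_id, elem.text) for doc_id, elem in index.get("/book/title", [])]
print(titles)  # [('d1', 'XML Basics'), ('d2', 'Databases')]
```

A path index like this only speeds up simple child-axis paths; value predicates such as `[@year > 2000]` would need a separate value index.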
The definition formerly published by the XML:DB consortium states that a native XML database:
Defines a (logical) model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models are the XPath data model, the XML Infoset, and the models implied by the DOM and by the events in SAX 1.0.
Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
Is not required to have any particular underlying physical storage model. For example, it can be built on a relational, hierarchical, or object-oriented database, or use a proprietary storage format such as indexed, compressed files.
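The minimal model in this definition (elements, attributes, character data and document order) can be illustrated with Python's `xml.etree.ElementTree`, whose tree model is DOM-like but not identical to the XPath data model or the Infoset (for example, its default parser discards comments). The hypothetical `logical_items` function below flattens a document into those items in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an attribute, nested elements and mixed content.
DOC = '<memo priority="high"><to>Ann</to><body>Meet at <b>noon</b> today.</body></memo>'


def logical_items(elem: ET.Element) -> list[tuple[str, str]]:
    """Flatten an element into (kind, value) pairs in document order,
    covering elements, attributes and character data (PCDATA)."""
    items = [("element", elem.tag)]
    for name, value in elem.attrib.items():
        items.append(("attribute", f"{name}={value}"))
    if elem.text:  # character data before the first child
        items.append(("text", elem.text))
    for child in elem:
        items.extend(logical_items(child))
        if child.tail:  # character data that follows the child inside this element
            items.append(("text", child.tail))
    return items


for kind, value in logical_items(ET.fromstring(DOC)):
    print(f"{kind:9} {value!r}")
```

Because text is split across `.text` and `.tail`, the mixed content "Meet at <b>noon</b> today." is recovered in its original order, which is exactly what a model that respects document order must guarantee.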
Additionally, many XML databases provide a logical way of grouping documents, called collections. Many collections can be created and managed at once. In some implementations, collections can also be arranged hierarchically, much like the directory structure of an operating system's file system.
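A minimal sketch of hierarchical collections, assuming a hypothetical in-memory `Collection` class (not the API of any particular product): each collection holds named documents plus sub-collections, addressed with `/`-separated paths in the same way as directories.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection: named XML documents plus nested sub-collections."""

    name: str
    documents: dict[str, str] = field(default_factory=dict)
    children: dict[str, Collection] = field(default_factory=dict)

    def subcollection(self, path: str) -> Collection:
        """Return the sub-collection at a '/'-separated path, creating it if needed."""
        node = self
        for part in filter(None, path.split("/")):
            if part not in node.children:
                node.children[part] = Collection(part)
            node = node.children[part]
        return node

    def store(self, doc_name: str, xml_text: str) -> None:
        """Store a document after checking that it is well-formed XML."""
        ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
        self.documents[doc_name] = xml_text

    def all_documents(self, prefix: str = "") -> list[str]:
        """List the paths of documents in this collection and all sub-collections."""
        here = f"{prefix}/{self.name}"
        paths = [f"{here}/{doc}" for doc in self.documents]
        for child in self.children.values():
            paths.extend(child.all_documents(here))
        return paths


db = Collection("db")
db.subcollection("library/books").store("b1.xml", "<book><title>XML</title></book>")
db.subcollection("library").store("index.xml", "<index/>")
print(db.all_documents())  # ['/db/library/index.xml', '/db/library/books/b1.xml']
```

Keeping the document (not a row) as the stored unit mirrors the definition above; a query can then be scoped to one collection or to a whole subtree of collections.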
Practically all XML databases support at least one query language. At a minimum, almost all of them support XPath for querying individual documents or collections of documents. XPath is a path-expression language for selecting the nodes that match a given set of criteria.
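As a sketch of XPath-style querying over a collection, the code below uses `ElementTree.findall`, which supports only a limited subset of XPath 1.0; a native XML database would evaluate full XPath, typically on the server and with index support. The collection contents and the `query_collection` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of catalog documents, keyed by document name.
COLLECTION = {
    "cat1.xml": """<catalog>
        <book format="print"><title>Learning XML</title></book>
        <book format="ebook"><title>XML Schema</title></book>
    </catalog>""",
    "cat2.xml": """<catalog>
        <book format="print"><title>XQuery</title></book>
    </catalog>""",
}


def query_collection(collection: dict[str, str], path: str) -> list[tuple[str, str]]:
    """Evaluate one ElementTree path against every document in the collection
    and return (document name, text of matching element) pairs."""
    results = []
    for name, text in collection.items():
        root = ET.fromstring(text)
        for node in root.findall(path):
            results.append((name, node.text))
    return results


# Comparable to the full XPath //book[@format='print']/title; ElementTree paths
# are relative to the root element, hence the leading '.'.
hits = query_collection(COLLECTION, ".//book[@format='print']/title")
print(hits)  # [('cat1.xml', 'Learning XML'), ('cat2.xml', 'XQuery')]
```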
In addition to XPath, many XML databases support XSLT for transforming documents or query results as they are retrieved from the database. XSLT is a declarative language that is itself written in XML syntax. An XSLT stylesheet consists of template rules whose XPath patterns select the nodes to be transformed, in part or in whole, into other formats such as plain text, XML, HTML or, via XSL-FO, Portable Document Format (PDF).
Many XML databases also support, or are adding support for, XQuery. XQuery uses XPath for node selection but extends it with the ability to construct and transform results. Its central expression is known as FLWOR (pronounced "flower"), after the clauses it may contain: for, let, where, order by and return.
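To show what each FLWOR clause does, the sketch below pairs a small XQuery expression (in a comment, not executed) with a Python equivalent built on `ElementTree`. The catalog data and the `cheap_titles` function are hypothetical, and the Python version is only an analogy for the clause structure, not how an XQuery engine evaluates queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document.
CATALOG = ET.fromstring("""<catalog>
  <book><title>XML Basics</title><price>45.00</price></book>
  <book><title>Schema Design</title><price>12.50</price></book>
  <book><title>Query Languages</title><price>30.00</price></book>
</catalog>""")

# The XQuery being mirrored (shown for comparison only, not executed here):
#   for $b in /catalog/book
#   let $p := xs:decimal($b/price)
#   where $p < 40
#   order by $p
#   return <cheap>{ $b/title/text() }</cheap>


def cheap_titles(catalog: ET.Element, limit: float = 40) -> list[ET.Element]:
    selected = []
    for b in catalog.findall("book"):          # for $b in /catalog/book
        p = float(b.findtext("price"))         # let $p := ...
        if p < limit:                          # where $p < 40
            selected.append((p, b))
    selected.sort(key=lambda pair: pair[0])    # order by $p
    results = []
    for _, b in selected:                      # return <cheap>...</cheap>
        cheap = ET.Element("cheap")
        cheap.text = b.findtext("title")
        results.append(cheap)
    return results


print([ET.tostring(e, encoding="unicode") for e in cheap_titles(CATALOG)])
# ['<cheap>Schema Design</cheap>', '<cheap>Query Languages</cheap>']
```

The final loop is the "transformational scaffolding" that plain XPath lacks: it builds new `<cheap>` elements rather than merely selecting existing nodes.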
Some XML databases support an API called the XML:DB API (or XAPI) for implementation-independent access to the XML datastore. For XML databases, XAPI plays a role analogous to that of ODBC for relational databases.
= Choice of Database =
As a general guideline, O'Connell (2005, 9.2) states that the two types of XML database lend themselves to different tasks.
XML-enabled databases suit data-centric tasks, in which documents are intended for machine consumption and are characterized by a regular structure, fine-grained data, and little or no mixed content; e.g. sales orders, flight schedules, online catalogs. Primary capabilities:
Enables the transfer of data between XML files and a traditional (tried and tested) database.
Supports queries involving different views of XML data.
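The reverse direction, publishing relational rows as XML, can be sketched with `sqlite3` and `ElementTree`. The `flights` table and the `schedule_view` function are hypothetical. XML-enabled databases typically provide this inside the engine (for example through SQL/XML publishing functions such as `XMLELEMENT`), whereas here it is done in application code for illustration. Each call produces a different XML view of the same relational data.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flights (code TEXT, origin TEXT, dest TEXT, departs TEXT);
    INSERT INTO flights VALUES
        ('XY101', 'LHR', 'JFK', '08:25'),
        ('XY202', 'LHR', 'LAX', '10:15'),
        ('ZZ303', 'CDG', 'JFK', '10:30');
    """
)


def schedule_view(conn: sqlite3.Connection, origin: str) -> str:
    """Publish one view of the relational data (departures from one airport) as XML."""
    root = ET.Element("schedule", origin=origin)
    rows = conn.execute(
        "SELECT code, dest, departs FROM flights WHERE origin = ? ORDER BY departs",
        (origin,),
    )
    for code, dest, departs in rows:
        flight = ET.SubElement(root, "flight", code=code)
        ET.SubElement(flight, "dest").text = dest
        ET.SubElement(flight, "departs").text = departs
    return ET.tostring(root, encoding="unicode")


print(schedule_view(conn, "LHR"))
```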
Native XML databases suit document-centric tasks, in which documents are designed for human consumption and are manually edited using XML-editing tools; e.g. books, emails, newspaper articles. Primary capabilities:
Fast retrieval of documents: little processing is required to serve the XML documents, because they can be served in their original form. Consequently, a native XML database can outperform a relational database when retrieving data in its predefined format (just as a hierarchical database can).
Supports structural and/or order-based XML queries, e.g. "Give me all documents in which the second paragraph contains a bold word."
Stores, manages and queries schemaless documents and semi-structured data.
Source: Bhargava et al. (2005).
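The order-based example query above can be approximated with `ElementTree`, whose path syntax supports positional predicates such as `para[2]`. In full XPath the test would be roughly `/article[para[2]//b]`. The `<article>`, `<para>` and `<b>` element names and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; <para> and <b> stand for paragraphs and bold text.
DOCS = {
    "a.xml": "<article><para>Intro.</para><para>A <b>bold</b> claim.</para></article>",
    "b.xml": "<article><para>Only <b>here</b>.</para><para>Plain text.</para></article>",
    "c.xml": "<article><para>A single paragraph.</para></article>",
}


def second_para_has_bold(xml_text: str) -> bool:
    root = ET.fromstring(xml_text)
    # para[2] selects the second <para> child (positions start at 1, as in XPath);
    # //b then matches a <b> element at any depth inside that paragraph.
    return root.find("para[2]//b") is not None


matches = [name for name, text in DOCS.items() if second_para_has_bold(text)]
print(matches)  # ['a.xml']
```

`b.xml` is excluded because its bold word is in the first paragraph: the answer depends on document order, which a store that preserves documents as documents can evaluate directly.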
= References =
Bhargava, P., Rajamani, H., Thaker, S. and Agarwal, A. (2005) XML Enabled Relational Databases. Texas: The University of Texas at Austin. Available from: http://www.cs.utexas.edu/users/dsb/cs387h/XMLDatabases.ppt (accessed 10 June 2005).
O'Connell, S. (2005) Advanced Databases Course Notes. Southampton: University of Southampton.
[http://exist.sourceforge.net/ eXist Open Source Native XML Database]
[http://xml.apache.org/xindice/ Apache Xindice]