1 Introduction to Oracle XML DB

This chapter introduces the features and architecture of Oracle XML DB. It contains these topics:

Overview of Oracle XML DB

Oracle XML DB is a set of Oracle Database technologies related to high-performance handling of XML data: storing, generating, accessing, searching, validating, transforming, evolving, and indexing. It provides native XML support by encompassing both the SQL and XML data models in an interoperable way. Oracle XML DB is included as part of Oracle Database starting with Oracle9i Release 2 (9.2).

Oracle XML DB includes the following features:

  • An abstract SQL data type, XMLType, for XML data.

  • Enterprise-level Oracle Database features for XML content: reliability, availability, scalability, and security. XML-specific memory management and optimizations.

  • Industry-standard ways to access and update XML data. The standards include the SQL/XML standard and the World Wide Web Consortium (W3C) XML and XML Schema data models and recommendations for XPath and XQuery. You can use FTP, HTTP(S), and WebDAV to move XML content into and out of Oracle Database. Industry-standard APIs provide programmatic access and manipulation of XML content using Java, C, and PL/SQL.

  • Ways to store, query, update, and transform XML data while accessing it using SQL.

  • Ways to perform XML operations on SQL data.

  • Oracle XML DB Repository: a simple, lightweight repository where you can organize and manage database content, including XML content, using a file/folder/URL metaphor.

  • Ways to access and combine data from disparate systems through gateways, using a single, common data model. This reduces the complexity of developing applications that must deal with data from different stores.

  • Ways to use Oracle XML DB in conjunction with Oracle XML Developer's Kit (XDK) to build applications that run in the middle tier in either Oracle Application Server or Oracle Database.

Oracle XML DB Architecture

Figure 1-1 and Figure 1-2 show the software architecture of Oracle XML DB. The main features are:

  • Storage of XMLType tables and views.

    • You can index XMLType tables and views using XMLIndex, B-tree, and Oracle Text indexes.

    • You can store data that is in XMLType views in local or remote tables. You can access remote tables using database links.

  • Oracle XML DB Repository. You can store any kind of documents in the repository, including XML documents that are associated with an XML schema that is registered with Oracle XML DB. You can access documents in the repository in any of the following ways:

    • HTTP(S), through the HTTP protocol handler

    • WebDAV and FTP, through the WebDAV and FTP protocol handlers

    • SQL, through Oracle Net Services, including Java Database Connectivity (JDBC)

  • Support of XML data messaging using Oracle Streams Advanced Queuing (AQ) and Web Services.

Figure 1-1 XMLType Storage and Oracle XML DB Repository

Description of Figure 1-1 follows
Description of "Figure 1-1 XMLType Storage and Oracle XML DB Repository"

Figure 1-2 XMLType Storage

Description of Figure 1-2 follows
Description of "Figure 1-2 XMLType Storage"

APIs for XML

Table 1-1 lists the reference documentation for the PL/SQL, C, and C++ Application Programming Interfaces (APIs) that you can use to manipulate XML data. The main reference for PL/SQL, C, and C++ APIs is Oracle Database PL/SQL Packages and Types Reference.

See Also:

Oracle Database XML Java API Reference for information about Java APIs for XML

Table 1-1 APIs Related to XML

API Documentation Description

XMLType

Oracle Database PL/SQL Packages and Types Reference, Chapter "XMLType", Oracle Database XML C API Reference, and Oracle Database XML C++ API Reference

PL/SQL, C, and C++ APIs with XML operations on XMLType data – validation, transformation.

Database URI types

Oracle Database PL/SQL Packages and Types Reference, Chapter "Database URI TYPEs"

Functions used for various URI types.

DBMS_METADATA

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_METADATA"

PL/SQL API for retrieving metadata from the database dictionary as XML, or retrieving creation DDL and submitting the XML to re-create the associated object.

DBMS_RESCONFIG

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_RESCONFIG"

PL/SQL API to operate on a resource configuration list, and to retrieve listener information for a resource.

DBMS_XDB

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XDB"

PL/SQL API for managing Oracle XML DB Repository resources, ACL-based security, and configuration sessions.

DBMS_XDB_ADMIN

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XDB_ADMIN"

PL/SQL API for managing miscellaneous features of Oracle XML DB, including the XMLIndex index on the Oracle XML DB Repository.

DBMS_XDBRESOURCE

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XDBRESOURCE"

PL/SQL API to operate on repository resource metadata and contents.

DBMS_XDBT

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XDBT"

PL/SQL API for creation of text indexes on repository resources.

DBMS_XDB_VERSION

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XDB_VERSION"

PL/SQL API for version management of repository resources.

DBMS_XDBZ

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XDBZ"

Oracle XML DB Repository ACL-based security.

DBMS_XEVENT

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XEVENT"

PL/SQL API providing event-related types and supporting interface..

DBMS_XMLDOM

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XMLDOM"

PL/SQL implementation of the DOM API for XMLType.

DBMS_XMLGEN

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XMLGEN"

PL/SQL API for transformation of SQL query results into canonical XML format.

DBMS_XMLINDEX

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XMLINDEX

PL/SQL API for XMLIndex.

DBMS_XMLPARSER

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XMLPARSER"

PL/SQL implementation of the DOM Parser API for XMLType.

 
 

DBMS_XMLSCHEMA

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XMLSCHEMA

PL/SQL API for managing XML schemas within Oracle Database – schema registration, deletion.

DBMS_XMLSTORE

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XMLSTORE"

PL/SQL API for storing XML data in relational tables.

DBMS_XSLPROCESSOR

Oracle Database PL/SQL Packages and Types Reference, Chapter "DBMS_XSLPROCESSOR"

PL/SQL implementation of an XSLT processor.


Catalog Views Related to XML

Table 1-2 lists the catalog views related to XML. Information about a given view can be obtained by using SQL command DESCRIBE.

DESCRIBE USER_XML_SCHEMAS

Table 1-2 Catalog Views Related to XML

Schema Description
USER_XML_SCHEMAS

Registered XML schemas owned by the current user

ALL_XML_SCHEMAS

Registered XML schemas usable by the current user

DBA_XML_SCHEMAS

Registered XML schemas in Oracle XML DB

USER_XML_TABLES

XMLType tables owned by the current user

ALL_XML_TABLES

XMLType tables usable by the current user

DBA_XML_TABLES

XMLType tables in Oracle XML DB

USER_XML_TAB_COLS

XMLType table columns owned by the current user

ALL_XML_TAB_COLS

XMLType table columns usable by the current user

DBA_XML_TAB_COLS

XMLType table columns in Oracle XML DB

USER_XML_VIEWS

XMLType views owned by the current user

ALL_XML_VIEWS

XMLType views usable by the current user

DBA_XML_VIEWS

XMLType views in Oracle XML DB

USER_XML_VIEW_COLS

XMLType view columns owned by the current user

ALL_XML_VIEW_COLS

XMLType view columns usable by the current user

DBA_XML_VIEW_COLS

XMLType view columns in Oracle XML DB


In addition to the views ALL_XML_TABLES, DBA_XML_TABLES, and USER_XML_TABLES, views ALL_OBJECT_TABLES, DBA_OBJECT_TABLES, and USER_OBJECT_TABLES provide tablespace and other storage information for XMLType data stored object-relationally.

Overview of Oracle XML DB Repository

Oracle XML DB Repository is a component of Oracle Database that lets you handle XML data using a file/folder/URL metaphor. The repository contains resources, which can be either folders (directories, containers) or files.

A resource has these properties:

  • It is identified by a path and name.

  • It has content (data), which can be XML data but need not be.

  • It has a set of system-defined metadata (properties), such as Owner and CreationDate, in addition to its content. Oracle XML DB uses this information to manage the resource.

  • It might also have user-defined metadata. Like system-defined metadata, this is information that is not part of the content, but is associated with it.

  • It has an associated access control list that determines who can access the resource, and for what operations.

Although Oracle XML DB Repository treats XML content specially, you can use the repository to store other kinds of data besides XML. You can use the repository to access any data that is stored in Oracle Database.

You can access data in the repository in the following ways (see Figure 1-1):

  • SQL – Using views RESOURCE_VIEW and PATH_VIEW

  • PL/SQL – Using package DBML_XDB

  • Java – Using the Oracle XML DB resource API for Java

See Also:

XML Services

Besides providing APIs for accessing and manipulating repository data, Oracle XML DB provides APIs for the following repository services:

  • Versioning – Oracle XML DB uses PL/SQL package DBMS_XDB_VERSION to version resources in Oracle XML DB Repository. Updating a resource creates a new version. Previous versions are retained. Versioning support is based on the IETF WebDAV standard.

  • ACL Security – Repository security is based on access control lists (ACLs). Each resource has an associated ACL that lists the privileges required to use it in various ways. When a resource is accessed or manipulated, its ACL determines whether the requested operation is allowed. An ACL is an XML document that contains a set of access control entries (ACEs). Each ACE grants or revokes a set of permissions to a particular user or group (database role). This access control mechanism is based on the WebDAV specification.

  • Foldering – Oracle XML DB Repository manages a persistent hierarchy of folder (that is, directory) resources that contain other resources (files or folders). Oracle XML DB modules such as protocol servers, the XML schema manager, and the Oracle XML DB RESOURCE_VIEW API use foldering to map repository path names to the resources they target.

Views RESOURCE_VIEW and PATH_VIEW

Views RESOURCE_VIEW and PATH_VIEW provide SQL access to data in Oracle XML DB Repository through protocols such as FTP and WebDAV. View PATH_VIEW has one row for each unique path in the repository. View RESOURCE_VIEW has one row for each resource in the repository.

The Oracle XML DB resource API for PL/SQL, DBMS_XDB, provides query and DML functions. It is based on RESOURCE_VIEW and PATH_VIEW.

See Also:

Oracle XML DB Repository Architecture

Figure 1-3 illustrates the architecture of Oracle XML DB Repository.

Figure 1-3 Oracle XML DB Repository Architecture

Description of Figure 1-3 follows
Description of "Figure 1-3 Oracle XML DB Repository Architecture"

Files and Folders

Relational databases are traditionally poor at managing hierarchical structures and traversing a path or a URL. Oracle XML DB Repository provides you with a hierarchical organization of XML content in the database. You can query and manage it as if it were organized using files and folders.

The relational table-row-column metaphor is an effective model for managing highly structured data. It can be less effective for managing semi-structured and unstructured data, such as document-oriented XML data.

For example, a book is not easily represented as a set of rows in a table. It might be more natural to represent a book as a hierarchy, book — chapter — section — paragraph, and to represent the hierarchy as a set of folders and subfolders.

  • A hierarchical repository index speeds up folder and path traversals. Oracle XML DB includes a patented hierarchical index that speeds up folder and path traversals in Oracle XML DB Repository. The hierarchical repository index is transparent to end users, and lets Oracle XML DB perform folder and path traversals at speeds comparable to or faster than conventional file systems.

  • You can access XML documents in Oracle XML DB Repository using standard connect-access protocols such as FTP, HTTP(S), and WebDAV, in addition to languages SQL, PL/SQL, Java, and C. The repository provides content authors and editors direct access to XML content stored in Oracle Database.

  • A resource in this context is a file or folder, identified by a URL. WebDAV is an IETF standard that defines a set of extensions to the HTTP protocol. It lets an HTTP server act as a file server for a DAV-enabled client. For example, a WebDAV-enabled editor can interact with an HTTP/WebDAV server as if it were a file system. The WebDAV standard uses the term resource to describe a file or a folder. Each resource managed by a WebDAV server is identified by a URL. Oracle XML DB adds native support to Oracle Database for these protocols. The protocols were designed for document-centric operations. By providing support for these protocols, Oracle XML DB lets Microsoft Windows Explorer, Microsoft Office, and products from vendors such as Altova and Adobe work directly with XML content stored in Oracle XML DB Repository. Figure 1-4 shows the root-level directory of the repository as seen from a Web browser.

Figure 1-4 Web Browser View of Oracle XML DB Repository

Description of Figure 1-4 follows
Description of "Figure 1-4 Web Browser View of Oracle XML DB Repository"

Hence, WebDAV clients such as Microsoft Windows Explorer can connect directly to Oracle XML DB Repository. No additional Oracle Database or Microsoft-specific software or other complex middleware is needed. End users can work directly with Oracle XML DB Repository using familiar tools and interfaces.

Oracle XML DB Protocol Architecture

One key feature of the Oracle XML DB architecture is that protocols HTTP(S), WebDAV, and FTP are supported, including in a shared server configuration. When the Listener receives an HTTP(S) or FTP request, it hands it off to an Oracle Database shared server process which services it and sends the appropriate response back to the client.You can use the TNS Listener command, lsnrctl status, to verify that HTTP(S) and FTP support has been enabled. Example 1-1 illustrates this.

Example 1-1 Listener Status with FTP and HTTP(S) Protocol Support Enabled

LSNRCTL for 32-bit Windows: Version 11.1.0.5.0 - Production on 20-AUG-2007 16:02:34
 
Copyright (c) 1991, 2007, Oracle.  All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521))) STATUS of the LISTENER
--------------------------------------------------------------------------------------------
Alias                          LISTENER
Version                        TNSLSNR for 32-bit Windows: Version 11.1.0.5.0 - Beta
Start Date                     20-JUN-2007 15:35:40
Uptime                         0 days 16 hr. 47 min. 42 sec
Trace Level                    off
Security                       ON: Local OS Authentication
SNMP                           OFF
Listener Parameter File        C:\oracle\product\11.1.0\db_1\network\admin\listener.ora
Listener Log File              c:\oracle\diag\tnslsnr\quine-pc\listener\alert\log.xml
 
Listening Endpoints Summary... 
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\EXTPROC1521ipc)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=quine-pc.example.com)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=quine-pc.example.com)
             (PORT=21))(Presentation=FTP)(Session=RAW)) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=quine-pc.example.com)
             (PORT=443))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "orcl.example.com" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
Service "orclXDB.example.com" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
Service "orcl_XPT.example.com" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
The command completed successfully

Programmatic Access to Oracle XML DB (Java, PL/SQL, and C)

All Oracle XML DB functionality is accessible from C, PL/SQL, and Java. You can build Web-based applications in various ways, including these:

  • Using servlets and Java Server Pages (JSP). A typical API accesses data using Java Database Connectivity (JDBC).

  • Using Extensible Stylesheet Language (XSL) plus XML Server Pages (XSP). A typical API accesses data in the form of XML documents that are processed using a Document Object Model (DOM) API implementation.

Oracle XML DB supports such styles of application development. It provides Java, PL/SQL, and C implementations of the DOM API.Applications that use JDBC, such as those based on servlets, need prior knowledge of the data structure they are processing. Oracle JDBC drivers allow you to access and update XMLType tables and columns, and call PL/SQL procedures that access Oracle XML DB Repository.Applications that use DOM, such as those based on XSLT transformations, typically require less knowledge of the data structure. DOM-based applications use string names to identify pieces of content, and must dynamically walk through the DOM tree to find the required information. For this, Oracle XML DB supports the use of the DOM API to access and update XMLType columns and tables. Programming to a DOM API is more flexible than programming through JDBC, but it may require more resources at run time.

Oracle XML DB Features

Any database used for managing XML must be able to persist XML documents. Oracle XML DB is capable of much more than this. It provides standard database features such as transaction control, data integrity, replication, reliability, availability, security, and scalability, while also allowing for efficient indexing, querying, updating, and searching of XML documents in an XML-centric manner.

The hierarchical nature of XML presents the traditional relational database with some challenges:

  • In a relational database, the table-row metaphor locates content. Primary-Key Foreign-Key relationships help define the relationships between content. Content is accessed and updated using the table-row-column metaphor.

  • XML, on the other hand, uses hierarchical techniques to achieve the same functionality. A URL is used to locate an XML document. URL-based standards such as XLink are used to define relationships between XML documents. W3C Recommendations such as XPath are used to access and update content contained within XML documents. Both URLs and XPath expressions are based on hierarchical metaphors. A URL uses a path through a folder hierarchy to identify a document, whereas XPath uses a path through the node hierarchy of an XML document to access part of an XML document.

Oracle XML DB addresses these challenges by introducing SQL functions and methods that allow the use of XML-centric metaphors, such as XQuery and XPath expressions for querying and updating XML Documents.

These are the major features of Oracle XML DB:

XMLType Data Type

XMLType is an abstract native SQL data type for XML data. It provides methods that allow operations such as XML Schema validation and XSL transformation of XML content. You can use XMLType as you would any other SQL data type. For example, you can use XMLType when you do any of the following:

  • Create a column in a relational table

  • Declare a PL/SQL variable

  • Define or call a PL/SQL procedure or function

XMLType is an Oracle Database object type, so you can also create a table of XMLType object instances. By default, an XMLType table or column can contain any well-formed XML document.

See Also:

Oracle Database Object-Relational Developer's Guide for information about Oracle Database object types and object-relational storage

XMLType Tables and Columns Can Conform to an XML Schema

XMLType tables or columns can be constrained to conform to an XML schema. This has several advantages:

  • The database ensures that only XML documents that validate against the XML schema are stored in the column or table. Invalid documents are excluded.

  • Because XML schema-based data conforms to a predefined XML structure, Oracle XML DB can use the information contained in the XML schema to optimize querying and updating of the data.

  • If you store XML schema-based data using structured storage, Oracle XML DB automatically decomposes it and stores it as a set of object-relational objects. The object-relational model used to store the document is derived from the XML schema.

XMLType API

Data type XMLType provides the following:

  • Constructors, which you can use to create an XMLType instance from a VARCHAR, CLOB, BLOB, or BFILE value.

  • XML-specific methods that operate on XMLType instances. These include the following:

    • extract()– Extract a subset of nodes contained in the XMLType instance.

    • existsNode() – Check whether or not a particular node exists in the XMLType instance.

    • schemaValidate() – Validate the content of the XMLType instance against an XML schema.

    • transform() – Perform an XSL transformation on the content of an XMLType instance.

XML Schema Support

Support for the Worldwide Web Consortium (W3C) XML Schema Recommendation is a key feature in Oracle XML DB. XML Schema specifies the structure, content, and certain semantics of XML documents. It is described in detail at http://www.w3.org/TR/xmlschema-0/.

The W3C Schema Working Group publishes a particular XML schema, often referred to as the schema for schemas, that provides the definition, or vocabulary, of the XML Schema language. An XML schema definition (XSDFoot 1 ), also called an XML schema, is an XML document that is compliant with the vocabulary defined by the schema for schemas.

An XML schema uses vocabulary defined by the schema for schemas to create a collection of XML Schema type definitions and element declarations that comprise a vocabulary for describing the contents and structure of a new class of XML documents, the XML instance documents that conform to that XML schema.

Note:

This manual uses the term "XML schema" (lower-case "s") to reference any XML schema that conforms to the W3C XML Schema (upper-case "S") Recommendation. Since an XML schema is used to define a class of XML documents, the term "instance document" is often used to describe an XML document that conforms to a particular XML schema.

The XML Schema language provides strong typing of elements and attributes. It defines numerous scalar data types. This base set of data types can be extended to define more complex types, using object-oriented techniques such as inheritance and extension. The XML Schema vocabulary also includes constructs that you can use to define complex types, substitution groups, repeating sets, nesting, ordering, and so on. Oracle XML DB supports all of the constructs defined by the XML Schema Recommendation, except for redefines.

XML schemas are commonly used as a mechanism for checking (validating) whether XML instance documents conform with their specifications. Oracle XML DB includes XMLType methods and SQL functions that you can use to validate XML documents against an XML schema.

In Oracle XML DB, you can use a standard data model for all of your data, regardless of how structured it is. You can use XML Schema to automatically create database tables for storing your XML data. XML schema-based data maintains DOM fidelity and allows for significant database optimizations.

XML schema-based data can be stored using any Oracle XML DB XMLType storage model: binary XML storage, structured (object-relational) storage, or unstructured (CLOB) storage. Non-schema-based XML data can be stored using binary XML storage or unstructured storage.

You can also wrap existing relational and object-relational data as XMLType views, which can optionally be XML schema-based. You can map from incoming XML documents to XMLType storage, specifying the mapping using a registered XML schema.

See Also:

Chapter 7, "XML Schema Storage and Query: Basic" for more information about using XML schemas with Oracle XML DB

XMLType Storage Models

XMLType is an abstract data type that provides different storage models to best fit your data and your use of it. As an abstract data type, your applications and database queries gain in flexibility: the same interface is available for all XMLType operations. Because different storage (persistence) models are available, you can tailor performance and functionality to best fit the kind of XML data you have and the pattern of its use. One key decision to make when using Oracle XML DB for persisting XML data as XMLType is thus which storage model to use for which XML data.

You can change XMLType storage from one model to another, using database import/export (see Chapter 36, "Exporting and Importing XMLType Tables"). Your application code need not change. You can change XML storage options when tuning your application.

XMLType tables and columns can be stored in these ways:

  • Structured storage – XMLType data is stored as a set of objects. This is also referred to as object-relational storage and object-based persistence.

  • Unstructured storage – XMLType data is stored in Character Large Object (CLOB) instances. This is also referred to as CLOB storage and text-based persistence.

  • Binary XML storage – XMLType data is stored in a post-parse, binary format specifically designed for XML data. Binary XML is compact, post-parse, XML schema-aware XML. This is also referred to as post-parse persistence.

Oracle Database provides two LOB storage options, SecureFile and BasicFile. Either of these can be used with unstructured (CLOB-based) XMLType storage. BasicFile LOB storage is the default for unstructured storage.

For binary XML data, SecureFile is the default storage option.Foot 2  However, if either of the following is true then it is not possible to use SecureFile LOB storage. In that case, BasicFile is the default option for binary XML data:

  • The tablespace for the XMLType table does not use automatic segment space management.

  • A setting in file init.ora prevents SecureFile LOB storage. For example, see parameter DB_SECUREFILE.

See Also:

You can mix storage models, using one model for one part of an XML document and a different model for another part. The mixture of structured and unstructured storage is sometimes called hybrid storage. What is true about structured storage is true about the structured part of hybrid storage. What is true about unstructured storage is true about the unstructured part of hybrid storage.

XMLType has multiple storage models, and some models can be configured in more than one way. Each model has its advantages, depending on the context. Each model has one or more types of index that are appropriate for it.

The first thing to consider, when choosing an XMLType storage model, is the nature of your XML data and the ways you use it. A spectrum exists, with data-centric use of highly structured data at one end, and document-centric use of highly unstructured data at the other. The first question to ask yourself is this: Is your use case primarily data-centric or document-centric?

  • Data-centric – Your data is, in general, highly structured, with relatively static and predictable structure, and your applications take advantage of this structure. Your data conforms to an XML schema.

  • Document-centric – Two cases:

    • Your data is generally without structure or of variable structure. Document structure can vary over time (evolution). Content is mixed (semi-structured): many elements contain both text nodes and child elements. Many XML elements can be absent or can appear in different orders. Documents might or might not conform to an XML schema.

    • Your data relatively structured, but your applications do not take advantage of that structure: they treat the data as if it were without structure.

Note:

Please be aware of the context, so as not to confuse discussion of storage options with discussion of the structure of the XML content to be stored. In this book, "structured" and "unstructured" generally refer to XMLType storage options. They refer less often to the nature of your data. "Hybrid" refers to object-relational storage with some embedded CLOB storage. "Semi-structured" refers to XML content, regardless of storage. Unstructured storage is CLOB-based storage, and structured storage is object-relational storage.

Once you've located the data-centric or document-centric half of the spectrum that is appropriate for your use case and data, consider whether your case is at an end of the spectrum or closer to the middle. That is, just how data-centric or document-centric is your case?

  • Employ object-relational (structured) storage for purely data-centric uses. A typical example of this use case would be an employee record (fields employee number, name, address, and so on). Use B-tree indexing with object-relational storage.

  • Employ hybrid storage if your data is composed primarily of invariable XML structures, but it does contain some variable data; that is, it contains a predictably few mixed-content elements. A typical example of this use case would be an employee record that includes a free-form resume. Index the structured and unstructured parts of your data separately, using appropriate indexes for each part.

  • Employ binary XML storage or CLOB-based (unstructured) storage for all document-centric use cases. XMLIndex is the indexing method of choice here.

    • For general indexing of document-centric XML data, use XMLIndex indexes with unstructured components. A typical example of this use case would be an XML Web document or a book chapter.

    • If your data contains some predictable, fixed structures that you query frequently, then you can use XMLIndex indexes with structured components on those parts. A typical example of this use case would be a free-form specification, with author, date, and title fields.

    A single XMLIndex index can have both structured and unstructured components, to handle islands of structure within generally unstructured content. A use case where you might use both components would be to support queries that extract an XML fragment from a document whenever some structured data is present. The unstructured index component is used for the fragment extraction. The structured component is used for the SQL WHERE clause that checks for the structured data.

In all cases, you can additionally use Oracle Text indexing for full-text queries. This is especially useful for document-centric cases.

These considerations are summarized in Figure 1-5. The figure shows the spectrum of use cases, from most data-centric, at the left, to most document-centric, at the right. The table in the figure classifies use cases and shows the corresponding storage models and indexing methods.

Figure 1-5 XML Use Cases and XMLType Storage Models

Description of Figure 1-5 follows
Description of "Figure 1-5 XML Use Cases and XMLType Storage Models"

See Chapter 6, "Indexing XMLType Data" for more information about indexing XML data. In particular, note that some types of indexing are complementary or orthogonal, so you can use them together.

The following list and Table 1-3 outline some of the advantages of each storage model.

  • Structured (object-relational) storage advantages over the other storage models include near-relational query and update performance, optimized memory management, reduced storage requirements, B-tree indexing, and in-place updates. These advantages are at a cost of increased processing overhead during ingestion and full retrieval of XML data, and reduced flexibility in the structure of the XML that can be managed by a given XMLType table or column. Structural flexibility is reduced because data and metadata (such as column names) are separated in object-relational storage. Instance structures cannot vary easily. Structured storage is particularly appropriate for highly structured data whose structure does not vary, if this maps to a manageable number of database tables and joins.

  • Unstructured (CLOB) storage enables higher throughput than structured storage when inserting and retrieving entire XML documents. No data conversion is needed, so the same format can be used outside the database. Unstructured storage also provides greater flexibility than structured storage in the structure of the XML that can be stored. Unstructured storage is particularly appropriate for document-centric use cases. These advantages can come at the expense of certain aspects of intelligent processing: in the absence of indexing, there is little that the database can do to optimize queries or updates on XML data that is stored in a CLOB instance. In particular, the cost of XML parsing (often implicit) can significantly impact query performance. Indexing with XMLIndex can improve the performance of queries against unstructured storage.

  • Binary XML storage provides more efficient database storage, updating, indexing, and fragment extraction than unstructured storage. It can provide better query performance than unstructured storage — it does not suffer from the XML parsing bottleneck (it is a post-parse persistence model). Like structured storage, binary XML storage is aware of XML Schema data types and can take advantage of native database data types. Like structured storage, binary XML storage allows for piecewise updates. Because binary XML data can also be used outside the database, it can serve as an efficient XML exchange medium, and you can off load work from the database to increase overall performance in many cases. Like unstructured storage, binary XML data is kept in document order. Like structured storage, data and metadata can, using binary storage, be separated at the database level, for efficiency. Like unstructured storage, however, binary XML storage allows for intermixed data and metadata, which lets instance structures vary. Binary XML storage allows for very complex and variable data, which in the structured-storage model could necessitate using many database tables. Unlike the other XMLType storage models, you can use binary storage for XML schema-based data even if the XML schema is not known beforehand, and you can store multiple XML schemas in the same table and query across common elements.

Table 1-3 XMLType Storage Models: Relative Advantages

Quality Structured (Object-Relational) Storage Binary XML Storage Unstructured (CLOB) Storage

Throughput

XML decomposition can result in reduced throughput when ingesting or retrieving the entire content of an XML document.

+ High throughput. Fast DOM loading. There is a slight overhead from the binary XML encoder/decoder.

++ High throughput when ingesting and retrieving the entire content of an XML document.

Queries

++ Extremely fast: relational query performance.

You can create B-tree indexes on the underlying object-relational columns.

+ Streaming XPath evaluation avoids DOM construction and allows evaluation of multiple XPath expressions in a single pass. Navigational XPath evaluation is significantly faster than with unstructured storage.

XMLIndex indexing can improve performance of XPath-based queries.

XPath operations are evaluated by constructing a DOM from the CLOB data and using functional evaluation. Expensive when performing operations on large documents or large collections of documents.

XMLIndex indexing can improve performance of XPath-based queries.

Update operations (DML)

++ Extremely fast: relational columns are updated in place.

+ In-place, piecewise update for SecureFile LOB storage.

When any part of the document is updated, the entire document must be written back to disk.

Space efficiency (disk)

++ Extremely space-efficient.

+ Space-efficient.

Consumes the most disk space, due to insignificant whitespace and repeated tags.

Data flexibility

Limited flexibility. Only documents that conform to the XML schema can be stored in the XMLType table or column.

+ Flexibility in the structure of the XML documents that can be stored in an XMLType column or table.

+ Flexibility in the structure of the XML documents that can be stored in an XMLType column or table.

XML schema flexibility

An XMLType table can only store documents that conform to the same XML schema.

In-place XML schema evolution is available, with some restrictions.

++ Can store XML schema-based or non-schema-based documents. An XMLType table can store documents that conform to any registered XML schemas.

++ Can store XML schema-based or non-schema-based documents. Cannot use multiple XML schemas for the same XMLType table.

XML fidelity

DOM fidelity: A DOM created from an XML document that has been stored in the database is identical to a DOM created from the original document. However, insignificant whitespace may be discarded.

DOM fidelity (see structured storage description).

+ Document fidelity: Maintains the original XML data, byte for byte. In particular, all original whitespace is preserved.

Indexing support

+ B-tree, bitmap, Oracle Text, XMLIndex, and function-based indexes.

XMLIndex, function-based, and Oracle Text indexes.

XMLIndex, function-based, and Oracle Text indexes.

Optimized memory management

+ XML operations can be optimized to reduce memory requirements.

+ XML operations can be optimized to reduce memory requirements.

XML operations on the document require creating a DOM from the document.

Validation upon insert

XML data is partially validated when it is inserted.

+ XML schema-based data is fully validated when it is inserted.

XML schema-based data is partially validated when it is inserted.

Partitioning

+ Available.Foot 1 

Partition based on virtual columns.

XMLType columns can be partitioned if the partitioning key is a relational column.

Streams-based replication

Not available.

Not available.

++ Available.

Compression

++ XML elements and attributes can be compressed individually.

+ XML data that uses SecureFile LOB storage can be compressed.

Not available.


Footnote 1 Partitioning of ordered collection tables (OCTs) reflects the partitioning of the top-level XMLType tables. Partition maintenance operations on the top-level tables are cascaded to the associated OCTs. See "Partitioning XMLType Tables and Columns Stored Object-Relationally".

Note:

When you insert XML schema-based data into binary XMLType columns or tables, the data is fully validated against the XML schema. Insertion fails if the data is invalid.

When XMLType is stored object-relationally, the XMLType instances contain hidden columns that store information about the XML data that does not fit into the SQL object model.

XML/SQL Duality

A key objective of Oracle XML DB is to provide XML/SQL duality. XML programmers can leverage the power of the relational model when working with XML content and SQL programmers can leverage the flexibility of XML when working with relational content. This lets you use the most appropriate tools for a particular business problem.

XML/SQL duality means that the same data can be exposed as rows in a table and manipulated using SQL or exposed as nodes in an XML document and manipulated using techniques such as DOM and XSL transformation. Access and processing techniques are independent of the underlying storage format.

These features provide simple solutions to common business problems. For example:

  • You can use Oracle XML DB SQL functions to generate XML data directly from a SQL query. You can then transform the XML data into other formats, such as HTML, using the database-resident XSLT processor.

  • You can access XML content without converting between different data formats, using SQL queries, on-line analytical processing (OLAP), and business-intelligence/data warehousing operations.

  • You can perform text, spatial data, and multimedia operations on XML content.

SQL/XML Standard Functions

Oracle XML DB provides the SQL functions that are defined in the SQL/XML standard. SQL/XML functions fall into two groups:

  • Functions that you can use to generate XML data from the result of a SQL query. In this book, these are called SQL/XML publishing functions. They are also sometimes called SQL/XML generation functions.

  • Functions that you can use to query and access XML content as part of normal SQL operations. In this book, these are called SQL/XML query and access functions.

Using SQL/XML functions you can address XML content in any part of a SQL statement. These functions use XQuery or XPath expressions to traverse the XML structure and identify the nodes on which to operate. The ability to embed XQuery and XPath expressions in SQL statements greatly simplifies XML access.

See Also:

Automatic Rewriting of XQuery and XPath Expressions

SQL/XML functions and XMLType methods use XQuery or XPath expressions to search collections of XML documents and to access a subset of the nodes contained within an XML document. In many cases, Oracle XML DB is able to automatically rewrite such expressions to code that executes directly against the underlying database objects.

How XPath Expressions Are Evaluated by Oracle XML DB

Oracle XML DB provides the following ways of evaluating XPath expressions that operate on XMLType columns and tables, depending on the XML storage method used:

  • Structured storage – Oracle XML DB attempts to translate the XPath expression in a SQL/XML function into an equivalent SQL query. The SQL query references the object-relational data structures that underpin a schema-based XMLType. This process is referred to as XPath rewrite. It can occur when performing queries and UPDATE operations. In addition, B-tree indexes on the underlying object-relational tables can be used to evaluate XPath expressions for structured storage.

  • Unstructured storage – XMLIndex indexes can be used to evaluate XPath expressions for unstructured storage. Use XMLIndex by preference.

    • If an XMLIndex index can be used, then it is used instead of functional evaluation.

    • In the absence of such an index, Oracle XML DB evaluates the XPath expression using functional evaluation. Functional evaluation builds a DOM tree for each XML document, and then resolves the XPath programmatically using the methods provided by the DOM API. If the operation involves updating the DOM tree, the entire XML document must be written back to disk when the operation is completed.

  • Binary XML storage – Oracle XML DB can evaluate XPath expressions in different ways: using XMLIndex and using single-pass streaming. Single-pass streaming means evaluating a set of XPath expressions in a single scan of binary XML data. During query compilation, the cost-based optimizer picks the fastest combination of evaluation methods.

Rewriting SQL Code That Contains XQuery and XPath Expressions

For XML data that is stored object-relationally, Oracle XML DB can rewrite SQL statements that contain XQuery and XPath expressions to purely relational SQL statements, which are then processed in an optimal manner. In this way, Oracle XML DB insulates the database optimizer from needing to understand the XQuery and XPath languages and the XML data model. The database optimizer processes a rewritten SQL statement the same way it processes other SQL statements. The general term applied to this rewriting process is XPath rewrite.

The database optimizer can thus derive an execution plan based on conventional relational algebra. This in turn means that Oracle XML DB can leverage all of the features of the database, and ensure that SQL statements containing XQuery and XPath expressions are executed in a highly performant and efficient manner. There is little overhead with this rewriting, and Oracle XML DB executes XQuery-based and XPath-based queries at near-relational speed, while preserving the XML abstraction.

When Can XPath Rewrite Occur?

XPath rewrite is possible when all of the following conditions are met:

  • An XMLType column or table uses structured storage techniques to provide the underlying storage model.

  • An XMLType column or table is associated with a registered XML schema.

  • A SQL statement contains SQL/XML functions or XMLType methods that use XPath expressions to refer to one or more nodes within a set of XML documents.

  • The nodes referenced by an XPath expression can be mapped, using the XML schema, to attributes of the underlying SQL object model.

What is the XPath-Rewrite Process?

XPath rewrite performs the following tasks:

  1. Identify the set of XPath expressions included in the SQL statement.

  2. Translate each XPath expression into an object relational SQL expression that references the tables, types, and attributes of the underlying SQL: 1999 object model.

  3. Rewrite the original SQL statement into an equivalent object relational SQL statement.

  4. Pass the new SQL statement to the database optimizer for plan generation and query execution.

In certain cases, XPath rewrite is not possible. This normally occurs when there is no SQL equivalent of the XPath expression. In this situation, Oracle XML DB performs a functional evaluation of the XPath expressions.

In general, functional evaluation of a SQL statement is more expensive than XPath rewrite, particularly if the number of documents to be processed is large. The advantage of functional evaluation is that it is always possible, regardless of whether the XMLType data is stored using structured storage and regardless of the complexity of the XPath expression.

Understanding the concept of XPath rewrite, and the conditions under which it can take place, is a key step in developing Oracle XML DB applications that deliver the required levels of scalability and performance.

Oracle XML DB Benefits

This section describes several benefits of using Oracle XML DB. Figure 1-6 presents an overview.

Figure 1-6 Oracle XML DB Benefits

Description of Figure 1-6 follows
Description of "Figure 1-6 Oracle XML DB Benefits"

Unifying Data and Content

Most application data and Web content is stored in a relational database, a file system, or both. XML data is often used for data exchange, and it can be generated from a relational database or a file system. As the volume of XML data exchanged grows, the cost of regenerating this data grows, and these storage methods become less effective at accommodating XML content.

Figure 1-7 Unifying Data and Content: Some Common XML Architectures

Description of Figure 1-7 follows
Description of "Figure 1-7 Unifying Data and Content: Some Common XML Architectures"

Organizations often manage their structured data and unstructured data differently:

  • Unstructured data, in tables, makes document access transparent and table access complex.

  • Structured data, often in binary large objects (such as in BLOB instances), makes access more complex and table access transparent.

With Oracle XML DB, you can store and manage data that is structured, unstructured, and semi-structured using a standard data model and standard SQL and XML. You can perform SQL operations on XML documents and XML operations on object-relational (such as table) data.

Exploiting Database Capabilities

Oracle Database has the following key database capabilities for working with XML:

  • Indexing and search – Applications use queries such as "find all the product definitions created between March and April 2002", a query that is typically supported by a B-tree index on a date column. Oracle XML DB can enable efficient structured searches on XML data, saving content-management vendors the need to build proprietary query APIs to handle such queries.

  • Updates and transaction processing – Commercial relational databases use fast updates of subparts of records, with minimal contention between users trying to update. As traditionally document-centric data participate in collaborative environments through XML, this requirement becomes more important. File or CLOB storage cannot provide the granular concurrency control that Oracle XML DB does.

  • Managing relationships – Data with any structure typically has foreign-key constraints. XML data stores generally lack this feature, so you must implement any constraints in application code. Oracle XML DB enables you to constrain XML data according to XML schema definitions, and hence achieve control over relationships that structured data has always enjoyed.

  • Multiple views of data – Most enterprise applications need to group data together in different ways for different modules. This is why relational views are necessary—to allow for these multiple ways to combine data. By allowing views on XML, Oracle XML DB creates different logical abstractions on XML for, say, consumption by different types of applications.

  • Performance and scalability – Users expect data storage, retrieval, and query to be fast. Loading a file or CLOB value, and parsing, are typically slower than relational data access. Oracle XML DB dramatically speeds up XML storage and retrieval.

  • Ease of development – Databases are foremost an application platform that provides standard, easy ways to manipulate, transform, and modify individual data elements. While typical XML parsers give standard read access to XML data they do not provide an easy way to modify and store individual XML elements. Oracle XML DB supports several standard ways to store, modify, and retrieve data. These include XML Schema, XQuery, XPath, DOM, and Java.

Exploiting XML Capabilities

If the drawbacks of XML file storage force you to break down XML into database tables and columns, there are several XML advantages you have left:

  • Structure independence: The open content model of XML cannot be captured easily in the pure tables-and-columns world. XML schemas allow global element declarations, not just scoped to a container. Hence you can find a particular data item regardless of where in the XML document it moves to as your application evolves.

  • Storage independence: When you use relational design, your client programs must know where your data is stored, in what format, what table, and what the relationships are among those tables. XMLType enables you to write applications without that knowledge and lets database administrators map structured data to physical table and column storage.

  • Ease of presentation: XML is understood natively by Web browsers, many popular desktop applications, and most Internet applications. Relational data is generally not accessible directly from applications. Additional programming is required to make relational data accessible to standard clients. Oracle XML DB stores data as XML and makes it available as XML outside the database. No extra programming is required to display database content.

  • Ease of interchange – XML is the language of choice in business-to-business (B2B) data exchange. If you are forced to store XML in an arbitrary table structure, you are using some kind of proprietary translation. Whenever you translate a language, information is lost and interchange suffers. By natively understanding XML and providing DOM fidelity in the storage/retrieval process, Oracle XML DB enables a clean interchange.

Efficient Storage and Retrieval of Complex XML Documents

Users today face a performance barrier when storing and retrieving complex, large, or many XML documents. Oracle XML DB provides high performance and scalability for XML operations. The major performance features are:

Use XMLType Views If Your Data Is Not XML

XMLType views provide a way for you wrap existing relational and object-relational data in XML format. This is especially useful if, for example, your legacy data is not in XML format but you must migrate it to XML format. Using XMLType views, you need not alter your application code.

To use XMLType views, you must first register an XML schema with annotations that represent a bidirectional mapping between XML Schema data types and either SQL data types or binary XML encoding types. You can then create an XMLType view conforming to this mapping, by providing an underlying query that constructs instances of the appropriate types.

Search XML Data using Oracle Text

Oracle Database enables special indexing on XML data, including Oracle Text indexes for section searching, SQL functions to process XML data, aggregation of XML data, and special optimization of queries involving XML data. Oracle SQL functions hasPath and inPath are designed to optimize XML data searches where you can search within XML text for substring matches.

Build Messaging Applications using Oracle Streams Advanced Queuing

Oracle Streams Advanced Queuing supports the use of:

Standards Supported by Oracle XML DB

Oracle XML DB supports all major XML, SQL, Java, and Internet standards:

  • W3C XML Schema 1.0 Recommendation. You can register XML schemas, validate stored XML content against XML schemas, or constrain XML stored in the server to XML schemas.

  • W3C XQuery 1.0 Recommendation and W3C XPath 2.0 Recommendation. You can search or traverse XML stored inside the database using XQuery and XPath, either from HTTP(S) requests or from SQL.

  • SQL/XML.

  • Java Database Connectivity (JDBC) API. JDBC access to XML is available for Java programmers.

  • W3C XSL 1.0 Recommendation. You can transform XML documents at the server using XSLT.

  • W3C DOM Recommendation Levels 1.0 and 2.0 Core. You can retrieve XML stored in the server as an XML DOM, for dynamic access.

  • Protocol support. You can store or retrieve XML data from Oracle XML DB using Oracle Net or standard protocols such as HTTP(S), FTP, and IETF WebDAV.

  • Java Servlet version 2.2, (except: the servlet WAR file, web.xml, is not supported in its entirety; only one ServletContext; one web-app are currently supported; and stateful servlets are not supported).

  • Web services: SOAP 1.1. You can access XML stored in the server from SOAP requests. You can build, publish, or find Web Services using Oracle XML DB and Oracle9iAS, using WSDL and UDDI. You can use Oracle Streams Advanced Queuing IDAP, the SOAP specification for queuing operations, on XML stored in Oracle Database.

  • W3C XML Linking Language (Xlink) 1.0 Recommendation. You can define various types of links between XML documents.

  • W3C XML Pointer Language (XPointer) Recommendation and XPointer Framework. You can include the content of multiple XML documents or fragments in a single infoset.

See Also:

Oracle XML DB Technical Support

Besides your regular channels of support through your customer representative or consultant, technical support for Oracle Database XML-enabled technologies is available free through the Discussion Forums section of Oracle Technology Network (OTN):

http://forums.oracle.com/forums/category.jspa?categoryID=51

Oracle XML DB Examples Used in This Manual

This manual contains examples that illustrate the use of Oracle XML DB and XMLType. The examples are based on various database schemas, sample XML documents, and sample XML schemas.

Further Oracle XML DB Case Studies and Demonstrations

Visit OTN to view Oracle XML DB examples, white papers, case studies, and demonstrations:

http://www.oracle.com/technetwork/database-features/xmldb/overview/index.html

Comprehensive XML classes on how to use Oracle XML DB are also available. See the Oracle University link on OTN.

Several detailed Oracle XML DB case studies are available on OTN and include the following:

  • Oracle XML DB Downloadable Demonstration. This detailed demonstration illustrates how to use many Oracle XML DB features. Parts of this demonstration are also included in Chapter 3, "Using Oracle XML DB".

  • SAX Loader Application. This demonstrates an efficient way to break up large files containing multiple XML documents outside the database and insert them into the database as a set of separate documents. This is provided as a standalone and a Web-based application.

  • Oracle XML DB Utilities package. This highlights the subprograms provided with the XDB_Utilities package. These subprograms operate on BFILE values, CLOB values, DOM, and Oracle XML DB Resource APIs. With this package, you can perform basic Oracle XML DB foldering operations, read and load XML files into a database, and perform basic DOM operations through PL/SQL.

  • Card Payment Gateway Application. This application uses Oracle XML DB to store all your data in XML format and enables access to the resulting XML data using SQL. It illustrates how a credit card company can store its account and transaction data in the database and also maintain XML fidelity.

  • Survey Application. This application determines what members want from Oracle products. OTN posts the online surveys and studies the responses. This Oracle XML DB application demonstrates how a company can create dynamic, interactive HTML forms, deploy them to the Internet, store the responses as XML, and analyze them using the XML enabled Oracle Database.



Footnote Legend

Footnote 1: xsd is the prefix used in the schema of schemas for the XML Schema namespace, hence it is also the namespace prefix used for the XML Schema data types, such as xsd:string. xsd is also used often as the file extension of XML schema files.
Footnote 2: Prior to Oracle Database 11g Release 2 (11.2.0.2) the BasicFile option was the default for binary XML storage. Use of the BasicFile option for binary XML data is deprecated.