3 Identifying Data Sources and Importing Metadata

This chapter describes how to use Oracle Warehouse Builder to import metadata.

This chapter contains the following topics:

Overview of Data Sources

In general, the source systems for a data warehouse are typically transaction processing applications. A sales analysis data warehouse, for instance, extracts data from an order entry system that records current order activities.Designing the data extraction process can be problematic. If the source system is complex and poorly documented, then determining which data to extract can be difficult. Moreover, it is usually not possible to modify the source system, nor adjust its performance or availability. To address these problems, first import the metadata.

Metadata is the data that describes the contents of a given object in a data set. The metadata for a table, for instance, indicates the data type for each column.

For Oracle Database customers, the recommended tool of choice for importing metadata is Oracle Warehouse Builder (OWB). After you import the metadata into Warehouse Builder, you can annotate the metadata and design an extraction strategy independently from the transaction processing application.

General Steps for Importing Metadata from Sources

To import metadata:

  1. Complete the instructions for "Accessing Oracle Warehouse Builder".

  2. Download and install the Oracle Warehouse Builder demonstration as described in "Installing the Oracle Warehouse Builder Demonstration".

  3. Identify the Warehouse Builder project.

    See "About Workspaces, Projects, and Other Devices in Warehouse Builder".

  4. Follow along with the "Example: Importing Metadata from Flat Files".

    The example explains how to specify where the source files are located and how to start the Metadata Import Wizard. The process for importing data objects such as files, tables, and views is the same. Therefore, after you complete this example, you will have a general understanding of how to import all data objects into Warehouse Builder.

About Workspaces, Projects, and Other Devices in Warehouse Builder

After you install the Warehouse Builder demonstration and start the Design Center, you log in to a workspace. The user name and workspace name are displayed along the top of the Design Center.

Recall that a workspace includes a set of users working on related projects. Security is an important consideration for determining how many workspaces to create. A common model is to create separate workspaces for development, testing, and production. Using this model, you can allow users such as your developers access to the development and testing workspaces but restrict them from the production workspace.

You can optionally divide a workspace into projects. In practice, however, workspaces typically contain only one active project. This is because a project is simply a container and not suitable for implementing security or establishing subject-oriented groupings. Security is implemented through workspaces. Establishing subject-oriented groupings can be accomplished through modules, as discussed later.

A project contains the sets of metadata related to an initiative. For data warehousing, therefore, include all the metadata for sources and targets in the same project. Also include all the functions, procedures, transformations, mappings, and other objects required to implement your initiative. The project contains nodes for each type of object that you can either create or import into Warehouse Builder. Expand the different nodes to gain a general understanding of the types of objects you can create or import.

In the demonstration, the Projects Navigator is displayed on the left side and includes two projects. MY_PROJECT is a default, pre-seeded project. You can use MY_PROJECT as your single active project in the workspace. For the purposes of the demonstration, the OWB_DEMO project is used.

Example: Importing Metadata from Flat Files

This example describes how to import metadata from flat files. Specifically, our objective is to import the metadata into the OWB_DEMO project such that the two files, export.csv and expense_categories.csv, display in the Projects Navigator under the Files node.

To import metadata from flat files:

  1. Indicate where the flat files are located as described in "Specifying Locations for the Flat Files".

  2. Organize OWB_DEMO to receive the incoming flat file metadata as described in "Creating Modules in the Project".

  3. Indicate which files to import as described in "Starting the Import Metadata Wizard".

  4. Specify the metadata structure as described in "Using the Flat File Sample Wizard".

  5. Import the metadata for both flat files as described in "Importing the Flat File Data".

Specifying Locations for the Flat Files

Indicate where the flat files are located.

In the Design Center, on the left side is a tab called Locations that contains a node called Locations. Use the Locations node to indicate where your source data resides.

Expand the Locations node and the nodes within it to gain a general understanding of the types of source and targets you can access from Warehouse Builder

For this example, right-click the Files node and select New File System Location to define a location for the flat files.

Follow the prompts in the Create File System Location dialog box. Each location you define corresponds to a specific directory on your computer file system. Therefore, consider naming the location based on the drive and directory. For the purposes of this demonstration, name the location C_NEWOWBDEMO_SOURCEFILES.

Creating Modules in the Project

In the Projects Navigator, organize OWB_DEMO to receive the incoming flat file metadata.

In a data warehousing implementation, you are likely to have numerous source and target objects. As a means of organizing these various objects, Warehouse Builder requires you to create modules. Modules enable you to establish subject-oriented groupings. Furthermore, each module corresponds to a location that you create in the Locations Navigator.

In this example, you create a module to contain company sales data. Because you have only one location for the two flat files, you create one module in the Projects Navigator. Right-click the Files node under OWB_DEMO and select New Flat File Module. Name the new module SALES_EXPENSES. For its location, specify the location you defined in the previous step, C_NEWOWBDEMO_SOURCEFILES.

Starting the Import Metadata Wizard

Start the Import Metadata Wizard.

Right-click the module SALES_EXPENSES, select New, and follow the prompts in the Import Metadata Wizard. The prompts in the wizard vary according to the type of module you selected and therefore the type of data object you are importing.

In this example, you selected to import two flat files. On the summary page of the Import Metadata Wizard, select one of the files and then select Sample to launch the Flat File Sample Wizard.

In the next steps, you sample each file in turn and then select Finish on this page to import the metadata.

Figure 3-1 Import Metadata Wizard

Description of Figure 3-1 follows
Description of "Figure 3-1 Import Metadata Wizard"

Using the Flat File Sample Wizard

Follow the prompts in the Flat File Sample Wizard to specify the metadata structure.

Based on the number of characters you specify to be sampled, the wizard reads the flat file data and provides you with suggestions as to the structure of the metadata. If the sample size is too small, the wizard may misinterpret the data and make invalid suggestions. Accordingly, you can modify and refine the settings in the wizard.

For the purposes of this example, the wizard correctly determines that the file is delimited, includes a single record type, and the character set is WE8MSWIN1252. Accept all the default settings presented in the Flat File Wizard.

To become familiar with the various types of files the wizard can sample, notice the options on the wizard pages and also select Help for additional insights.

After sampling the first flat file, return to the Summary and Import page of Metadata Import Wizard to sample the second file.

Accept the default setting in the Flat File Wizard as you did for the previous file.

Importing the Flat File Data

Import the metadata for both flat files.

Return again to the Summary and Import page and select Finish.

When you select Finish, the wizard imports the data based on the selections you made when sampling the data. The two comma separated files now display under the SALES_EXPENSES module which is under the Files node in OWB_DEMO project.

Figure 3-2 The SOURCE Flat File Module

Description of Figure 3-2 follows
Description of "Figure 3-2 The SOURCE Flat File Module"