This chapter discusses the different ways that you can use Oracle Database Quality of Service Management (Oracle Database QoS Management) to manage the workload on your system and the best practices for configuring your system to effectively use Oracle Database QoS Management.
This chapter contains the following sections:
To use Oracle Database QoS Management, your system must meet certain requirements. Also, your applications and database connections must conform to certain standards. If you do not use a supported configuration, then Oracle Database QoS Management reports a configuration violation and is disabled.
The following sections discuss the configuration requirements for using Oracle Database QoS Management:
Oracle Database QoS Management requires that your cluster and databases use server pools. Databases in the Generic pool cannot be managed by Oracle Database QoS Management and are not visible in the user interface. Servers in the Free pool can be used to provide additional resources to managed server pools.
Caution:
By default, any named user may create a server pool. To restrict the operating system users that have this privilege, Oracle strongly recommends that you add specific users to the CRS Administrators list. See Oracle Clusterware Administration and Deployment Guide for more information about adding users to the CRS Administrators list.

Before Oracle Database QoS Management can manage the resources for your cluster, you must mark server pools as managed in your Performance Policy. If a server pool is not managed by Oracle Database QoS Management, then the server pool is not visible to the Oracle Database QoS Management server. Any workloads or resources in the unmarked server pools are not managed by Oracle Database QoS Management.
You can select server pools to be managed by Oracle Database QoS Management even if they do not contain any database instances, or have a current size of zero. This enables you to preconfigure a server pool for management by Oracle Database QoS Management before the workload is started. Also, if you configure a server pool to have a minimum size of zero, then Oracle Database QoS Management can remove the servers from that server pool to provide resources for higher priority workloads. In this case, even though there are no resources within the server pool for Oracle Database QoS Management to manage, any workloads that run in that server pool are still monitored and, if possible, resources are allocated to that server pool to support those workloads.
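For example, the following Server Control (SRVCTL) command (the pool name and size values are hypothetical, shown only as a sketch) creates a server pool with a minimum size of zero so that it can be preconfigured for Oracle Database QoS Management and still surrender its servers to higher priority workloads:

srvctl add srvpool -g batch_pool -l 0 -u 2 -i 1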
Oracle Database QoS Management only works with Oracle RAC databases of release 11.2.0.2 or higher. The databases must be configured as policy-managed databases that run in server pools. The underlying server pools used by the database instances must be marked as managed by Oracle Database QoS Management.
Database services should be created as UNIFORM services, meaning the service is offered by every available instance running in the specified server pool. If your application requires a SINGLETON service, then, to use Oracle Database QoS Management, the service must run in a server pool that has a maximum size of one. If you use a SINGLETON service in a server pool that has a maximum size greater than one, then Oracle Database QoS Management reports a configuration violation.
Oracle Database QoS Management supports multiple databases sharing a server pool. If you have multiple databases using the same server pool, then every database that uses the server pool must have Oracle Database QoS Management enabled. Oracle Database QoS Management also supports Oracle RAC One Node databases (sometimes referred to as singleton databases), but these databases must use server pools that have a maximum size of one.
When you create a database, the default value of the CPU_COUNT initialization parameter for each database instance is set to the number of physical CPUs on the node that the instance runs on. If you have multiple database instances on the same node, then you must adjust the value of CPU_COUNT for each instance so that the sum of the CPU_COUNT values for all instances that run on the node is equal to or less than the number of physical CPUs on that node. Also, the value of CPU_COUNT must be the same for every instance of a database. For example, for the sales database, you cannot have CPU_COUNT set to four for the sales1 instance and CPU_COUNT set to two for the sales2 instance if both instances are in the same server pool.
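For example, if instances from the sales database and one other database share nodes that each have eight physical CPUs, you might cap each database at four CPUs per node. The following statement is only a sketch (the value of four is hypothetical); run it while connected to each database so that the same value applies to all of that database's instances:

ALTER SYSTEM SET CPU_COUNT = 4 SCOPE=BOTH SID='*';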
Database services that are managed by Oracle Clusterware are required for Oracle Database QoS Management. All workloads managed by Oracle Database QoS Management must connect to the database using a database service that is managed by Oracle Clusterware. You cannot use the default database service; the default database service is not managed by Oracle Clusterware.
The services used to connect to the database should be UNIFORM. If your application requires a SINGLETON service, then, to use Oracle Database QoS Management, the service must run in a server pool that has a maximum size of one.
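For illustration, the following SRVCTL command adds a UNIFORM service to a policy-managed database; the database name (sales), service name (sales_cart), and server pool name (online) are placeholders:

srvctl add service -d sales -s sales_cart -g online -c UNIFORM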
The Oracle RAC high availability framework monitors the database and its services and sends event notifications using fast application notification (FAN). Oracle Clusterware and Oracle Net Services provide load balancing for services according to rules specified in the service configuration. These rules are:
The connection load balancing goal: Connections are routed to an instance using the current workload for that instance and the type of connection (LONG or SHORT) to determine which instance can provide the best performance. For Oracle Database QoS Management, the connection load balancing goal should be set to LONG, so that if a new server is allocated to a server pool for a service, new connections migrate to the new server faster. By migrating connections more quickly to a new server, the workload is balanced faster across all the available instances and the response times for workloads improve at a faster rate.
The run-time connection load balancing goal: The load balancing advisory data is used to determine which instance best meets the goal specified for the service. The two goals are SERVICE_TIME, for which load balancing advisory data is based on elapsed time for work done in the instance, and THROUGHPUT, for which the load balancing advisory data is based on the rate at which work is completed in the instance. For Oracle Database QoS Management, the run-time connection load balancing goal should be set to SERVICE_TIME for all database services that use server pools, except optionally those with a maximum size of one.
Run-time connection load balancing sends advice to connection pools on how to balance connection requests across instances in an Oracle RAC database. The load balancing advisory also provides advice about how to direct incoming work to the instances that provide the optimal quality of service for that work. This minimizes the need to relocate the work later.
To configure the load balancing goals for a service, use the Server Control (SRVCTL) utility, as shown in the following example, or use Enterprise Manager:
srvctl modify service -d db_name -s service_name -B SERVICE_TIME -j LONG
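To confirm the goals after modifying the service, you can display the service configuration, which lists both load balancing goals, using the same placeholder names:

srvctl config service -d db_name -s service_name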
In the initial release of Oracle Database QoS Management, only online transaction processing (OLTP) workloads are supported. The only supported Performance Objective is average response time for database requests. Oracle Database QoS Management is designed to manage open workloads, or a workload in which demand is independent of the response time.
The database requests for your application workload must have an average response time of less than one second, and preferably an average response time of less than 0.5 seconds. Each database request within a Performance Class should be homogeneous with respect to resource usage. If a subset of the database requests in a workload uses significantly more resources than the other requests, then you should create a new Performance Class to contain the database requests that require more resources. See "Deciding to Create New Performance Classes".
Oracle Database QoS Management does not support workloads that involve parallel queries. By default, parallel queries run on all available instances of the database, regardless of which service was used to connect to the database. This means that the workload is not contained, or constrained to run only on those instances that offer the service. For a similar reason, Oracle Database QoS Management does not support workloads that involve a significant number of database requests that query GV$ views.
For a workload to be managed by Oracle Database QoS Management, the database connections must use a database service that is managed by Oracle Clusterware. The client or application that initiates the connection must be a JDBC (thick or thin) client, or an OCI client. Workloads that use a bequeath connection to the database are not managed by Oracle Database QoS Management.
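For example, a JDBC thin client would name the Clusterware-managed service in its connect string rather than connecting directly to an instance; the SCAN address and service name below are placeholders:

jdbc:oracle:thin:@//cluster-scan.example.com:1521/sales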
Currently, the only workloads that Oracle Database QoS Management manages are OLTP database workloads. To manage the workload for a database, the incoming work requests must be assigned to a Performance Class. Workload is mapped to a Performance Class using classifiers.
In multi-tier environments, a request from a client is routed to different database sessions by the middle tier or through load balancing, making the tracking of a client across database sessions difficult. Classifiers use session attributes to identify work requests. The attributes used are service name, user name, module, action, and program. See "Using Additional Filters for Classifying Work Requests".
Each classifier must specify one or more service names. If a classifier specifies multiple service names, then when matching the connection data to a Performance Class, the service names are evaluated using an OR operation. This means that if any one of the service names specified in the classifier matches the service name in the work request, then the comparison evaluates to TRUE.
To set the MODULE and ACTION attributes, use the OCIAttrSet() call. Use the default namespace, USERENV, for the application context.
You can also optionally include the user name and program name in the classifier. The user name is the name of the database user that the session connects as. The program attribute is the name of the client program used to log in to the server.
If the classifier for a Performance Class specifies multiple attributes, then the session attributes are combined using an AND operation. This means that if all of the attribute values specified in the classifier match the session attribute values in the work request, then the comparison evaluates to TRUE. If you have more than one classifier that uses similar attribute values, then place the classifier with the most fine-grained conditions first. See "Applying Classifiers to Work Requests".
For example, consider the following classifiers:
create_invoice_taxes_pc, which specifies the sales_cart service, the ORDER module, and the CALCULATE TAX action

create_invoice_pc, which specifies the sales_cart service and the ORDER module
The create_invoice_taxes_pc classifier should be evaluated before the create_invoice_pc classifier. If a work request uses the sales_cart service and is performing the CALCULATE TAX action in the ORDER module, then the work request is assigned to the create_invoice_taxes_pc Performance Class. If the work request does not have matching values for all of these attributes, then the work request is compared to the next classifier, create_invoice_pc. If you evaluate the create_invoice_pc classifier first, then any work request that uses the sales_cart service and the ORDER module is assigned to the create_invoice_pc Performance Class, regardless of which action the work request performs.
You can create up to 47 Performance Classes for a cluster. If you have more than 47 services for your cluster, then use more than one service name within classifiers. Once a match is found for a classifier, Oracle Database QoS Management assigns a tag to the matching work request. The tag is based on the classifier that evaluates to TRUE. See "Performance Class Tags".
This section discusses key configuration recommendations and requirements for systems that are managed by Oracle Database QoS Management. The topics included in this section are:
Configuration Recommendations for Global Cache Resource Bottlenecks
Configuration Recommendations for Other Types of Bottlenecks
Oracle Database QoS Management measures use and wait times for CPU, Global Cache, I/O, and other resources to determine where a bottleneck is located. The target Performance Class and its bottlenecked resource are identified on the Oracle Database QoS Management Dashboard (the Dashboard); however, only the CPU resource is actively managed in this release. The following sections describe how Oracle Database QoS Management responds to bottlenecks in each resource type and what configuration strategies are recommended.
A CPU resource bottleneck is detected when there are excessive wait times on the collection of CPU queues running that workload. Oracle Database QoS Management offers recommendations you can implement to relieve the bottleneck.
One solution to this type of bottleneck is to increase the number of opportunities for the workload to run on the CPU. Oracle Database QoS Management implements this solution by assigning the workload to a consumer group that has more CPU shares across the server pool.
Another solution is to provide more CPU resources. If you have multiple instances sharing the CPU resources for each server in the server pool, and you have implemented instance caging, then Oracle Database QoS Management can suggest altering the CPU counts for the instances in the server pool; this solution gives more CPU resources to the workloads that are not meeting performance expectations by taking CPU resources away from an instance that is of lower rank or has the headroom to contribute the resources.
If there is a CPU resource bottleneck that cannot be relieved by adjusting the CPU counts between instances, then Oracle Database QoS Management can recommend moving a new server into the server pool. The server can come from the Free pool, from a less-stressed server pool, or from a server pool that hosts a less critical workload.
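As background, instance caging itself takes effect only when a nonzero CPU_COUNT is combined with an active Resource Manager plan. The following is a minimal sketch using the built-in DEFAULT_PLAN and a hypothetical CPU limit; on a cluster managed by Oracle Database QoS Management, consumer group assignments and CPU shares are adjusted by Oracle Database QoS Management itself, so treat this only as an illustration of the instance caging mechanism:

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'DEFAULT_PLAN' SCOPE=BOTH SID='*';
ALTER SYSTEM SET CPU_COUNT = 2 SCOPE=BOTH SID='*';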
A Global Cache resource bottleneck is detected when there is excessive data block movement between database instances. This is usually caused by an application that is not configured properly or is not able to scale horizontally. Configuring the application to run in a server pool with a maximum size of one or partitioning the data can usually relieve the bottleneck.
Oracle Database QoS Management cannot perform either of these actions in this release and does not provide a recommendation that can be implemented for this type of bottleneck.
An I/O resource bottleneck is detected when there are excessive wait times on the storage subsystem. This type of bottleneck is typically caused by either too few disk spindles or not enough network bandwidth on the storage interconnect. To resolve this bottleneck, spread the database files across a higher number of disks, or configure a separate network interface card (NIC) for a dedicated storage interconnect.
Oracle Database QoS Management cannot resolve this type of bottleneck in this release and does not provide a recommendation that can be implemented.
The last resource type used to categorize bottlenecks, Other, is used for all other wait times. These database wait times are usually caused by SQL performance issues that result from an application that is not optimized, waiting on latches, and so on. These bottlenecks can be investigated using Oracle Database tuning tools such as Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM).
Resolving these types of bottlenecks is outside the scope of the run-time system management provided by Oracle Database QoS Management, and Oracle Database QoS Management does not provide any recommendations that can be implemented.
This section describes a sample implementation of Oracle Database QoS Management. The process by which Oracle Database QoS Management manages performance is described. This section contains the following topics:
The sample implementation uses a four-node cluster running on Linux. The nodes are named test_rac1 to test_rac4. In normal operation, each node does the following:
Node | Purpose | Services
---|---|---
test_rac1 | Runs Oracle Grid Infrastructure for a cluster and the first database instance for the backoffice database | HR and ERP
test_rac2 | Runs Oracle Grid Infrastructure for a cluster and the second database instance for the backoffice database | HR and ERP
test_rac3 | Runs Oracle Grid Infrastructure for a cluster and the first database instance for the online database | Sales and Sales_Cart
test_rac4 | Runs Oracle Grid Infrastructure for a cluster and the second database instance for the online database | Sales and Sales_Cart
The cluster is logically divided into two server pools with the following constraints:
Name | Min Size | Max Size | Current Size | Importance
---|---|---|---|---
backoffice | 1 | -1 | 2 | 1
online | 1 | -1 | 2 | 2
Free | 0 | -1 | 0 | 0
The server pool constraints as shown here guarantee that at least one server is allocated to each of the server pools (and the databases that run in those server pools) and the remaining servers can be shared on a demand basis to manage service levels. The online server pool hosts the most critical workloads, because it has the highest value for Importance. If a server failure occurs, then maintaining the minimum size for the online server pool takes priority over maintaining the minimum size of the other server pools.
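For reference, server pools with constraints like these could be created with SRVCTL commands along the following lines; this is only a sketch, where -l, -u, and -i correspond to the Min Size, Max Size, and Importance columns in the table, and -1 denotes no upper limit:

srvctl add srvpool -g backoffice -l 1 -u -1 -i 1
srvctl add srvpool -g online -l 1 -u -1 -i 2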
This release of Oracle Database QoS Management focuses on managing OLTP workloads, which are the type most likely to have an open workload (workloads for which demand remains constant even as system performance degrades) and be vulnerable to outages due to workload surges. For this demonstration, we assume there is a combination of internal and external workloads hosted in the same cluster so the resources can be shared.
There are four types of workloads demonstrated for this demo system, as illustrated in Figure 2-1:
An ERP application based on J2EE that connects to the database instances in the backoffice server pool using the ERP service

An internal HR application based on Oracle Call Interface (OCI) that connects to the database instances in the backoffice server pool using the HR service

An external Sales application based on J2EE that connects to the database instances in the online server pool using the Sales service

An external Sales checkout application (Sales Cart) based on J2EE that connects to database instances through a specific database user in the online server pool using the Sales service
Figure 2-1 Illustration of a Sample Workload
By using two server pools, the workloads and their dependent databases are logically separated but can readily share resources between them.
At first, there is no Oracle Database QoS Management configured for this system. Using Oracle Enterprise Manager Database Control, there are two configuration workflows to complete to enable Oracle Database QoS Management for the cluster. The first workflow configures each database for Oracle Database QoS Management and the second workflow configures and enables Oracle Database QoS Management for the cluster.
See Also:
"Enabling Oracle Database QoS Management" for details on enabling Oracle Database QoS ManagementAfter you create a default Policy Set, using the database services that are discovered automatically, Oracle Database QoS Management can be fine-tuned to align the workloads with their respective service-level agreements or objectives.
In this section, the sample implementation of Oracle Database QoS Management is further evolved to include creating and activating Performance Policies and refining them with additional Performance Classes.
Because the default Performance Policy is created by discovering the database services in measure-only mode, it can initially be activated to test how all of the workloads perform in the cluster. The Dashboard displays both the resource use and wait times that comprise the average response time for each Performance Class during different periods of demand. These numbers can help you understand the minimum response times achievable with the allocated resources.
If your workloads peak at different times on a regular basis, or your service-level agreements (SLAs) vary based upon time, day of the week, and so on, then create additional measure-only Performance Policies that change the size of the server pools to evaluate the minimum resources required for your workloads. In this demonstration, for the Sales application, the workload that uses the online server pool requires a minimum of two servers. The backoffice server pool requires only one server to satisfy its workload requests. If both server pools currently contain two servers, then you can enable the online server pool to take a server from the backoffice server pool, if needed, by setting the minimum size of the backoffice server pool to one. You would use a server pool directive override in the "Business Hours" Performance Policy to specify a minimum size of one for the backoffice server pool.
You could interpret the minimum size of a server pool as the number of servers owned by that server pool. If the sum of the minimum sizes of all the server pools in the cluster is less than the number of servers in the cluster, then the extra servers are referred to as floaters, which are shared by all server pools. For example, if your cluster has 15 servers, three server pools, and a minimum size of four for each server pool, then your system has three floaters.
After the Performance Policies have been run in measure-only mode, Performance Objectives can be added to each Performance Class. The Performance Objectives can be ranked based upon how critical the maintenance of that Performance Objective is to your business. Performance Objectives should be set to maximize headroom above the observed response times but below the response times required to meet SLAs. Maintaining at least 50% headroom is a good starting point to support trading off resources should a Performance Class experience a workload surge. For example, if a Performance Class has an average response time of two milliseconds (ms), then the Performance Objective could be set to three ms: two ms response time and an additional one ms which corresponds to the 50% headroom.
Although service-based classifiers can provide for easy configuration, you might want to define more than one Performance Objective for a service. For example, the sales service can contain many different workloads, such as Browse Products, Add Customer Account, Sales Cart, and Browse Orders. Because the Sales Cart workload generates revenue, you might want this workload to have a shorter response time than the other workloads. You must create a separate Performance Class and associated classifiers to define specific Performance Objectives for the different workloads.
On the Define Classifier page in the Policy Set wizard, a Sales Cart classifier can be defined by entering sales as the Service Name and, if the application can set MODULE or ACTION, entering an appropriate value for those attributes; otherwise, configure a separate USERNAME from the middle tier. As soon as this new Performance Class is defined, it appears automatically in all of the Performance Policies in measure-only mode. The new Performance Class is given the lowest rank by default. Use these values initially to test the performance of your system. After the average performance levels have been determined, a Performance Objective and rank for this Performance Class can be set within each Performance Policy.
The implementation of Oracle Database QoS Management is completed by actively managing the service levels, which means responding to alerts, reviewing and implementing recommendations, and tracking results. This section describes the actions you would perform on the demo system.
After all the workloads are running and the Dashboard displays the performance of the demo system, you need to be alerted if a workload surge or failure causes a Performance Objective to stop being met. The Performance Satisfaction Metric (PSM) normalizes all of the objectives and provides a quick way to observe the health of the system. The PSM Trend indicator shows how well a Performance Class has been meeting its objective over the last five minutes, so problems can be spotted quickly. Performance Objective violations produce recommendations that state how resources should be reallocated to relieve the bottleneck. Details and projections are available for further analysis of the bottleneck and possible solutions. If the recommendation is an action that can be implemented by Oracle Database QoS Management, then an Implement button is displayed.
Performance Objective violations of short duration are tolerated in most SLAs. Therefore, Enterprise Manager alerts can be configured for each Performance Class to specify the duration of continuous violation that triggers the alert. These alerts are configured on the Database alert page, but can be defined for all Performance Classes in the cluster.
An audit log of policy changes, violations, and actions is available in the oc4j/j2ee/home/log/dbwlm/auditing directory in the Oracle Grid Infrastructure home on the server that hosts the Oracle Database QoS Management server. To determine which server is hosting the Oracle Database QoS Management server (the OC4J container), enter the following command at the operating system prompt:
srvctl status oc4j