Oracle Cost-Based Optimizer (CBO) and Database Statistics (DBMS_STATS)
When a valid SQL statement is sent to the server for the first time, Oracle produces an execution plan that describes how to retrieve the necessary data. In older versions of the database this execution plan could be generated using one of two optimizers:
- Rule-Based Optimizer (RBO) - This was the original optimization method and as the name suggests, was essentially a list of rules Oracle should follow to generate an execution plan. Even after the cost-based optimizer was introduced, this method was used if the server had no internal statistics relating to the objects referenced by the statement, or if explicitly requested by a hint or instance/session parameter. This optimizer was made obsolete, then deprecated in later versions of the database.
- Cost-Based Optimizer (CBO) - The CBO uses database statistics to generate several execution plans, picking the one with the lowest cost, where cost relates to system resources required to complete the operation.
In newer versions of the database the cost-based optimizer is the only option available. If new objects are created, the amount of data or the spread of data in the database changes the statistics will no longer represent the real state of the database so the CBO decision process may be seriously impaired. This article will focus on management of statistics using the
DBMS_STATS package, although there will be some mention of legacy methods.
- Legacy Methods for Gathering Database Stats
- Automatic Optimizer Statistics Collection
- Statistics Collection Enhancements in Oracle Database 11g Release 1
- Dynamic Sampling
If you put 10 Oracle performance gurus in the same room they will all say database statistics are vital for the cost-based optimizer to choose the correct execution plan for a query, but they will all have a different opinion on how to gather those statistics. A couple of quotes that stand out in my mind are:
"You don't necessarily need up to date statistics. You need statistics that are representative of your data." - Graham Wood.
Meaning, the age of the statistics in your system is not a problem as long as they are still representative of your data. So just looking at the
LAST_ANALYZEDcolumn of the
DBA_TABLESview is not an indication of valid stats on your system.
"Do you want the optimizer to give you the best performance, or consistent performance?" - Anjo Kolk
Meaning, regularly changing your stats potentially introduces change. Change is not always a good thing.
Neither of these experts are suggesting you never update your stats, just pointing out that in doing so you are altering information the optimizer uses to determine which execution plan is the most efficient. In altering that information it is not unlikely the optimizer may make a different decision. Hopefully it will be the correct decision, but maybe it wont. If you gather statistics for all tables every night, your system will potentially act differently every day. This is the fundamental paradox of gathering statistics.
So what should our statistics strategy be? Here are some suggestions.
- Automatic Optimizer Statistics Collection: From 10g onward the database automatically gathers statistics on a daily basis. The default statistics job has come under a lot of criticism over the years, but its value depends on the type of systems you are managing. Most of that criticism has come from people discussing edge cases, like large data warehouses. If you are managing lots of small databases that have relatively modest performance requirements, you can pretty much let Oracle do its own thing where stats are concerned. If you have any specific problems, deal with them on a case by case basis.
- Mixed Approach: You rely on the automatic job for the majority of stats collection, but you have specific tables or schemas that have very specific stats requirements. In these cases you can either set the preferences for the objects in question, or lock the stats for the specific tables/schemas to prevent the job from changing them, then devise a custom solution for those tables/schemas.
- Manual: You disable the automatic stats collection completely and devise a custom solution for the whole of the database.
Which one of these approaches you take should be decided on a case-by-case basis. Whichever route you take, you will be using the
DBMS_STATS package to manage your stats.
Regardless of the approach you take, you need to consider system and fixed object statistics for every database, as these are not gathered by the automatic job.
DBMS_STATS package was introduced in Oracle 8i and is Oracle's preferred method of gathering statistics. Oracle list a number of benefits to using it including parallel execution, long term storage of statistics and transfer of statistics between servers.
The functionality of the DBMS_STATS package varies greatly between database versions, as do the default parameter settings and the quality of the statistics they generate. It is worth spending some time checking the documentation relevant to your version.
Table and Index Stats
Table statistics can be gathered for the database, schema, table or partition.
EXEC DBMS_STATS.gather_database_stats; EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15); EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15, cascade => TRUE); EXEC DBMS_STATS.gather_schema_stats('SCOTT'); EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15); EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15, cascade => TRUE); EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES'); EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15); EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15, cascade => TRUE); EXEC DBMS_STATS.gather_dictionary_stats;
ESTIMATE_PERCENT parameter was often used when gathering stats from large segments to reduce the sample size and therefore the overhead of the operation. In Oracle 9i upwards, we also had the option of letting Oracle determine the sample size using the
AUTO_SAMPLE_SIZE constant, but this got a bad reputation because the selected sample size was sometimes inappropriate, making the resulting statistics questionable.
In Oracle 11g, the
AUTO_SAMPLE_SIZE constant is the preferred (and default) sample size as the mechanism for determining the actual sample size has been improved. In addition, the statistics estimate based on the auto sampling are near to 100% accurate and much faster to gather than in previous versions, as described here.
CASCADE parameter determines if statistics should be gathered for all indexes on the table currently being analyzed. Prior to Oracle 10g, the default was FALSE, but in 10g upwards it defaults to
AUTO_CASCADE, which means Oracle determines if index stats are necessary.
As a result of these modifications to the behavior in the stats gathering, in Oracle 11g upwards, the basic defaults for gathering table stats are satisfactory for most tables.
Index statistics can be gathered explicitly using the
EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK'); EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK', estimate_percent => 15);
The current statistics information is available from the data dictionary views for the specific objects (DBA, ALL and USER views). Some of these view were added in later releases.
Histogram information is available from the following views.
Table, column and index statistics can be deleted using the relevant delete procedures.
EXEC DBMS_STATS.delete_database_stats; EXEC DBMS_STATS.delete_schema_stats('SCOTT'); EXEC DBMS_STATS.delete_table_stats('SCOTT', 'EMP'); EXEC DBMS_STATS.delete_column_stats('SCOTT', 'EMP', 'EMPNO'); EXEC DBMS_STATS.delete_index_stats('SCOTT', 'EMP_PK'); EXEC DBMS_STATS.delete_dictionary_stats;
Introduced in Oracle 9iR1, the
GATHER_SYSTEM_STATS procedure gathers statistics relating to the performance of your systems I/O and CPU. Giving the optimizer this information makes its choice of execution plan more accurate, since it is able to weigh the relative costs of operations using both the CPU and I/O profiles of the system.
There are two possible types of system statistics:
Noworkload: All databases come bundled with a default set of noworkload statistics, but they can be replaced with more accurate information. When gathering noworkload stats, the database issues a series of random I/Os and tests the speed of the CPU. As you can imagine, this puts a load on your system during the gathering phase.
Workload: When initiated using the start/stop or interval parameters, the database uses counters to keep track of all system operations, giving it an accurate idea of the performance of the system. If workload statistics are present, they will be used in preference to noworkload statistics.
-- Manually start and stop to sample a representative time (several hours) of system activity. EXEC DBMS_STATS.gather_system_stats('start'); EXEC DBMS_STATS.gather_system_stats('stop'); -- Sample from now until a specific number of minutes. DBMS_STATS.gather_system_stats('interval', interval => 180);
Your current system statistics can be displayed by querying the
SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN'; PNAME PVAL1 ------------------------------ ---------- CPUSPEED CPUSPEEDNW 1074 IOSEEKTIM 10 IOTFRSPEED 4096 MAXTHR MBRC MREADTIM SLAVETHR SREADTIM 9 rows selected. SQL>
If you are running 220.127.116.11 or 18.104.22.168 then check out MOS Note: 9842771.8.
DELETE_SYSTEM_STATS procedure will delete all workload stats and replace previously gathered noworkload stats with the default values.
You only need to update your system statistics when something major has happened to your systems hardware or workload profile.
There are two schools of thought about system stats. One side avoid the use of system statistics altogether, favoring the default noworkload stats. The other side suggests providing accurate system statistics. The problem with the latter, is it is very difficult to decide what represents an accurate set of system statistics. Most people seem to favor investigation of systems using a variety of methods, including gathering system stats into a stats table, then manually setting the system statistics using the
EXEC DBMS_STATS.set_system_stats('iotfrspeed', 4096);
The available parameter names can be found here.
I would say, if in doubt, use the defaults.
Fixed Object Stats
Introduced in Oracle 10gR1, the
GATHER_FIXED_OBJECTS_STATS procedure gathers statistics on the
X$ tables, which sit underneath the
V$ dynamic performance views. The
X$ tables are not really tables at all, but a window on to the memory structures in the Oracle kernel. Fixed object stats are not gathered automatically, so you need to gather them manually at a time when the database is in a representative level of activity.
Major changes to initialization parameters or system activity should signal you to gather fresh stats, but under normal running this does not need to be done on a regular basis.
The stats are removed using the
To prevent statistics being overwritten, you can lock the stats at schema, table or partition level.
EXEC DBMS_STATS.lock_schema_stats('SCOTT'); EXEC DBMS_STATS.lock_table_stats('SCOTT', 'EMP'); EXEC DBMS_STATS.lock_partition_stats('SCOTT', 'EMP', 'EMP_PART1');
If you need to replace the stats, they must be unlocked.
EXEC DBMS_STATS.unlock_schema_stats('SCOTT'); EXEC DBMS_STATS.unlock_table_stats('SCOTT', 'EMP'); EXEC DBMS_STATS.unlock_partition_stats('SCOTT', 'EMP', 'EMP_PART1');
Locking stats can be very useful to prevent automated jobs from changing them. This is especially useful with tables used for ETL processes. If the stats are gathered when the tables are empty, they will not reflect the real quantity of data during the load process. Instead, either gather stats each time the data is loaded, or gather them once on a full table and lock them.
It is possible to transfer statistics between servers allowing consistent execution plans between servers with varying amounts of data. First the statistics must be collected into a statistics table. In the following examples the statistics for the APPSCHEMA user are collected into a new table, STATS_TABLE, which is owned by DBASCHEMA.
EXEC DBMS_STATS.create_stat_table('DBASCHEMA','STATS_TABLE'); EXEC DBMS_STATS.export_schema_stats('APPSCHEMA','STATS_TABLE',NULL,'DBASCHEMA');
This table can then be transfered to another server using your preferred method (Export/Import, SQL*Plus COPY etc.) and the stats imported into the data dictionary as follows.
EXEC DBMS_STATS.import_schema_stats('APPSCHEMA','STATS_TABLE',NULL,'DBASCHEMA'); EXEC DBMS_STATS.drop_stat_table('DBASCHEMA','STATS_TABLE');
Since Oracle 10g, many of the default values of parameters for the
DBMS_STATS procedures have changed from being hard coded to using preferences. In Oracle 10g, these preferences could be altered using the
In 11g, theEXEC DBMS_STATS.set_param('DEGREE', '5');
SET_PARAMprocedure was deprecated in favor of a layered approach to preferences. The four levels of preferences are amended with the following procedures.
SET_GLOBAL_PREFS: Used to set global preferences, including some specific to the automatic stats collection job.
SET_DATABASE_PREFS: Sets preferences for the whole database.
SET_SCHEMA_PREFS: Sets preferences for a specific schema.
SET_TABLE_PREFS: Sets preferences for a specific table.
With the exception of
AUTOSTATS_TARGET, which is only available for the global prefs, the same preferences can be set for all four level.
|CASCADE||Determines if index stats should be gathered for the current table (TRUE, FALSE, AUTO_CASCADE).||DBMS_STATS.AUTO_CASCADE||10gR1+|
|DEGREE||Degree of parallelism (integer or DEFAULT_DEGREE).||DBMS_STATS.DEFAULT_DEGREE||10gR1+|
|ESTIMATE_PERCENT||Percentage of rows to sample when gathering stats (0.000001-100 or AUTO_SAMPLE_SIZE).||DBMS_STATS.AUTO_SAMPLE_SIZE||10gR1+|
|METHOD_OPT||Controls column statistics collection and histogram creation.||FOR ALL COLUMNS SIZE AUTO||10gR1+|
|NO_INVALIDATE||Determines if dependent cursors should be invalidated as a result of new stats on objects (TRUE, FALSE or AUTO_INVALIDATE).||DBMS_STATS.AUTO_INVALIDATE||10gR1+|
|AUTOSTATS_TARGET||Determines which objects have stats gathered (ALL, ORACLE, AUTO).||AUTO||10gR2+|
|GRANULARITY||The granularity of stats to be collected on partitioned objects (ALL, AUTO, DEFAULT, GLOBAL, 'GLOBAL AND PARTITION', PARTITION, SUBPARTITION).||AUTO||10gR2+|
|PUBLISH||Determines if gathered stats should be published immediately or left in a pending state (TRUE, FALSE).||TRUE||11gR2+|
|INCREMENTAL||Determines whether incremental stats will be used for global statistics on partitioned objects, rather than generated using table scans (TRUE, FALSE).||FALSE||11gR2+|
|STALE_PERCENT||The percentage of rows that must be changed before a table is considered stale.||10||11gR2+|
The following shows their basic usage.
EXEC DBMS_STATS.set_global_prefs('AUTOSTATS_TARGET', 'AUTO'); EXEC DBMS_STATS.set_database_prefs('STALE_PERCENT', '15'); EXEC DBMS_STATS.set_schema_prefs('SCOTT','DEGREE', '5'); EXEC DBMS_STATS.set_table_prefs('SCOTT', 'EMP', 'CASCADE', 'FALSE');
Global preferences can be reset and the other layers of preferences deleted using the following procedures.
EXEC DBMS_STATS.reset_global_pref_defaults; EXEC DBMS_STATS.delete_database_prefs('CASCADE'); EXEC DBMS_STATS.delete_schema_prefs('SCOTT','DEGREE'); EXEC DBMS_STATS.delete_table_prefs('SCOTT', 'EMP', 'CASCADE');
Setting Stats Manually
DBMS_STATS package provides several procedures for manually setting statistics.
The current stats can be returned using the following procedures.
Be careful when setting stats manually. Possibly the safest approach is to get the current values, amend them as required, then set them. An example of setting column statistics is shown below.
SET SERVEROUTPUT ON DECLARE l_distcnt NUMBER; l_density NUMBER; l_nullcnt NUMBER; l_srec DBMS_STATS.StatRec; l_avgclen NUMBER; BEGIN -- Get current values. DBMS_STATS.get_column_stats ( ownname => 'SCOTT', tabname => 'EMP', colname => 'EMPNO', distcnt => l_distcnt, density => l_density, nullcnt => l_nullcnt, srec => l_srec, avgclen => l_avgclen); -- Amend values. l_srec.minval := UTL_RAW.cast_from_number(7369); l_srec.maxval := UTL_RAW.cast_from_number(7934); -- Set new values. DBMS_STATS.set_column_stats ( ownname => 'SCOTT', tabname => 'EMP', colname => 'EMPNO', distcnt => l_distcnt, density => l_density, nullcnt => l_nullcnt, srec => l_srec, avgclen => l_avgclen); END; /
- Exclude dataload tables from your regular stats gathering, unless you know they will be full at the time that stats are gathered.
- Prior to 10g, gathering stats for the SYS schema can make the system run slower, not faster.
- Gathering statistics can be very resource intensive for the server so avoid peak workload times or gather stale stats only.
- Even if scheduled, it may be necessary to gather fresh statistics after database maintenance or large data loads.
Legacy Methods for Gathering Database StatsThe information in this section is purely for historical reasons. All statistics management should now be done using the
The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows.
ANALYZE TABLE employees COMPUTE STATISTICS; ANALYZE INDEX employees_pk COMPUTE STATISTICS; ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 100 ROWS; ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;
DBMS_UTILITY package can be used to gather statistics for a whole schema or database. Both methods follow the same format as the analyze statement.
EXEC DBMS_UTILITY.analyze_schema('SCOTT','COMPUTE'); EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_rows => 100); EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_percent => 15); EXEC DBMS_UTILITY.analyze_database('COMPUTE'); EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_rows => 100); EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_percent => 15);
Refreshing Stale StatsThis involves monitoring the DML operations against individual tables so statistics are only gathered for those tables whose data has changed significantly. This is the default action for the automatic optimizer statistics collection in 10g and above, but if you are using an older version of the database, you may want to read more about this here.
Prior to Oracle 10g, scheduling the gathering of statistics using the
DBMS_JOB package ws the easiest way to make sure they were always up to date.
SET SERVEROUTPUT ON DECLARE l_job NUMBER; BEGIN DBMS_JOB.submit(l_job, 'BEGIN DBMS_STATS.gather_schema_stats(''SCOTT''); END;', SYSDATE, 'SYSDATE + 1'); COMMIT; DBMS_OUTPUT.put_line('Job: ' || l_job); END; /
The above code sets up a job to gather statistics for SCOTT for the current time every day. You can list the current jobs on the server
Existing jobs can be removed using the following.
EXEC DBMS_JOB.remove(X); COMMIT;
Where 'X' is the number of the job to be removed.
For more information see:
- Automatic Optimizer Statistics Collection
- Statistics Collection Enhancements in Oracle Database 11g Release 1
- Dynamic Sampling
- Refreshing Stale Statistics
- Best Practices for Gathering Optimizer Statistics
- DBMS_STATS (8i)
- DBMS_STATS (9iR1)
- DBMS_STATS (9iR2)
- DBMS_STATS (10gR1)
- DBMS_STATS (10gR2)
- DBMS_STATS (11gR1)
- DBMS_STATS (11gR2)
Hope this helps. Regards Tim...