8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » 11g » Here

Statistics Collection Enhancements in Oracle Database 11g Release 1

Related articles.

Pending Statistics

In previous database versions, new optimizer statistics were automatically published when they were gathered. In 11g this is still the default action, but you now have the option of keeping the newly gathered statistics in a pending state until you choose to publish them.

The DBMS_STATS.GET_PREFS function allows you to check the 'PUBLISH' attribute to see if statistics are automatically published. The default value of TRUE means they are automatically published, while FALSE indicates they are held in a pending state.

SELECT DBMS_STATS.get_prefs('PUBLISH') FROM dual;

DBMS_STATS.GET_PREFS('PUBLISH')
-------------------------------------------
TRUE

1 row selected.

SQL>

The 'PUBLISH' attribute is reset using the DBMS_STATS.SET_TABLE_PREFS procedure.

-- New statistics for SCOTT.EMP are kept in a pending state.
EXEC DBMS_STATS.set_table_prefs('SCOTT', 'EMP', 'PUBLISH', 'false');

-- New statistics for SCOTT.EMP are published immediately.
EXEC DBMS_STATS.set_table_prefs('SCOTT', 'EMP', 'PUBLISH', 'true');

Pending statistics are visible using the [DBA|ALL|USER]_TAB_PENDING_STATS and [DBA|ALL|USER]_IND_PENDING_STATS views.

The DBMS_STATS package allows you to publish or delete pending statistics, as show below.

-- Publish all pending statistics.
EXEC DBMS_STATS.publish_pending_stats(NULL, NULL);

-- Publish pending statistics for a specific object.
EXEC DBMS_STATS.publish_pending_stats('SCOTT','EMP');

-- Delete pending statistics for a specific object.
EXEC DBMS_STATS.delete_pending_stats('SCOTT','EMP');

The optimizer is capable of using pending statistics if the OPTIMIZER_USE_PENDING_STATISTICS initialization parameter, which defaults to FALSE, is set to TRUE. Setting this parameter to TRUE at session level allows you to test the impact of pending statistics before publishing them.

ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS=TRUE;

Pending statistics can be transfered between database by exporting this using the DBMS_STATS.EXPORT_PENDING_STATS procedure.

Extended Statistics

Multi-Column (Column Group) Statistics

This feature has altered in later releases. Please check the article relevant to your version.

Individual column statistics are fine for working out the selectivity of a specific column in a where clause, but when the where clause includes multiple columns from the same table, the individual column statistics provide no indication of the relationship between the columns. This makes working out the selectivity of the column group very difficult.

Oracle uses workload analysis to generate column groups, but they can also be manipulated manually using the DBMS_STATS package. The CREATE_EXTENDED_STATS procedure is used to explicitly create multi-column statistics.

-- Create a columnn group based on EMP(JOB,DEPTNO).
SET SERVEROUTPUT ON
DECLARE
  l_cg_name VARCHAR2(30);
BEGIN
  l_cg_name := DBMS_STATS.create_extended_stats(ownname   => 'SCOTT',
                                                tabname   => 'EMP',
                                                extension => '(JOB,DEPTNO)');
  DBMS_OUTPUT.put_line('l_cg_name=' || l_cg_name);
END;
/
l_cg_name=SYS_STU3VG629OEYG6FN0EKTGV_HQ6

PL/SQL procedure successfully completed.

SQL>

The column group name is returned using the SHOW_EXTENDED_STATS_NAME function.

-- Display the name of the columnn group.
SELECT DBMS_STATS.show_extended_stats_name(ownname   => 'SCOTT',
                                           tabname   => 'EMP',
                                           extension => '(JOB,DEPTNO)') AS cg_name
FROM dual;

CG_NAME
------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6

1 row selected.

SQL>

Manually created column groups can be deleted using the DROP_EXTENDED_STATS procedure.

-- Drop the columnn group.
BEGIN
  dbms_stats.drop_extended_stats(ownname   => 'SCOTT',
                                 tabname   => 'EMP',
                                 extension => '(JOB,DEPTNO)');
END;
/

PL/SQL procedure successfully completed.

SQL>

Setting the METHOD_OPT parameter to "FOR ALL COLUMNS SIZE AUTO" allows the GATHER_% procedures to gather statistics on all existing column groups for the specified object.

BEGIN
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for all columns size auto');
END;
/

Alternatively, set the METHOD_OPT parameter to "FOR COLUMNS (column-list)" and the group will automatically be created during the statistics gathering.

BEGIN
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for columns (job,mgr)');
END;
/

The [DBA|ALL|USER]_STAT_EXTENSIONS views display information about the multi-column statistics.

COLUMN extension FORMAT A30

SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  table_name = 'EMP';

EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
SYS_STULPA1A#B6YL4KQ59DQO3OADQ ("JOB","MGR")

2 rows selected.

SQL>


COLUMN col_group FORMAT A30

SELECT e.extension col_group,
       t.num_distinct,
       t.histogram
FROM   dba_stat_extensions e
       JOIN dba_tab_col_statistics t ON e.extension_name=t.column_name
AND    t.table_name = 'EMP';

COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("JOB","DEPTNO")                          9 FREQUENCY
("JOB","MGR")                             8 FREQUENCY

2 rows selected.

SQL>

Expression Statistics

The optimizer has no idea what the affect of applying a function to column has on the selectivity of the column. Using a similar method to multi-column statistics, we can gather expression statistics to provide more information.

Expression statistics can be created explicitly using the CREATE_EXTENDED_STATS procedure, or implicitly by specifying the expression in the METHOD_OPT parameter of the GATHER_% procedures when gathering statistics.

DECLARE
  l_cg_name VARCHAR2(30);
BEGIN
  -- Explicitly created.
  l_cg_name := DBMS_STATS.create_extended_stats(ownname   => 'SCOTT',
                                                tabname   => 'EMP',
                                                extension => '(LOWER(ENAME))');

  -- Implicitly created.
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for columns (upper(ename))');
END;
/

Setting the METHOD_OPT parameter to "FOR ALL COLUMNS SIZE AUTO" allows the GATHER_% procedures to gather existing expression statistics.

BEGIN
  DBMS_STATS.gather_table_stats(
    'SCOTT',
    'EMP',
    method_opt => 'for all columns size auto');
END;
/

The [DBA|ALL|USER]_STAT_EXTENSIONS views display information about the expression statistics, as well as the multi-column statistics.

COLUMN extension FORMAT A30

SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  table_name = 'EMP';


EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
SYS_STULPA1A#B6YL4KQ59DQO3OADQ ("JOB","MGR")
SYS_STU2JLSDWQAFJHQST7$QK81_YB (LOWER("ENAME"))
SYS_STUOK75YSL165W#_X8GUYL0A1X (UPPER("ENAME"))

4 rows selected.

SQL>


COLUMN col_group FORMAT A30

SELECT e.extension col_group,
       t.num_distinct,
       t.histogram
FROM   dba_stat_extensions e
       JOIN dba_tab_col_statistics t ON e.extension_name=t.column_name
AND    t.table_name = 'EMP';

COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("JOB","DEPTNO")                          9 NONE
("JOB","MGR")                             8 NONE
(LOWER("ENAME"))                         14 NONE
(UPPER("ENAME"))                         14 NONE

4 rows selected.

SQL>

Expression statistics are dropped using the DROP_EXTENDED_STATS procedure.

-- Drop the columnn group.
BEGIN
  dbms_stats.drop_extended_stats(ownname   => 'SCOTT',
                                 tabname   => 'EMP',
                                 extension => '(UPPER(ENAME))');
END;
/

PL/SQL procedure successfully completed.

SQL>

Enhanced Statistics Collection for Partitioned Objects

Oracle 11g includes improvements to statistics collection for partitioned objects so untouched partitions are not rescanned. This significantly increases the speed of statistics collection on large tables where some of the partitions contain static data. Where partition exchange load (PEL) is used to add data to the a table, only the newly added partition must be scanned to update the global statistics.

In previous versions, aggregated stats were used to reduce the overhead of gathering stats on partitioned objects. This worked fine for the number of rows in the table and the high/low values for columns, but was terrible at calculating the number of distinct values for the columns.

Oracle 11g now keeps an abbreviated record of the high/low column values, called a synopsis, and stores them in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ and WRI$_OPTSTAT_SYNOPSIS$ views. This allows the new incremental stats to estimate the number of distinct values much more accurately.

To initiate incremental statistics for a partitioned table, the INCREMENTAL preference must be set to TRUE. If you have altered the default preferences, make sure the GRANULARITY and ESTIMATE_PERCENT preferences for the specific table are reverted the defaults.

-- Mandatory
EXEC DBMS_STATS.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'INCREMENTAL', 'TRUE');

-- Resetting to defaults values if you've changed the global/database preferences.
EXEC DBMS_STATS.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'GRANULARITY', 'AUTO');
EXEC DBMS_STATS.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);

The stats will now be gathered incrementally for the partitioned table by issuing the basic gather command.

-- Using default preferences.
EXEC DBMS_STATS.gather_table_stats('MY_SCHEMA', 'MY_TABLE');

-- Overriding preferences.
EXEC DBMS_STATS.gather_table_stats('MY_SCHEMA', 'MY_TABLE', granularity => 'AUTO', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);

AUTO_SAMPLE_SIZE

In previous releases of the database, the statistics for large tables were often determined using a sample of the rows to reduce the overhead of the operation.

EXEC DBMS_STATS.gather_table_stats('MYSCHEMA', 'BIG_TABLE', estimate_percent => 5);

In Oracle 9i upwards, we also had the option of letting Oracle determine the sample size using the AUTO_SAMPLE_SIZE constant.

EXEC DBMS_STATS.gather_table_stats('MYSCHEMA', 'BIG_TABLE', estimate_percent => DBMS_STATS.auto_sample_size);

Using the AUTO_SAMPLE_SIZE constant in previous versions got a bit of a bad reputation. The selected sample size was sometimes inappropriate, making the resulting statistics questionable.

In Oracle 11g, the AUTO_SAMPLE_SIZE constant is the preferred (and default) sample size as the mechanism for determining the actual sample size has been improved. In addition, the statistics estimate based on the auto sampling are near to 100% accurate and much faster to gather than in previous versions, as described here.

EXEC DBMS_STATS.gather_table_stats('MYSCHEMA', 'BIG_TABLE', estimate_percent => DBMS_STATS.auto_sample_size);

For more information see:

Hope this helps. Regards Tim...

Back to the Top.