In the first part, I explained that Incremental Statistics are designed to allow a partitioned table's Global Statistics to be updated based on a combination of
1) Statistics gathered by analysing the contents of one or more partitions that have just been loaded or have been updated (and see this blog post for more depth on what 'updated' means!)
2) The statistics of existing partitions which are represented by synopses that are already stored in the SYSAUX tablespace.
Using this combination, we can avoid scanning every partition in the table whenever we want to update the Global Stats, an expensive operation that is likely to be infeasible every time we load some data into a large table.
For me, the key word here is Incremental. Global Statistic updates are an incremental process, building on previous statistics (represented by the synopses) and only updating the Global Statistics based on the changes introduced by loading new partitions.
Understanding this might clear up another area of confusion I keep coming across. After upgrading their database to 11g, people often want to try out Incremental Global Stats on one of their existing large tables because they've always struggled to keep their Global Stats consistent and up to date. Maybe it's just the sites I work at, but I'd say this is the most popular use case. Incrementals for a planned large partitioned table in your new systems might be a sensible idea, but there are a lot more existing systems out there with Global Stats collection problems that people have struggled with for years.
Most people I've spoken to initially had the impression that they simply flick the INCREMENTAL switch and perhaps modify some of the parameters to their existing DBMS_STATS calls so that GRANULARITY is AUTO and they use AUTO sampling sizes. All of which is discussed in the various white papers and blog posts out there.
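For anyone who hasn't read those papers, the typical sequence boils down to something like this (just a sketch: SALES_HIST is an invented table name, and the preference values are simply the commonly quoted defaults):

exec dbms_stats.set_table_prefs(user, 'SALES_HIST', 'INCREMENTAL',      'TRUE')
exec dbms_stats.set_table_prefs(user, 'SALES_HIST', 'ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE')
exec dbms_stats.set_table_prefs(user, 'SALES_HIST', 'GRANULARITY',      'AUTO')

-- From now on a normal gather maintains the synopses and the Global Stats.
exec dbms_stats.gather_table_stats(user, 'SALES_HIST')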
Then they get a hell of a surprise when the very first gather runs for ages! How long is ages? I don't know in your particular case, but I've seen this running for hours and hours and hours, and people are crying into their keyboards wondering why something that was supposed to make things run more quickly is so much slower than their usual stats calls.
The best way I've found to explain this phenomenon is to concentrate on the synopses that describe the existing partitions. Where do you think they come from? How are they calculated and populated if you don't ask Oracle to look at the existing data in your enormous table? That's what needs to happen. In order to make future updates to your Global Stats much more efficient, we first need to establish the baseline describing your existing data that Oracle will use as the foundation for the later incremental updates.
Generating the synopses as the baseline for future improvements will be relatively painful for the largest tables (if it wasn't, you probably wouldn't be so interested in Incrementals), but it does only have to happen once. You just need to understand that it does have to happen and plan for it as part of your migration.
My personal suggestion is usually to just delete all of the existing stats and start from scratch with modern default parameter values and tidy up any stats-related junk that might be lingering around large, critical tables. Painful but probably worth it!
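If you do decide to go down that route, the "start from scratch" step itself is nothing exotic; a sketch (with the same invented table name as above, and obviously something you'd schedule for a quiet period):

-- Throw away the accumulated, possibly inconsistent, statistics ...
exec dbms_stats.delete_table_stats(user, 'SALES_HIST', cascade_parts => true, cascade_columns => true, cascade_indexes => true)

-- ... switch on incremental maintenance, then regather with the modern defaults.
exec dbms_stats.set_table_prefs(user, 'SALES_HIST', 'INCREMENTAL', 'TRUE')
exec dbms_stats.gather_table_stats(user, 'SALES_HIST')

Remember that this regather is the expensive, one-off baseline described above.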
The Oracle 12c client does not have an ocijdbc11.dll, so when you try to log in from SQL Developer with your 12c Oracle Client (OCI thick driver), you may see:
no ocijdbc11 in path
Don’t worry, this boils down to the jdbc driver ojdbc6.jar. Just overwrite your sqldeveloper/jdbc/lib/ojdbc6.jar with the one from your 12c client.
The same applies to the 126.96.36.199 client, except that there you will get a core dump at oracle.jdbc.driver.T2CConnection.t2cCreateState instead of a proper error message.
I posted a thread at https://community.oracle.com/message/12293761
Sitting in an airport, waiting for a plane, I decided to read a note (pdf) about Flashback data archive written by Beat Ramseier from Trivadis. I’d got about three quarters of the way through it when I paused for thought and figured out that on the typical database implementation something nasty is going to happen after approximately 3 years and 9 months. Can you guess why?
It’s all about smon_scn_time – which normally records one row every five minutes (created by smon) with a continuous cycle of about five days – typically giving you about 1,440 rows in the table. The table is in a cluster, and the cluster key is the instance (thread) number. Clearly this was originally a clever idea from someone who realised that a cluster key of thread number would be beneficial if you had a RAC system with multiple instances – each instance gets its own blocks and the data for any one instance is as well clustered as possible.
The trouble is, when you enable flashback data archive smon no longer sticks to that fixed cycle, it just keeps adding rows. Now on my 8KB block tablespace I see 6 rows per block in the table/cluster – which means I get through 48 blocks per day, 17,520 blocks per year, and in 3 years and 9 months I’ll get to roughly 65,700 blocks – and that’s the problem. An index entry in a cluster index points to a chain of cluster blocks, and the last two bytes of the “rowid” in the index entry identify which block within the chain the cluster key scan should start at – and two bytes means you can only have 65,536 blocks for a single cluster key.
I don’t know what’s going to happen when smon tries to insert a row into the 65,535th (-ish) block for the current thread – but it ought to raise an Oracle error, and then you’ll probably have to take emergency action to make sure that the flashback mechanisms carry on running.
Although oraus.msg suggests it’s an error message about hash clusters, it’s possible that the first sign of trouble will be: “ORA-02475 maximum cluster chain block count of %s has been exceeded”. If you’re using a 16KB block size then you’ve got roughly 7.5 years, and 32KB block sizes give you about 15 years (not that that’s a good argument for selecting larger block sizes, of course).
Searching MoS for related topics (smon_scn_time flashback) I found doc ID: 1100993.1 from which we could possibly infer that the 1,440 rows was a fixed limit in 10g but that the number of rows allowed in smon_scn_time could increase in 11g if you enable automatic undo management. I also found a couple of bugs relating to index or cluster corruption – fixed by 188.8.131.52, though.
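If you want to check how far your own system has already got, a couple of quick queries along these lines will give you an idea (a sketch only: the exact segment names for the cluster and its index vary between versions, hence the LIKE):

-- How many rows, and how far back does the SCN/time mapping currently reach?
select  thread, count(*) row_count, min(time_dp) oldest, max(time_dp) newest
from    sys.smon_scn_time
group by thread;

-- How big have the underlying cluster segments grown?
select  owner, segment_name, segment_type, blocks
from    dba_segments
where   segment_name like 'SMON_SCN%'
order by blocks desc;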
I just stumbled across this and could not find it anywhere else on the net. I set up a ZFS Appliance with Oracle VM and their Storage Connect plugin according to the documentation PDF (which gives pretty easy step-by-step instructions), but in this case the OVM Server and the ZFS Appliance were not in the same network, and access is denied by default in the firewall between those nets. So trying to register the appliance as FC storage led to this error, which just tells me that the connection timed out.
(03/03/2014 04:30:38:233 PM) OVMAPI_B000E Storage plugin command [storage_plugin_validate] failed for storage server [0004fb0000090000ddffb1ffb5081da6] failed with [com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Attempt to send command: storage_plugin_validate to server: ovm42 failed. OVMAPI_4004E Server Failed Command: storage_plugin_validate oracle.s7k. SCSIPlugin.SCSIPlugin, Status: OSCPlugin.OperationFailedEx:'URL failure: Connection timed out' [Mon Mar 03 16:30:38 CET 2014] [Mon Mar 03 16:30:38 CET 2014]] OVMAPI_4010E Attempt to send command: storage_plugin_validate to server: ovm42 failed. OVMAPI_4004E Server Failed Command: storage_plugin_validate oracle.s7k.SCSIPlugin.SCSIPlugin, Status: org.apache.xmlrpc.XmlRpcException: OSCPlugin.OperationFailedEx:'URL failure: Connection timed out' [Mon Mar 03 16:30:38 CET 2014] [Mon Mar 03 16:30:38 CET 2014] [Mon Mar 03 16:30:38 CET 2014]
Luckily, the OVM server has tcpdump installed, so I fired it up and checked which port it was trying to reach. My guess was that the Storage Connect plugin would communicate over SSH, but it looks like I assumed wrongly:
Warning: making manual modifications in the management domain might cause inconsistencies between Oracle VM Manager and the server.

[root@ovm42 ~]# tcpdump -i bond0 host 192.168.142.74
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes
16:27:28.742474 IP 192.168.42.42.56542 > 192.168.142.74.215: S 1866677898:1866677898(0) win 14600
16:27:31.746946 IP 192.168.42.42.56542 > 192.168.142.74.215: S 1866677898:1866677898(0) win 14600
16:27:37.763049 IP 192.168.42.42.56542 > 192.168.142.74.215: S 1866677898:1866677898(0) win 14600
16:27:49.778992 IP 192.168.42.42.56542 > 192.168.142.74.215: S 1866677898:1866677898(0) win 14600
So take this as a quick tip: when configuring Oracle VM Manager with Storage Connect to ZFS Appliances, make sure to open TCP port 215 from your OVM Servers (not the management server, by the way) to your Appliance.
What’s the most elaborate thing you have done with DataPump?
So there I was, given the requirement to export multiple partitions for multiple tables, where each partition has its own dump file with the format “tablename_partitionname.dmp”, pondering how this could be done efficiently.
With the following metadata and requirements, what approach would you take?
If you are curious about the approach I used, then read on.
TABLE_OWNER                    TABLE_NAME                     PARTITION_NAME
------------------------------ ------------------------------ ------------------------------
MDINH                          A_TAB                          P001
MDINH                          A_TAB                          P002
MDINH                          A_TAB                          P003
MDINH                          A_TAB                          P004
MDINH                          A_TAB                          P005
MDINH                          B_TAB                          P001
MDINH                          B_TAB                          P002
MDINH                          B_TAB                          P003
MDINH                          B_TAB                          P004
MDINH                          B_TAB                          P005
Here’s the demo:
$ nohup sqlplus "/ as sysdba" @exp_api.sql > exp_api.log 2>&1 &
$ cat exp_api.log
nohup: ignoring input

SQL*Plus: Release 184.108.40.206.0 Production on Wed Feb 26 20:28:07 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

ARROW:(SYS@db01):PRIMARY> -- DataPump Export (EXPDP) Fails With Errors ORA-39001 ORA-39000 ORA-31641 ORA-27054 ORA-27037 When The Dump File Is On NFS Mount Point (Doc ID 1518979.1)
ARROW:(SYS@db01):PRIMARY> -- Work around for the above mentioned error
ARROW:(SYS@db01):PRIMARY> alter system set events '10298 trace name context forever, level 32';

System altered.

Elapsed: 00:00:00.00
ARROW:(SYS@db01):PRIMARY> declare
  2    h1 number;
  3    dir_name varchar2(30);
  4  begin
  5    dir_name := 'DPDIR';
  6    for x in (
  7      select table_owner, table_name, partition_name
  8      from dba_tab_partitions
  9      where table_owner = 'MDINH' and table_name in ('A_TAB','B_TAB') and regexp_like(partition_name,'[0-4]$')
 10      order by table_owner, table_name, partition_position
 11    ) loop
 12
 13      h1 := dbms_datapump.open (operation => 'EXPORT', job_mode => 'TABLE');
 14
 15      dbms_datapump.add_file (
 16        handle    => h1,
 17        filename  => x.table_name||'_'||x.partition_name||'.dmp',
 18        reusefile => 1,
 19        directory => dir_name,
 20        filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
 21
 22      dbms_datapump.add_file (
 23        handle    => h1,
 24        filename  => 'exp_'||x.table_name||'_'||x.partition_name||'.log',
 25        directory => dir_name,
 26        filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
 27
 28      dbms_datapump.set_parameter (
 29        handle => h1,
 30        name   => 'INCLUDE_METADATA',
 31        value  => 0);
 32
 33      dbms_datapump.metadata_filter (
 34        handle => h1,
 35        name   => 'SCHEMA_EXPR',
 36        value  => 'IN ('''||x.table_owner||''')');
 37
 38      dbms_datapump.metadata_filter (
 39        handle => h1,
 40        name   => 'NAME_EXPR',
 41        value  => 'IN ('''||x.table_name||''')');
 42
 43      dbms_datapump.data_filter (
 44        handle      => h1,
 45        name        => 'PARTITION_LIST',
 46        value       => x.partition_name,
 47        table_name  => x.table_name,
 48        schema_name => x.table_owner);
 49
 50      dbms_datapump.start_job (handle => h1);
 51      dbms_datapump.detach (handle => h1);
 52    end loop;
 53  end;
 54  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.92
ARROW:(SYS@db01):PRIMARY> alter system set events '10298 trace name context off';

System altered.

Elapsed: 00:00:00.00
ARROW:(SYS@db01):PRIMARY> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 18.104.22.168.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Review export log:
$ ls -l exp*.log
-rw-r--r--. 1 oracle oinstall 2888 Feb 26 20:28 exp_api.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_A_TAB_P001.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_A_TAB_P002.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_A_TAB_P003.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_A_TAB_P004.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_B_TAB_P001.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_B_TAB_P002.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_B_TAB_P003.log
-rw-r--r--. 1 oracle oinstall  578 Feb 26 20:28 exp_B_TAB_P004.log
Review export dump:
$ ls -l *.dmp
-rw-r-----. 1 oracle oinstall   90112 Feb 26 20:28 A_TAB_P001.dmp
-rw-r-----. 1 oracle oinstall   98304 Feb 26 20:28 A_TAB_P002.dmp
-rw-r-----. 1 oracle oinstall  188416 Feb 26 20:28 A_TAB_P003.dmp
-rw-r-----. 1 oracle oinstall 1069056 Feb 26 20:28 A_TAB_P004.dmp
-rw-r-----. 1 oracle oinstall   90112 Feb 26 20:28 B_TAB_P001.dmp
-rw-r-----. 1 oracle oinstall   98304 Feb 26 20:28 B_TAB_P002.dmp
-rw-r-----. 1 oracle oinstall  188416 Feb 26 20:28 B_TAB_P003.dmp
-rw-r-----. 1 oracle oinstall 1069056 Feb 26 20:28 B_TAB_P004.dmp
Review job status:
$ grep "successfully completed" exp*.log
exp_api.log:PL/SQL procedure successfully completed.
exp_A_TAB_P001.log:Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at Wed Feb 26 20:28:09 2014 elapsed 0 00:00:01
exp_A_TAB_P002.log:Job "SYS"."SYS_EXPORT_TABLE_03" successfully completed at Wed Feb 26 20:28:10 2014 elapsed 0 00:00:02
exp_A_TAB_P003.log:Job "SYS"."SYS_EXPORT_TABLE_04" successfully completed at Wed Feb 26 20:28:11 2014 elapsed 0 00:00:02
exp_A_TAB_P004.log:Job "SYS"."SYS_EXPORT_TABLE_05" successfully completed at Wed Feb 26 20:28:13 2014 elapsed 0 00:00:02
exp_B_TAB_P001.log:Job "SYS"."SYS_EXPORT_TABLE_06" successfully completed at Wed Feb 26 20:28:14 2014 elapsed 0 00:00:02
exp_B_TAB_P002.log:Job "SYS"."SYS_EXPORT_TABLE_07" successfully completed at Wed Feb 26 20:28:16 2014 elapsed 0 00:00:02
exp_B_TAB_P003.log:Job "SYS"."SYS_EXPORT_TABLE_08" successfully completed at Wed Feb 26 20:28:17 2014 elapsed 0 00:00:03
exp_B_TAB_P004.log:Job "SYS"."SYS_EXPORT_TABLE_09" successfully completed at Wed Feb 26 20:28:19 2014 elapsed 0 00:00:02
Review exported partition:
$ grep "exported" exp*.log
exp_A_TAB_P001.log:. . exported "MDINH"."A_TAB":"P001"    6.351 KB       9 rows
exp_A_TAB_P002.log:. . exported "MDINH"."A_TAB":"P002"    14.89 KB      90 rows
exp_A_TAB_P003.log:. . exported "MDINH"."A_TAB":"P003"    101.1 KB     900 rows
exp_A_TAB_P004.log:. . exported "MDINH"."A_TAB":"P004"    963.3 KB    9000 rows
exp_B_TAB_P001.log:. . exported "MDINH"."B_TAB":"P001"    6.351 KB       9 rows
exp_B_TAB_P002.log:. . exported "MDINH"."B_TAB":"P002"    14.89 KB      90 rows
exp_B_TAB_P003.log:. . exported "MDINH"."B_TAB":"P003"    101.1 KB     900 rows
exp_B_TAB_P004.log:. . exported "MDINH"."B_TAB":"P004"    963.3 KB    9000 rows
Example of completed log:
$ cat exp_B_TAB_P001.log
Starting "SYS"."SYS_EXPORT_TABLE_06":
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 8 MB
. . exported "MDINH"."B_TAB":"P001"    6.351 KB       9 rows
Master table "SYS"."SYS_EXPORT_TABLE_06" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_06 is:
  /tmp/B_TAB_P001.dmp
Job "SYS"."SYS_EXPORT_TABLE_06" successfully completed at Wed Feb 26 20:28:14 2014 elapsed 0 00:00:02
set timing on echo on
-- DataPump Export (EXPDP) Fails With Errors ORA-39001 ORA-39000 ORA-31641 ORA-27054 ORA-27037
-- When The Dump File Is On NFS Mount Point (Doc ID 1518979.1)
-- Work around for the above mentioned error
alter system set events '10298 trace name context forever, level 32';

declare
  h1 number;
  dir_name varchar2(30);
begin
  dir_name := 'DPDIR';
  for x in (
    select table_owner, table_name, partition_name
    from dba_tab_partitions
    where table_owner = 'MDINH'
    and table_name in ('A_TAB','B_TAB')
    and regexp_like(partition_name,'[0-4]$')
    order by table_owner, table_name, partition_position
  ) loop

    h1 := dbms_datapump.open (operation => 'EXPORT', job_mode => 'TABLE');

    dbms_datapump.add_file (
      handle    => h1,
      filename  => x.table_name||'_'||x.partition_name||'.dmp',
      reusefile => 1, -- REUSE_DUMPFILES=Y
      directory => dir_name,
      filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);

    dbms_datapump.add_file (
      handle    => h1,
      filename  => 'exp_'||x.table_name||'_'||x.partition_name||'.log',
      directory => dir_name,
      filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

    -- CONTENT = DATA_ONLY
    dbms_datapump.set_parameter (
      handle => h1,
      name   => 'INCLUDE_METADATA',
      value  => 0);

    dbms_datapump.metadata_filter (
      handle => h1,
      name   => 'SCHEMA_EXPR',
      value  => 'IN ('''||x.table_owner||''')');

    dbms_datapump.metadata_filter (
      handle => h1,
      name   => 'NAME_EXPR',
      value  => 'IN ('''||x.table_name||''')');

    dbms_datapump.data_filter (
      handle      => h1,
      name        => 'PARTITION_LIST',
      value       => x.partition_name,
      table_name  => x.table_name,
      schema_name => x.table_owner);

    dbms_datapump.start_job (handle => h1);
    dbms_datapump.detach (handle => h1);
  end loop;
end;
/

alter system set events '10298 trace name context off';
exit
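One thing to keep in mind: because the script detaches from each job straight after starting it, the PL/SQL block finishes while the exports may still be running. A simple way to keep an eye on them is the documented DBA_DATAPUMP_JOBS view; for example:

-- Check the state of the Data Pump jobs started by the loop above.
select owner_name, job_name, operation, job_mode, state, attached_sessions
from   dba_datapump_jobs
order  by owner_name, job_name;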
Are you curious about TempDB? Allow me to help make your understanding of the subject permanent. TempDB is an interesting system database that doesn’t behave quite like the others. It’s kind of like a quirky friend that you’ve come to rely on when you need a hand with getting something done.
Who is TempDB for?
TempDB is accessible to all users that are connected to an instance of SQL Server.
What is TempDB?
TempDB is a system database that’s used to store temporary objects. It utilizes minimal logging, meaning it only stores the information required to recover a transaction – this means no point-in-time recovery. You can’t perform a backup or restore on TempDB, and each time SQL Server is restarted, TempDB gets re-created with the last configured settings.
How does TempDB get used?
TempDB stores a number of different temporary objects:
- User-created temporary objects: Such as global (prefixed by ##) or local (prefixed by #) temporary tables, temporary stored procedures and cursors.
- Internal temporary objects: Like work tables for intermediate results (any sorting, such as GROUP BY or ORDER BY)
- Version stores: Row versions for data-modification transactions when certain isolation levels are in use.
- …And certain features like online index operations
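To make the first two categories a little more concrete, here is a tiny T-SQL sketch (the table names are invented):

-- Local temp table (#): visible only to the session that created it.
CREATE TABLE #local_orders (order_id INT, amount MONEY);

-- Global temp table (##): visible to every session until the last one using it disconnects.
CREATE TABLE ##shared_orders (order_id INT, amount MONEY);

INSERT INTO #local_orders VALUES (1, 10.00), (2, 25.50);

-- A large sort or aggregation may also spill into TempDB as an internal work table.
SELECT amount, COUNT(*) AS order_count
FROM   #local_orders
GROUP BY amount
ORDER BY amount;

DROP TABLE #local_orders;
DROP TABLE ##shared_orders;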
Think about it like this: I’m 5’7” and 150 pounds. Would you rather have me or Dwayne “The Rock” Johnson helping you move your furniture? If it’s just a chair or two, I’m happy to help, but if you want me to move a sofa and don’t plan on doing any heavy-lifting yourself, you’ll probably want to get a bigger guy.
Just like when you’re moving furniture, when using TempDB, the size configurations are very important.
TempDB typically has three common issues that a DBA can run into: I/O bottlenecks, low disk space, and page contention. These issues and their resolutions are often interrelated.
If you really want my help, and I see that I’m not going to be big enough to move your sofa, I’ll try my hardest to bulk up so that I can perform the task. TempDB feels the same way, and will automatically grow in size (the default autogrowth setting is by 10% of its current size) in order to accomplish whatever job it set out to do. It’s great that TempDB is so willing to help, but in some cases, it can become so concerned with getting bigger that it causes I/O pressure on the disk, and performance can suffer. In even worse scenarios, TempDB might decide that it needs to be so huge that it consumes the whole disk. How am I going to help you move your furniture if my muscles can’t even fit through the doorframe anymore?!
Since TempDB is so handy, everyone wants a piece of it, and TempDB will put a lot of work on its plate, trying to satisfy everyone’s demands. This can cause page contention, which we witness in the form of the PAGELATCH wait type (note: this is not PAGEIOLATCH), indicating contention for a page that is already in memory. Queries needing TempDB’s help will have to wait for their turn.
Solving Common Issues
So, how do we keep TempDB from feeling over-worked and checking into the Smack-Down Hotel? The answer lies in capacity planning.
The basic concepts of capacity planning for TempDB include:
- Set autogrowth to a pre-set amount: If TempDB gets to be the size of The Rock we don’t want it to automatically grow by a percentage of its current size each time. Set the number to a reasonable size to avoid the continuous need for growth, but keep it low enough to avoid wasted space.
- Capture and replay activity using a trace or run individual queries: Monitor TempDB’s growth as it is used.
- The tricky part: Estimate the workload (while accounting for concurrent user activity).
You can use SYS.DM_DB_SESSION_SPACE_USAGE and SYS.DM_DB_TASK_SPACE_USAGE to identify queries that are consuming TempDB (see the T-SQL sketch after this list).
- Configure the appropriate number of TempDB DATA files: If you thought having one Dwayne “The Rock” Johnson helping you move furniture was cool, imagine having two of him! Increasing the number of TempDB data files will allocate work in a round-robin form. This will relieve some of the pressure on TempDB. It’s like having one Dwayne “The Rock” Johnson Feng Shui your living room while another does the same to your kitchen. It is best practice to set the number of TempDB data files equal to the number of logical CPU cores. For example, if you have a dual-core processor, then set the number of TempDB data files equal to two. If you have more than 8 cores, start with 8 files and add four at a time as needed.
- All TempDB files are created equal: This isn’t George Orwell’s Animal Farm. Ensure that the size and growth settings for all TempDB data files are configured in the same manner.
- Disk Placement: If possible, spread TempDB data files across different disks to reduce I/O contention. Putting TempDB data files on a different disk than the user data files will further reduce I/O contention. Be sure to use fast disks whenever possible.
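To tie the monitoring and configuration advice above together, here is a rough T-SQL sketch (the logical file name tempdev is the SQL Server default, but tempdev2, the path, and the sizes are placeholders you would adjust for your own environment):

-- Which sessions are the heaviest TempDB consumers right now?
SELECT session_id,
       user_objects_alloc_page_count,
       internal_objects_alloc_page_count
FROM   sys.dm_db_session_space_usage
ORDER BY user_objects_alloc_page_count + internal_objects_alloc_page_count DESC;

-- Pre-size the existing data file and add a second one with identical settings,
-- using a fixed growth increment instead of the default 10%.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev,  SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE    (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf',
                                   SIZE = 4096MB, FILEGROWTH = 512MB);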
Keep in mind that TempDB configurations are largely environment-specific. You might not actually need to have the same number of TempDB files as your logical CPU cores. In fact, having too many TempDB data files can cause performance problems due to slowing down the round-robin style allocation. Use your judgement and knowledge of your environment to determine what’s right for your system. If it’s configured properly, then you and your TempDB can be tag-team wrestling (or moving furniture) with queries like heavyweight champions.
In the past I have enthused mightily about the benefits of the approximate NDV mechanism and of using auto_sample_size to collect statistics in 11g; however, as so often happens with Oracle features, there’s a down-side, or boundary condition, or edge case. I’ve already picked this up once as an addendum to an earlier blog note on virtual stats, which linked to an article on OTN describing how the time taken to collect stats on a table increased dramatically after the addition of an index – where the index had this definition:
create bitmap index i_s_rmp_eval_csc_msg_actions
on s_rmp_evaluation_csc_message (
        decode(instr(xml_message_text,' '),0,0,1)
)
;
As you might guess from the column name, this is an index based on an XML column, which is stored as a CLOB.
In a similar vein, I showed you a few days ago an old example I had of indexing a CLOB column with a call to dbms_lob.getlength(). Both index examples suffer from the same problem – to support the index Oracle creates a hidden (virtual) column on the table that can be used to hold statistics about the values of the function; actual calculated values for the function call are stored in the index but not on the table itself – but it’s important that the optimizer has the statistics about the non-existent column values.
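You can see these hidden columns for yourself with a query something like the following (run as the owner of the table from the OTN example; the SYS_NC… names Oracle generates will differ from system to system):

-- Function-based indexes appear as hidden, virtual columns on the base table.
select  column_name, data_type, hidden_column, virtual_column
from    user_tab_cols
where   table_name = 'S_RMP_EVALUATION_CSC_MESSAGE'
and     hidden_column = 'YES';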
So what happens when Oracle collects table statistics? If you’ve enabled the approximate NDV feature, Oracle does a 100% sample, which means it has to call the function for every single row in the table. You will appreciate that the decode(instr()) function on the LOB column is going to read every single LOB in turn from the table – it’s not surprising that the time taken to calculate stats on the table jumped from a few minutes to a couple of hours. What did surprise me was that my call to dbms_lob.getlength() also seemed to read every LOB in my example rather than reading the “LOB Locator” data that’s stored in the row – one day I’ll take a look into why that happened.
Looking at these examples it’s probably safe to conclude that if you really need to index some very small piece of “flag” information from a LOB it’s probably best to store it as a real column on the table – perhaps populated through a trigger so you don’t have to trust every single piece of front-end code to keep it up to date. (It would be quite nice if Oracle gave us the option for a “derived” column – i.e. one that could be defined in the same sort of way as a virtual column, with the difference that it should be stored in the table.)
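Just to illustrate the idea, a sketch of that workaround (this is not from the original system: the flag column, trigger name, index name, and search string are all invented):

alter table s_rmp_evaluation_csc_message add (msg_action_flag number(1));

create or replace trigger s_rmp_eval_csc_msg_flag_trg
before insert or update of xml_message_text on s_rmp_evaluation_csc_message
for each row
begin
        -- the same test the function-based index performed, now kept as real column data
        :new.msg_action_flag :=
                case when instr(:new.xml_message_text, '<search string>') = 0 then 0 else 1 end;
end;
/

create bitmap index i_s_rmp_eval_csc_msg_flag
        on s_rmp_evaluation_csc_message(msg_action_flag);

With this in place, gathering stats no longer has to re-read every LOB just to work out the flag values.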
So virtual columns based on LOBs can create a performance problem for the approximate NDV mechanism; but the story doesn’t stop there because there’s another “less commonly used” feature of Oracle that introduces a different threat – with no workaround – it’s the index organized table (IOT). Here’s a basic example:
create table iot1 (
        id1     number(7,0),
        id2     number(7,0),
        v1      varchar2(10),
        v2      varchar2(10),
        padding varchar2(500),
        constraint iot1_pk primary key(id1, id2)
)
organization index
including id2
overflow
;

insert into iot1
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by level <= 1e4
)
select
        mod(rownum,20)             id1,
        trunc(rownum,100)          id2,
        to_char(mod(rownum,20))    v1,
        to_char(trunc(rownum,100)) v2,
        rpad('x',500,'x')          padding
from
        generator v1,
        generator v2
where
        rownum <= 1e5
;

commit;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever';

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'IOT1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set events '10046 trace name context off';
You’ll notice I’ve created the table then inserted the data – if I did a “create table as select” Oracle would have sorted the data before inserting it, and that would have helped to hide the problem I’m trying to demonstrate. As it is my overflow segment is very badly ordered relative to the “top” (i.e. index) segment – in fact I can see after I’ve collected stats on the table that the clustering_factor on the index is 100,000 – an exact match for the rows in the table.
Running 22.214.171.124, with a 1MB uniform extent, freelist management, and 8KB block size, the index segment held 279 leaf blocks and the overflow segment (reported in view user_tables as SYS_IOT_OVER_81594) held 7,144 data blocks.
So what interesting things do we find in a 10046 trace file after gathering stats – here are the key details from the tkprof results:
SQL ID: 7ak95sy9m1s4f Plan Hash: 1508788224

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  no_substrb_pad */
  to_char(count("ID1")),
  to_char(substrb(dump(min("ID1"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID1"),16,0,32),1,120)),
  to_char(count("ID2")),
  to_char(substrb(dump(min("ID2"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID2"),16,0,32),1,120)),
  to_char(count("V1")),
  to_char(substrb(dump(min("V1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V1"),16,0,32),1,120)),
  to_char(count("V2")),
  to_char(substrb(dump(min("V2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V2"),16,0,32),1,120)),
  to_char(count("PADDING")),
  to_char(substrb(dump(min("PADDING"),16,0,32),1,120)),
  to_char(substrb(dump(max("PADDING"),16,0,32),1,120))
from
  "TEST_USER"."IOT1" t
  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.37       0.37       7423     107705          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.37       0.37       7423     107705          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=107705 pr=7423 pw=0 time=377008 us)
    100000     100000     100000   APPROXIMATE NDV AGGREGATE (cr=107705 pr=7423 pw=0 time=426437 us cost=10 size=23944 card=82)
    100000     100000     100000    INDEX FAST FULL SCAN IOT1_PK (cr=107705 pr=7423 pw=0 time=298380 us cost=10 size=23944 card=82)(object id 85913)

********************************************************************************

SQL ID: 1ca2ug8s3mm5z Plan Hash: 2571749554

select /*+ no_parallel_index(t, "IOT1_PK") dbms_stats cursor_sharing_exact
  use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
  no_expand index(t,"IOT1_PK") */
  count(*) as nrw,
  count(distinct sys_op_lbid(85913,'L',t.rowid)) as nlb,
  null as ndk,
  sys_op_countchg(sys_op_lbid(85913,'O',"V1"),1) as clf
from
  "TEST_USER"."IOT1" t
where
  "ID1" is not null or "ID2" is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.16       0.16          0     100280          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.16       0.16          0     100280          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=100280 pr=0 pw=0 time=162739 us)
    100000     100000     100000   INDEX FULL SCAN IOT1_PK (cr=100280 pr=0 pw=0 time=164597 us cost=6 size=5900000 card=100000)(object id 85913)
The first query collects table and column stats, and we can see that the approximate NDV method has been used because of the trailing text: /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/. In this statement the hint /*+ full(t) */ has been interpreted to mean an index fast full scan, which is what we see in the execution plan. Although there are only 279 blocks in the index and 7,144 blocks in the overflow we’ve done a little over 100,000 buffer visits because for every index entry in the IOT top we’ve done a “fetch by rowid” into the overflow segment (the session stats records these as “table fetch continued row”). Luckily I had a small table so all those visits were buffer gets; on a very large table it’s quite possible that a significant fraction of those buffer gets will turn into single block physical reads.
Not only have we done one buffer visit per row to allow us to calculate the approximate NDV for the table columns, we’ve done the same all over again so that we can calculate the clustering_factor of the index. This is a little surprising since the “rowid” for an item in the overflow section is stored in the index segment but (as you can see in the second query in the tkprof output) Oracle has used column v1 (the first in the overflow segment) in the call to the sys_op_countchg() function where the equivalent call for an ordinary index would use t.rowid so, presumably, the code HAS to access the overflow segment. The really strange thing about this is that the same SQL statement has a call to sys_op_lbid() which uses the (not supposed to exist in IOTs) rowid – so it looks as if it ought to be possible for sys_op_countchg() to do the same.
So – big warning on upgrading to 11g: if you’ve got IOTs with overflows and you switch to auto_sample_size and enable approximate NDV then the time taken to gather stats on those IOTs may (depending to a large extent on the data clustering) take much longer than it used to.
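If you want to check whether you have any candidates for this problem before you switch over, a quick look in the dictionary is enough (a sketch: the IOT1 reference just re-uses the demo table above):

-- Which tables are IOTs, and which of those have an overflow segment?
select  owner, table_name, iot_type, iot_name
from    dba_tables
where   iot_type in ('IOT', 'IOT_OVERFLOW')
order by owner, nvl(iot_name, table_name), iot_type;

-- What sample size will the next gather use for a given table?
select  dbms_stats.get_prefs('ESTIMATE_PERCENT', user, 'IOT1') estimate_percent
from    dual;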
Yesterday’s flight from Bangalore to Chennai was very short. It felt like less than an hour to me. Hans Forbrich wasn’t coming to this event, so it looked like Debra Lilley and I would be on our own for this journey, but we bumped into Kuassi Mensah at the airport.
The hotel was very nice, but doing in-room check in freaks me out. I know the hotels think it is something special, but I would rather check in at the front desk and go to my room on my own. There is something about having other people in my room that reminds me I am only one in a very long procession of people to use that space. Gives me the creeps…
Today’s Chennai event was held at the Loyola-ICAM College of Engineering and Technology (LICET), not far from our hotel. The keynotes from the University staff focussed on development of knowledge, rather than focussing just on facts or what skills you need to get a job. One of the slides read,
“Data becomes information when it is organised. Information becomes knowledge when it is placed in actionable context. Without context there is little value.”
I am always concerned when people or institutions focus totally on what is job-worthy as I think it develops blinkered individuals. Applied research can only be a part of the whole picture, not the total focus. It seems LICET has the same point of view as me. At the end of the keynote I felt really inspired and quite excited about the day ahead.
As part of the engineering course, the LICET students have access to a stripped back car, which has all the internal systems on display, allowing the students to see both the individual systems and how they fit into the big picture. I was asked if I wanted to drive this car, so I can now say I’ve driven in India.
I had a few minor issues with my laptop, but despite that my sessions got very positive feedback. As always, I spent a lot of time answering questions between my sessions and at the end of my last session as well.
Thanks to everyone at LICET and all the attendees for coming to the event. It really made for a great last event of the tour!
I fly home early tomorrow morning, so that I can get back in time for work on Monday. I’ll write a wrap-up post when I get home.
Good one. I like the presumption that I know a lot of stuff about Oracle. I suppose that I do, at least about some aspects of it, although I often feel like I don’t know enough. It occurred to me that answering publicly might also be helpful to anyone trying to figure out how to prepare for a career. Here’s my answer.
I took a job with the young consulting division of Oracle Corporation in September 1989, about two weeks after the very first time I had heard the word “Oracle” used as the name of a company. My background had been mathematics and computer science in school. I had two post-graduate degrees: a Master of Science in Computer Science with a focus on language design and compilers, and a Master of Business Administration with a focus in finance.
My first “career job” was as a software engineer, which I started before the MBA. I designed languages and wrote compilers to implement those languages. Yes, people actually pay good money for that, and it’s possibly still the most fun I’ve ever had at work. I wrote software in C, lex, and yacc, and I taught my colleagues how to do it, too. In particular, I spent a lot of time teaching my colleagues how to make their C code faster and more portable (so it would run on more computers than just the one on which you wrote it).
Even though I loved my job, I didn’t see a lot of future in it. At least not in Colorado Springs in the late 1980s. So I took a year off to get the MBA at SMU in Dallas. I went for the MBA because I thought I needed to learn more about money and business. It was the most difficult academic year of my life, because I was not particularly connected to or even interested in most of the subject matter. I hated a lot of my classes, which made it difficult to do as well as I had been accustomed. But I kept grinding away, and finished my degree in the year it was supposed to take. Of course I learned many, many things that year that have been vital to my career.
A couple of weeks after I got my MBA, I went to work for Oracle in Dallas, with a salary that was 168% of what it had been as a compiler designer. My job was to visit Oracle customers and help them with their problems.
It took a while for me to get into a good rhythm at Oracle. My boss was sending me to these local customers that were having problems with the Oracle Financial Applications (the “Finapps,” as we usually called them, which would many years later become the E-Business Suite) on version 6.0.26 of the ORACLE database (it was all caps back then). At first, I couldn’t help them near as much as I had wanted to. It was frustrating.
That actually became my rhythm: week after week, I visited these people who were having horrific problems with ORACLE and the Finapps. The database in 1990, although it had some pretty big bugs, was still pretty good. It was the applications that caused most of the problems I saw. There were a lot of problems, both with the software and with how it was sold. My job was to fix the problems. Some of those problems were technical. Many were not.
A lot of the problems were performance problems: the software running “too slowly.” I found those problems particularly interesting. For those, I had some experience and tools at my disposal. I knew a good bit about operating systems and compilers and profilers and linkers and debuggers and all that, and so learning about Oracle indexes and rollback segments (two good examples, continual sources of customer frustration) wasn’t that scary of a step for me.
I hadn’t learned anything about Oracle or relational databases in school; I learned how the database worked at Oracle by reading the documentation, beginning with the excellent Oracle® Database Concepts. Oracle sped me along a bit with a couple of the standard DBA courses.
My real learning came from being in the field. The problems my customers had were immediately interesting by virtue of being important. The resources available to me for solving such problems back in the early 1990s were really just books, email, and the telephone. The Internet didn’t exist yet. (Can you imagine?) The Oracle books available back then, for the most part, were absolutely horrible. Just garbage. Just about the only thing they were good for was creating problems that you could bill lots of consulting hours to fix. The only thing that was left was email and the telephone.
The problem with email and telephones, however, is that there has to be someone on the other end. Fortunately, I had that. The people on the other end of my email and phone calls were my saviors and heroes. In my early Oracle years, those saviors and heroes included people like Darryl Presley, Laurel Jamtgaard, Tom Kemp, Charlene Feldkamp, David Ensor, Willis Ranney, Lawrence To, Roderick Mañalac, Greg Doherty, Juan Loaiza, Bill Bridge, Brom Mahbod, Alex Ho, Jonathan Klein, Graham Wood, Mark Farnham (who didn’t even work for Oracle, but who could cheerfully introduce me to anyone I needed), Anjo Kolk, and Mogens Nørgaard. I could never repay these people, and many more, for what they did for me. ...In some cases, at all hours of the night.
So, how did I learn so much stuff about Oracle? It started by immersing myself into a universe where every working day I had to solve somebody’s real Oracle problems. Uncomfortable, but effective. I survived because I was persistent and because I had a great company behind me, filled with spectacularly intelligent people who loved helping each other. Could I have done that on my own, today, with the advent of the Internet and lots and lots of great and reliable books out there to draw upon? I doubt it. I sincerely do. But maybe if I were young again...
I tell my children, there’s only one place where money comes from: other people. Money comes only from other people. So many things in life are that way.
I’m a natural introvert. I naturally withdraw from group interactions whenever I don’t feel like I’m helping other people. Thankfully, my work and my family draw me out into the world. If you put me into a situation where I need to solve a technical problem that I can’t solve by myself, then I’ll seek help from the wonderful friends I’ve made.
I can never pay it back, but I can try to pay it forward.
(Oddly, as I’m writing this, I realize that I don’t take the same healthy approach to solving business problems. Perhaps it’s because I naturally assume that my friends would have fun helping solve a technical problem, but that solving a business problem would not be fun and therefore I would be imposing upon them if I were to ask for help solving one. I need to work on that.)
So, to my new LinkedIn friend, here’s my advice. Here’s what worked for me:
- Educate yourself. Read, study, experiment. Educate yourself especially well in the fundamentals. So many people don’t. Being fantastic at the fundamentals is a competitive advantage, no matter what you do. If it’s Oracle you’re interested in learning about, that’s software, so learn about software: about operating systems, and C, and linkers, and profilers, and debuggers, .... Read the Oracle Database Concepts guide and all the other free Oracle documentation. Read every book there is by Tom Kyte and Christian Antognini and Jonathan Lewis and Tanel Põder and Kerry Osborne and Karen Morton and James Morle and all the other great authors out there today. And read their blogs.
- Find a way to hook yourself into a network of people that are willing and able to help you. You can do that online these days. You can earn your way into a community by doing things like asking thoughtful questions, treating people respectfully (even the ones who don’t treat you respectfully), and finding ways to teach others what you’ve learned. Write. Write what you know, for other people to use and improve. And for God’s sake, if you don’t know something, don’t act like you do. That just makes everyone think you’re an asshole, which isn’t helpful.
- Immerse yourself into some real problems. Read Scuttle Your Ships Before Advancing if you don’t understand why. You can solve real problems online these days, too (e.g., StackExchange and even Oracle.com), although I think that it’s better to work on real live problems at real live customer sites. Stick with it. Fix things. Help people.
That’s my advice.
A little while ago I highlighted a special case with the MINUS operator (that one of the commentators extended to include the INTERSECT operator) relating to the way the second subquery would still be executed even if the first subquery produced no rows. I’ve since had an email from an Oracle employee letting me know that the developers looked at this case and decided that it wasn’t feasible to address it because – taking a wider view point – if the query were to run parallel they would need a mechanism that allowed some synchronisation between slaves so that every slave could find out that none of the slaves had received any rows from the first subquery, and this was going to lead to hanging problems.
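For reference, the shape of statement I'm talking about looks something like this (a trivial sketch with invented table names; the point is only that the second branch is still scanned even when the first branch returns nothing):

select product_id from discontinued_products   -- suppose this returns no rows
minus
select product_id from current_orders;         -- ... this is still scanned anyway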
The email reminded me that there’s another issue of the same kind that I discovered several years ago (I thought I’d written it up, but maybe it was on a newsgroup or forum somewhere; I can’t find it on my blog or old website). The problem can be demonstrated by this example:
create table t1
as
select
        rownum          id,
        mod(rownum,25)  rep_col,
        rpad('x',50)    padding
from
        all_objects
where
        rownum <= 3000
;

delete from t1 where rep_col = 12;

create table t2
as
select
        rownum          id,
        mod(rownum,25)  rep_col,
        rpad('x',50)    padding
from
        all_objects
where
        rownum <= 10000
;

-- collect stats, no histograms

select
        /*+
                leading(t1 t2)
                use_hash(t2)
                no_swap_join_inputs(t2)
                pq_distribute(t2, hash, hash)
        */
        count(*)
from
        t1, t2
where
        t1.rep_col = 12
and     t2.id = t1.id
;
You’ll notice that I’ve created a data set for table t1 where the value 12 does not appear in column rep_col; so my query will return no rows. However, for the purposes of demonstration, I’ve hinted an execution path that will scan t1 and distribute it by hash, then scan t2 to distribute that by hash before doing the join. Here’s the plan – which I’ve generated with rowsource execution statistics enabled and pulled from memory after executing the query:
-------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Starts | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   1 |  SORT AGGREGATE            |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   2 |   PX COORDINATOR           |          |      1 |      2 |00:00:00.01 |       6 |      2 |
|   3 |    PX SEND QC (RANDOM)     | :TQ10002 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE         |          |      2 |      2 |00:00:00.01 |       0 |      0 |
|*  5 |      HASH JOIN             |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   6 |       JOIN FILTER CREATE   | :BF0000  |      2 |      0 |00:00:00.01 |       0 |      0 |
|   7 |        PX RECEIVE          |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   8 |         PX SEND HASH       | :TQ10000 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   9 |          PX BLOCK ITERATOR |          |      2 |      0 |00:00:00.01 |      54 |     27 |
|* 10 |           TABLE ACCESS FULL| T1       |     27 |      0 |00:00:00.01 |      54 |     27 |
|  11 |       PX RECEIVE           |          |      0 |      0 |00:00:00.01 |       0 |      0 |
|  12 |        PX SEND HASH        | :TQ10001 |      0 |      0 |00:00:00.01 |       0 |      0 |
|  13 |         JOIN FILTER USE    | :BF0000  |      2 |      2 |00:00:00.01 |     116 |     87 |
|  14 |          PX BLOCK ITERATOR |          |      2 |      2 |00:00:00.01 |     116 |     87 |
|* 15 |           TABLE ACCESS FULL| T2       |     29 |      2 |00:00:00.01 |     116 |     87 |
-------------------------------------------------------------------------------------------------
Since this was 126.96.36.199 Oracle has used Bloom filtering to reduce the traffic between slave sets – but you can see that despite returning no rows from t1 (lines 7 – 10), Oracle still performed the parallel tablescan of t2 (lines 12 – 15). Thanks to the Bloom filter we don’t transfer 10,000 rows between slave sets, but we can see from the Buffers and Reads columns that we really did do the tablescan – and if we take a snapshot of instance activity we would have seen 10,000 rows fetched by tablescan at that point.
If I ran this query serially Oracle would stop after discovering that the first tablescan returned no rows – why bother scanning the probe table when the hash table is empty? But because the query is running in parallel, a single slave that receives no data from the first tablescan cannot assume that every other slave in the same set has also received no data – there’s no cross-chat that allows the slaves to discover that every slave has no data – so the second scan goes ahead.
I was a little surprised by this when I first found it since I thought (from looking at some of the 1039x trace information) that the parallel slaves were telling the query coordinator how many rows they had acquired on each granule – which would allow the coordinator to spot the zero total. But it looks as if I was misinterpreting the trace.
On the plus side for this example – it’s probably the case that if zero is a possible volume of data returned by the query then the optimizer will have decided that it was going to get a “small” data set for the build table and therefore do a broadcast distribution – and if that happens the second tablescan won’t occur – as we see below (note the zero Reads, zero Buffers):
------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   1 |  SORT AGGREGATE           |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   2 |   PX COORDINATOR          |          |      1 |      2 |00:00:00.01 |       6 |      2 |
|   3 |    PX SEND QC (RANDOM)    | :TQ10001 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE        |          |      2 |      2 |00:00:00.01 |       0 |      0 |
|*  5 |      HASH JOIN            |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   6 |       PX RECEIVE          |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   7 |        PX SEND BROADCAST  | :TQ10000 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   8 |         PX BLOCK ITERATOR |          |      2 |      0 |00:00:00.01 |      54 |     27 |
|*  9 |          TABLE ACCESS FULL| T1       |     27 |      0 |00:00:00.01 |      54 |     27 |
|  10 |       PX BLOCK ITERATOR   |          |      0 |      0 |00:00:00.01 |       0 |      0 |
|* 11 |        TABLE ACCESS FULL  | T2       |      0 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------
Unfortunately the same type of argument can’t be used to dismiss the minus/intersect cases.