Daily Archives Monday, September 2008

New Ubiquitous Ways to Search Oracle Documentation and More

If you use Firefox, you’re in for a treat. If not, please download it and resume reading about Ubiquity and how you can use it to query the Oracle database documentation library and do other neat stuff using simple commands.

Jake was struggling to define it, but basically, Ubiquity is a Firefox extension, a powerful one. [...]

Time Matters - DB CPU

In his comment on the last post, John Beresniewicz mentioned the following property of DB CPU :-

"… ‘DB time’
fully contains ‘DB CPU’ and both are measured values (which means
in particular that ‘DB CPU’ is not inflated by runq time)
"

I’d like to talk about the end of that sentence in particular. About a year and a half ago, I wrote a paper (that’s a PDF link) about building response time profiles for sessions that use Parallel Execution. One of the problems I highlighted in that paper was that PX jobs are prone to overload the available CPU resources simply because of the number of PX slave processes that are running concurrently. It’s mentioned first in the section on "More Lost Time" on page 4 but most of the rest of the paper goes on to discuss this particular problem. I wasn’t very successful coming up with any solutions beyond my usual approach of combining Oracle and O/S tools.

Essentially, as the queue of runnable processes that are waiting to execute on a CPU grows (I’ll just refer to it as the run queue from now on), the Oracle server connection processes appear to lose time because they’re unable to get on to a CPU in order to record the end time of an event. In the Response Time profilers I’ve seen, this time is recorded as unaccounted-for and it’s what JB’s referring to when he talks about runq time. Actually, time spent in the run queue is not the only cause of unaccounted-for time but it’s the one I’ll focus on here. (Here’s another plug for Cary and Jeff’s book - other reasons are discussed in chapter 7 on Oracle Kernel Timings.)

As JB stated, DB CPU does not include this time spent waiting in the run queue but the ASH samples for the session are aware that the server connection process is on the CPU or in the run queue. As a consequence, you can see apparently contradictory graphs in the various OEM (DB Control or Grid Control) Performance pages. Although I’ve tried this with the PX tests and confirmed the same behaviour, I’m going to use Swingbench for the examples here to continue the approach so far and so that you can try them out yourself if you want. (I might post PX examples later.)

Swingbench is supplied with several example applications, one of which is the Sales History (SH) application which uses the SH example schema supplied with recent versions of Oracle. The key attribute of this test for my purpose is that it’s read only and CPU intensive, so that the limited I/O bandwidth on a home PC isn’t the bottleneck.

I ran 8 concurrent sessions of the SH application for a total duration of 5 minutes with no delay between transactions, which was pretty unrealistic but I was only interested in over-loading the single dual-core CPU in the PC. Soon after the test started, this was how the database instance’s home page looked.

CPU is at 100% and there are about 7 sessions active in the database on average, just under 2 of which are running on the CPU, while the rest are waiting. That looks about right. Now, if we look at the performance page at the end of the test, the results are quite different.

It looks like there were only ever about 2 sessions active on average! However, there are a couple of hints as to what’s going on. If we look at the graph above as well as the Average Active Sessions graph, we can see that the Host’s run queue is far in excess of the two available CPU cores.

Things become even more interesting if we look at the Top Activity screen for the same period.

In that view, the sessions waiting in the run queue are shown, so there’s an average of about 7 active sessions during the test. Take a good look at the difference in the key text for the two graphs as well. The Average Active Sessions graph on the Performance Home page shows ‘CPU Used’, whereas the Top Activity page shows ‘CPU’. It’s a small but significant and appropriate difference. The first graph shows you the average number of sessions that are actually using CPU and is related to DB CPU in the Time Model Stats, but the Top Activity screen is really showing ‘CPU + Wait for CPU’, which is the event you’ll see throughout an ASH report covering the same period. e.g.

Top Sessions

  • ‘# Samples Active’ shows the number of ASH samples in which the session
    was found waiting for that particular event. The percentage shown
    in this column is calculated with respect to wall clock time
    and not total database activity.
  • ‘XIDs’ shows the number of distinct transaction IDs sampled in ASH
    when the session was waiting for that particular event
  • For sessions running Parallel Queries, this section will NOT aggregate
    the PQ slave activity into the session issuing the PQ. Refer to
    the ‘Top Sessions running PQs’ section for such statistics.
Sid, Serial# % Activity Event % Event User Program # Samples Active XIDs
136, 2716 12.75 CPU + Wait for CPU 12.16 SH java.exe 227/300 [ 76%] 0
137, 120 12.64 CPU + Wait for CPU 12.32 SH java.exe 230/300 [ 77%] 0
140, 164 12.59 CPU + Wait for CPU 12.00 SH java.exe 224/300 [ 75%] 0
144, 240 12.53 CPU + Wait for CPU 11.94 SH java.exe 223/300 [ 74%] 0
133, 29 12.37 CPU + Wait for CPU 11.62 SH java.exe 217/300 [ 72%] 0

The more I think about this, the more I think that there’s probably a Rule Of Thumb here (I’m joking, really, but maybe it’s something I’ll look for during further tests). Maybe when DB Time is much bigger than the some of the wait times on various timed events plus CPU time, it’s indicative of CPU overload? Then again, maybe it’s just easier to check the process states and run queue? ;-)

Finally, I couldn’t resist checking what ADDM makes of all this, specially for Graham Wood (that’s another PDF link). Well, in this case, ADDM is smart enough to work out what’s going on.

          DETAILED ADDM REPORT FOR TASK 'TASK_46' WITH ID 46          --------------------------------------------------

              Analysis Period: 28-AUG-2008 from 23:22:54 to 23:27:56         Database ID/Instance: 1533705496/1      Database/Instance Names: TEST1020/TEST1020                    Host Name: DESKTOP             Database Version: 10.2.0.4.0               Snapshot Range: from 26 to 27                Database Time: 2006 seconds        Average Database Load: 6.6 active sessions

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FINDING 1: 100% impact (2006 seconds)-------------------------------------Host CPU was a bottleneck and the instance was consuming 98% of the host CPU. All wait times will be inflated by wait for CPU.

   RECOMMENDATION 1: Host Configuration, 100% benefit (2006 seconds)      ACTION: Consider adding more CPUs to the host or adding instances          serving the database on other hosts.      ACTION: Also consider using Oracle Database Resource Manager to          prioritize the workload from various consumer groups.

   ADDITIONAL INFORMATION:      Host CPU consumption was 99%.  CPU runqueue statistics are not available       from the host's OS. This disables ADDM's ability to estimate the impact       of this finding.  The instance spent significant time on CPU. However,       there were no predominant SQL statements responsible for the CPU load.

Oracle hidden costs revealed, Part2 - Using DTrace to find why writes in SYSTEM tablespace are slower than in others

I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script.

Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces?
Here’s an example, I’ll create two similar tables, one in USERS tablespace, other in [...]

Unemployed…

Friday was my last day at work, so today I’ve officially joined the ranks of the unemployed…
My current plan is to take the rest of the year off and start thinking about getting a job in the new year. I’ve got a lot going on over the next few months, so I don’t think I’ll [...]

Babylon A.D…

I went to see Babylon A.D. at the weekend. On the whole the film wasn’t too bad.
Vin Diesel is exactly what you would expect if you’ve seen any of his other films. Michelle Yeoh is in it, which means it’s worth going to see if only so see her.
The fight sequences lack clarity. It’s pretty [...]

Why Flickr Rules Even if You Don’t Share Photos

I love Flickr. It’s one of the orignial Web 2.0 poster-children for good reason.
Beyond the API and the tagging and the sharing of images, Flickr has become my go-to resource for awesome pictures.
Everyone knows that images make boring, more interesting. You know, that whole picture is worth a 1,000 word aphorism?
When you install a new [...]