Daily Archives Thursday, August 2008

More Connect Mockups

Rich has finished a few more mockups of the changes we’re making to support OpenSocial.
Since our last episode, he’s finished the Profile page. Take a look.

One thing (other than location data, which will return soon, promise) people have been missing since we upgraded Connect to the Mix JRuby code line is status. More people than [...]

Why does even a small difference in SQL text cause a hard parse?

I just replied to an Oracle Forum Thread about why does even a small difference in SQL statement text cause it to be hard parsed and loaded as a different cursor. The reason is actually very simple - and I’m posting it into my blog too:

 
The reason why a statement with even a minor difference [...]

in praise of Disqus

Disqus recently released a update which includes the ability to export comments on a WordPress blog so I have just exported all historic comments left on this blog (when it was running WordPress) into Disqus.This operation was slightly complicated beca…

A presentation trifecta

I’m happy to announce that my presentation “Back to basics: Simple database web services without the need for an application server” has been accepted at 3 conferences, namely:

Oracle Open World (Session S301704) Monday 22nd Moscone West Room 2014

AUSOUG Perth conference 6th/7th October

AUSOUG Gold Coast conference 13th-15th October

I note the OOW Room 2014 holds 300 delegates, but that we’re down to 294 available seats,so the rush is on!! Make sure you reserve your seat now before the two-hundred-and-ninety-four other seats go! ;)

Thanks to Oracle and OOW for running via the Mix the vote-a-session program and all those nice people for voting for me (hi Mum!); I’m appreciative of the chance to present on the official OOW agenda, as well as freak out in such a big room. Make sure you sit at the front so you can see me sweat.

I also be presenting the last leg of the JDeveloper masterclass in the Gold Coast, and helping out with the ADF Methodology at the OOW Unconference, so a busy conference schedule for me.

I’m very much looking forward to seeing and meeting Oracle enthusiasts both sides of the big blue bit. See you there :)

An Infinite Capacity for Waiting* (Part 1)

When I was teaching the 10g Performance class at my last customer site, I remarked that the total number of seconds you see in the "Top 5 Timed Events" section of a Statspack or AWR report will usually be greater than the total number of seconds between the snapshot intervals. For example, this is the "Top 5 Timed Events" section of a Statspack report covering two snapshots that are 5 minutes 30 seconds apart, either side of a 5 minute Sales Order Entry benchmark that I ran using Dominic Giles’ Swingbench utility.

Top 5 Timed Events                                                    Avg %Total~~~~~~~~~~~~~~~~~~                                                   wait   CallEvent                                            Waits    Time (s)   (ms)   Time----------------------------------------- ------------ ----------- ------ ------db file sequential read                          5,814         461     79   38.0log file sync                                    4,246         421     99   34.7log file parallel write                          3,222         109     34    9.0CPU time                                                       108           8.9db file parallel write                           1,718          60     35    4.9

(Note that this is from a 10.2.0.4 Statspack report, but you would see something similar in all AWR reports and Statspack reports that include CPU time in this section.)

Add up those seconds in the Time column and, even just for the top 5 events, the total is 1159 seconds or 19 minutes and 19 seconds. How can the system have spent 19 minutes and 19 seconds using CPU or waiting on various events when there were only 330 seconds in the reporting period?

The person who asked me to clarify this after the training has seen a few Statspack reports in his time and understood that there is more time available in a given period than the Wall Clock period, not least because servers often have multiple CPUs. But, in this case, I ran the test on my laptop which has a dual-core CPU. Therefore the maximum CPU time available between these snapshots was 660 seconds, or 11 minutes.

No, the real reason is that Statspack is reporting system-wide event timings for more than one session and then aggregating them. In this case, there were four user sessions running the SOE benchmark during the period. Given that each user and background session is instrumented and clocking up both service and wait time, aggregating the data for all of the sessions will record more time than has passed if you look at the clock on the wall.

This is such an intuitive idea to me that I hadn’t thought to explain it properly during the course, so I set a mental reminder that I would blog about it soon. Having said that, this has been written about by others on many occasions (*not least by Cary Millsap and Jeff Holt in Optimizing Oracle Performance. In fact, I stole the title of this blog post from their book - see page 215 for more information.) Therefore, if this post makes complete sense to you so far or you’ve read any of those sources before, continued reading will probably be a little boring. If not, this post is especially for you.

So let’s revisit the tests. I ran two 5 minute tests using Swingbench’s supplied Sales Order Entry benchmark, but I could have used anything for this. The important thing is that the first test consisted of one session running the SOE application, while four sessions were running during the second test.

Here is the top 5 timed events section of the Statspack report for the single session test.

Top 5 Timed Events                                                    Avg %Total~~~~~~~~~~~~~~~~~~                                                   wait   CallEvent                                            Waits    Time (s)   (ms)   Time----------------------------------------- ------------ ----------- ------ ------db file sequential read                          4,818         175     36   46.3CPU time                                                        57          15.2db file parallel write                           1,085          42     39   11.2log file sync                                      821          38     46   10.0log file parallel write                          2,702          28     10    7.5

… and from the Statspack report for the four session test.

Top 5 Timed Events                                                    Avg %Total~~~~~~~~~~~~~~~~~~                                                   wait   CallEvent                                            Waits    Time (s)   (ms)   Time----------------------------------------- ------------ ----------- ------ ------db file sequential read                          5,814         461     79   38.0log file sync                                    4,246         421     99   34.7log file parallel write                          3,222         109     34    9.0CPU time                                                       108           8.9db file parallel write                           1,718          60     35    4.9

So the two tests were running the same application, on the same hardware and for the same duration. Yet the values in the Time (s) column are much bigger when more sessions are running. The total for just the top 5 timed events is 340 seconds compared to 1159 seconds. (The average wait times have also increased in some cases, because the system is busier, and different events become more significant as concurrent sessions contend with each other. There are various things we could try to deduce from this section, particularly as this was a controlled test, but we’d need more supporting information. Let’s just stick to the apparently increased time.)

The problem here, if you see it as a problem, is that aggregated data is being reported. Statspack is not the only example of this, though. Tracing Parallel Execution tasks and then aggregating the trace files using trcsess into one consolidated trace file will also result in variable timings, depending on the Degree of Parallelism. I talked about this in slides 33 to 37 of an earlier presentation. In fact, I can make the Time (s) values as big as I want to, just by running the SOE benchmark with more and more sessions.

That’s why the % Total Call Time is a far more useful column. It tells you the system-wide percentage contribution of each event to any performance problem over the given period so that you can focus on the most significant contributors. It should go without saying that looking at session-level statistics is more specific and therefore more reliable but that’s not always possible unless you’re using ASH, Kyle Hailey’s Simulated ASH, some other session history recorder or can recreate the problem session on demand, but that’s another, bigger argument that I won’t go into here.

So if wall clock time for those tests was about 300 seconds, then what is that bigger value being recorded in the Time (s) column? What I’m really talking about is DB Time, but I’ll leave that for the next post.