The sort of latency heat map you don’t want to see!

 

We’ve had a couple of short-lived, but very inconvenient, I/O latency issues recently. I’ve been using the awesome Latency Heat Map Visualization by Luca Canali as one of the tools to investigate them.

I’m guessing this isn’t the type of I/O latency heat map most people would want to see from a production system. 🙂

[Image: OraLatencyMap latency heat map]

 

This is the same system that has been reporting Warning “aiowait timed out x times” in alert.log [ID 222989.1], which only appears if an asynchronous I/O takes longer than 10 minutes…
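If you want to check whether your own alert log contains the same warning, a rough sketch like the one below would do it. The regular expression is an assumption based on the warning text quoted above, and the sample lines are made up for illustration rather than copied from a real alert.log, so adjust both to match what your system actually writes.

```python
import re

# Assumed message shape, based on the warning text quoted in the post.
AIOWAIT_RE = re.compile(r"aiowait timed out (\d+) times", re.IGNORECASE)


def find_aiowait_warnings(lines):
    """Return (line_number, count) for each line containing the warning."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = AIOWAIT_RE.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real alert log.
    sample = [
        "Thread 1 advanced to log sequence 100",
        "WARNING: aiowait timed out 1 times",
        "WARNING: aiowait timed out 2 times",
    ]
    for line_number, count in find_aiowait_warnings(sample):
        print(f"line {line_number}: aiowait timed out {count} times")
```

To run it against a real file, pass the open file handle in place of the sample list, since the function only needs an iterable of lines.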

The pictures look much nicer when things are going wrong! 🙂

Cheers

Tim…

Author: Tim...

DBA, Developer, Author, Trainer.

3 thoughts on “The sort of latency heat map you don’t want to see!”

  1. I hope you found the root cause, otherwise that has all the hallmarks of something that is going to have all the vendors pointing at one another in some really tortuous conference calls.
    Last time I encountered anything that bad, we ended up finding out exactly how many companies are involved in getting data from Oracle to the spinning rust at the other end.

  2. @chris: It’s not resolved yet, but it is intermittent, so it is not killing us all the time. 🙂

    We have evidence to suggest it is not the DB. The system administrators have evidence to suggest it is not the OS. It’s now moved across to the people who control the storage network and the storage itself, so it’s starting to zero in.

    As with any intermittent issue, the hard part is getting the right people on the case during the incident. We failed to gather enough metrics during the first incident. The DB crashed without pushing anything to disk, so we lost all ASH and AWR data for the time period. We got it the second time round though. 🙂

    It’s all a bit painful, but you can only progress it one stage at a time… 🙂

    Cheers

    Tim…
