First 3 days as a Glass Explorer (Prologue)

I have decided to write a few posts about my first couple of days with Glass.  I am not planning to go deep into the technical details of Glass in these posts, but may do so if there is enough interest.

Prologue

When Google first announced Google Glass at Google IO 2012, I signed up immediately to become a Glass Explorer.  Even without knowing a single detail of the Glass specification, 2,000 people waited in line and signed up for it.  Google gave every Glass Explorer a glass with a number on it, saying that each Google Glass would be engraved with its explorer’s unique number.  Mine was 1109.

I believed such technology could take us quite far in the future.  The original release date for beta testing was set for the end of 2012.  That did not happen, and there were almost no status updates from Google.  To me, that was quite a disappointment.

 

[Screenshot: shared on Google+ after signing up as a Glass Explorer.]

Early this year, to draw more diverse beta testers to Glass, Google started the #ifIhadGlass competition on Twitter.  The response was enormous, and 8,000 more people would be able to get hold of Glass through the competition.  Still, no status updates.

At Google, things move fastest around Google IO.  As expected, Google finally sent out an email update at the end of April this year.  My long wait finally ended, and I received my Glass during the week of Google IO.  My Glass does not have my unique Glass number on it, but I am happy nonetheless.

 


Laptop and Desktop SSD Update…

I recently wrote about installing SSDs in my Laptop and Desktop. I thought I would write a quick follow up post to mention how things are going.

I’m really happy with the changes to the performance of the desktop. As mentioned previously, it is now much quieter and really fast. A lot of my VMs run from the 1TB internal data drive, but the things I use most frequently are now sitting on the SSD. I’m starting to forget what life was like before SSD, except when I go to work and use the slowest PC that was ever built. :)

The laptop upgrade was a really good move. Just before my first BGOUG presentation the projector seemed to freak out my MacBook and I was forced to reboot. With the old hard drive I would have been filling time while waiting for the thing to start up. As it was, it restarted in about the time it used to take to come out of hibernation and I was moving. :)

Having done the disk swap in the laptop so close to a conference I was a little bit nervous, so in addition to the laptop I had my old 500G external drive, my new 1TB external drive and the original internal hard drive in my bag. Unpacking all that, along with my Nexus 7, Nexus 4 and Kindle, was very time consuming and a little embarrassing. :)

If you were at all in doubt about making the move to SSD, I can definitely recommend it.

Cheers

Tim…

PS. I reserve the right to start moaning about it when it wears out after a few weeks. :)


Laptop and Desktop SSD Update… was first posted on May 22, 2013 at 8:26 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Who Knew That I Knew So Much?

gather_plan_statistics – 2

Some time ago – actually a few years ago – I wrote a note about the hint /*+ gather_plan_statistics */ making some informal comments about the implementation and relevant hidden parameters. I’ve recently discovered a couple of notes from Alexander Anokhin describing the feature in far more detail and describing some of the misleading side effects of the implementation. There are two parts (so far): part 1 and part 2.

 


Oracle Exadata Database Machine: Proving 160 Xeon E7 Cores Are As “Slow” As 128 Xeon E5 Cores?

Reading Data Sheets
If you are in a position of influence affecting technology adoption in your enterprise you likely spend a lot of time reading data sheets from vendors.  This is just a quick blog entry about something I simply haven’t taken the time to cover even though the topic at hand has always been a “problem.” Well, at least since the release of the Oracle Exadata Database Machine X2-8.

In the following references and screenshots you’ll see that Oracle cites 1.5 million flash read IOPS as an expected limit for both the full-rack Oracle Exadata Database Machine X3-2 and the Oracle Exadata Database Machine X3-8. All machines have limits and Exadata is no exception. Notice how I draw attention to the footnote that accompanies the flash read IOPS claim. Footnote number 3 says that both of these Exadata models are limited in flash read IOPS by the database host CPU. Let me repeat that for certain types of individuals that spend an inordinate amount of time scrutinizing my words for reasons other than education: The Oracle Exadata Database Machine data sheets explicitly state flash read IOPS are limited by host CPU.

Oracle’s numbers in this case are SQL-driven from Oracle instances. I have no doubt these systems are both capable of achieving 1.5 million read IOPS from flash because, truth be told, that isn’t really all that many IOPS–especially when the IOPS throughput numbers are not accompanied by service times. In the 1990s it was all about “how much” but in modern times it’s about “how fast.” Bandwidth is an old, tired topic. Modern platforms are all about latency. Intel QPI put the problem of bandwidth to rest.

So, again, I don’t doubt the 1.5 million flash read IOPS citation. Exadata has a lot of flash cards and a lot of host processors to drive concurrent I/O. Indeed, with the concurrent processing capabilities of both of these Exadata models, Oracle would be able to achieve 1.5 million IOPS even if the service times were more in line with what one would expect with mechanical storage. Again, we never see service time citations so in actuality the 1.5 million number is just a representation of how much in-flight I/O the platform can handle.
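To put a number on “in-flight I/O”: by Little’s Law, the average number of outstanding requests equals throughput multiplied by service time. The following Python sketch applies that to the 1.5 million IOPS figure. The two service times are illustrative assumptions (roughly flash-like and roughly disk-like), not values from any Oracle data sheet, and the names are made up for this example.

```python
def in_flight_ios(iops: int, service_time_us: int) -> float:
    """Little's Law: outstanding I/Os = throughput * service time."""
    return iops * service_time_us / 1_000_000


RATED_IOPS = 1_500_000  # flash read IOPS quoted in the data sheets

# Hypothetical service times in microseconds, chosen only for illustration.
ASSUMED_SERVICE_TIMES_US = {"flash-like": 500, "disk-like": 5_000}

in_flight = {
    label: in_flight_ios(RATED_IOPS, us)
    for label, us in ASSUMED_SERVICE_TIMES_US.items()
}

for label, outstanding in in_flight.items():
    print(f"{label}: about {outstanding:,.0f} I/Os in flight")
```

Under these assumptions the same IOPS figure corresponds to ten times as many outstanding requests at the slower service time, which is why an IOPS number without service times says little about latency.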

Here is the new truth: IOPS is a storage bandwidth metric.

Host CPU Limited! How Many CPUs?
Here’s the stinger: Oracle blames host CPU for the 1.5 million flash read IOPS number. The problem with that is the X3-2 has 128 Xeon E5-2690 processor cores and the X3-8 has 160 Xeon E7-8870 processor cores. So what is Oracle’s real message here? Is it that the cores in the X3-8 are 20% slower than those in the X3-2 model? I don’t know. I can’t put words in Oracle’s mouth. However, if the data sheet is telling the truth then one of two things is true: either a) the E7-8870 processors are indeed 20% slower on a per-core basis than the E5-2690 or b) there is a processing asymmetry problem.
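The per-core arithmetic behind that question can be checked directly. This Python sketch uses only the figures quoted above (1.5 million IOPS, 128 and 160 cores); the variable names are made up for illustration.

```python
RATED_IOPS = 1_500_000  # same flash read IOPS claim for both models
CORES = {"X3-2 (E5-2690)": 128, "X3-8 (E7-8870)": 160}

# IOPS each core would have to deliver if host CPU really is the limit.
per_core = {model: RATED_IOPS / cores for model, cores in CORES.items()}

e5 = per_core["X3-2 (E5-2690)"]
e7 = per_core["X3-8 (E7-8870)"]

e7_slowdown = 1 - e7 / e5  # how much less each E7 core delivers
e5_speedup = e5 / e7 - 1   # the same gap seen from the E5 side

for model, value in per_core.items():
    print(f"{model}: {value:,.2f} IOPS per core")
print(f"E7 per-core deficit: {e7_slowdown:.0%}")
print(f"E5 per-core advantage: {e5_speedup:.0%}")
```

The E7-8870 cores come out 20% slower per core, which is the same gap as the E5-2690 cores being 25% faster.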

Not All CPU Bottlenecks Are Created Equal
Oracle would likely not be willing to dive into technical detail to the same level I do. Life is a series of choices–including whom you choose to buy storage and platforms from. However, Oracle’s literature is clear about the number of active 40Gb QDR InfiniBand ports there are in each configuration and this is where the asymmetry comes in. There are 8 active ports in both of these models. That means there are 8 streams of interrupt handling in both cases–regardless of how many cores there are in total.

As is the case with any networked storage, I recommend you monitor mpstat -P ALL output on database hosts to see whether there are cores nailed to the wall with interrupt processing at levels below total CPU-saturation.  Never settle for high-level aggregate CPU utilization monitoring. Instead, drill down to the per-core level to watch out for asymmetry. Doing so is just good platform scientist work.
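As a rough sketch of that per-core drill-down, the Python below takes per-core busy percentages (for example, 100 minus the %idle column of mpstat -P ALL) and flags cores that are saturated while the aggregate still looks healthy. The function name, the 90% threshold, and the sample numbers are assumptions for illustration, not measurements from an Exadata system.

```python
from statistics import mean


def find_hot_cores(busy_pct, hot_threshold=90.0):
    """Return (aggregate busy %, sorted list of cores at or above the threshold)."""
    average = mean(busy_pct.values())
    hot = sorted(core for core, busy in busy_pct.items() if busy >= hot_threshold)
    return average, hot


# Hypothetical sample: 16 cores, mostly idle, two of them pinned
# (as could happen if they handle most of the interrupt load).
sample = {core: 15.0 for core in range(16)}
sample[0] = 99.0
sample[8] = 97.0

average, hot = find_hot_cores(sample)
print(f"aggregate busy: {average:.1f}%  saturated cores: {hot}")
```

An aggregate figure alone would hide the two pinned cores in this sample, which is the asymmetry the per-core view is meant to expose.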

Between now and the time you should find yourself in a proof of concept test situation with Exadata, don’t hesitate to ask Oracle why–by their own words–both 128 cores and 160 cores are equally saturated when delivering maximum read IOPS in the database grid. After all, they charge the same per core (list price) to license Oracle Database on either of those processors.

Nice and Concise?
By the way, is there anyone who actually believes that both of these platforms top out at precisely 1.5 million flash read IOPS?

Oracle Exadata Database Machine X3-2 Datasheet

[Screenshot: X3-2 data sheet, flash read IOPS]

Oracle Exadata Database Machine X3-8 Datasheet

[Screenshot: X3-8 data sheet, flash read IOPS]

DISCLAIMER: This post tackles citations straight from Oracle published data sheets and published literature.



ODA re-imaging could take anything between 20 and 120 mins

20 mins vs 2 hours

Recently I noticed that the re-imaging process on the second Oracle Database Appliance node took significantly less time than on the first node. The difference was so significant that I started to suspect that either something was wrong with that particular set of hardware or some of the re-imaging process steps had failed on the second node. On the first node the process completed in 120 minutes, but on the second it took just around 20 minutes.

I spent quite a bit of time working out what exactly was happening. But before I tell you, can I ask what theoretical explanations you would come up with, given the behavior I just described? Please share them with me in the comment section below :)

Any mystery can be solved

The question is whether we are ready to pay for it. Sometimes it takes quite a bit of effort to get to the truth, and very often we don’t have the time, interest, or budget to find it. In this particular case I was so curious that I spent a good part of my weekend looking for a clue. On the way I had to learn a bit about “Anaconda (installer)”, the SquashFS file system, how to rebuild an ISO image, and the way the ODA re-imaging process works. The purpose of this paragraph is to encourage you to be curious and not to leave mysteries unresolved. Invest some time and you will learn a lot on the way :)

NOTE: I will try to share the way I have troubleshot this problem in my future blog posts.

Bug in the “post-install” script

The problem appears to be in the way the ISO:/Extras/setupodaovm.sh post-install script checks whether software RAID has finished re-synchronizing the 4 internal HDD partitions (md devices) between the 2 physical disks. The following checks appear at the very end of the script:
mdadm --wait /dev/md1
mdadm --wait /dev/md2
mdadm --wait /dev/md3

Each of the lines is designed to check whether the software RAID has finished synchronizing an md device (partition). The following is part of the man page for the mdadm utility:

       -W, --wait
              For  each  md  device  given, wait for any resync, recovery, or reshape activity to finish before returning.  mdadm will return with success if it actually waited for
              every device listed, otherwise it will return failure.

During the re-imaging process all 4 volumes have to be rebuilt and need to be synchronized by the software RAID. It is worth mentioning that software RAID on ODA is configured to re-synchronise one device at a time. The other devices just sit and wait their turn in the status DELAYED.  The problem is that if a device is in the state resync=DELAYED, the “mdadm --wait” check will not stop and wait for it. Therefore just one of the mdadm checks will wait until the re-synchronization process finishes; the others pass successfully even if a device isn’t synchronized yet (resync=DELAYED). Now let’s have a look at the devices’ sizes and associated synchronization times:

Name  Size  Function  Sync time
md0   60M   /boot     a few seconds
md1   17G   /         10 mins
md2   217G  /OVS      90 mins
md3   4G    swap      ~2 mins

Just to make life a bit more interesting, the software RAID picks the next device to re-synchronize at random. That means it is just a matter of luck which device gets processed next. If it is the md1 device (17GB), then the whole re-imaging process takes 20 minutes. However, if the software RAID is synchronising the md2 device (217GB) during the execution of the mdadm check, then the re-imaging process takes ~120 minutes.

A way to fix the problem

I am not a great expert in the Linux System Administration area (I am an Oracle DBA after all) and would rather let Oracle folks make the final call, but it seems to me that in order to make sure that all 4 devices are re-synchronized before the re-imaging process finishes, the check should look like the following:

mdadm --wait /dev/md0 /dev/md1 /dev/md2 /dev/md3

Conclusion

To conclude: until the issue is fixed, be aware that

  1. you may see different re-imaging times on different ODA nodes
  2. to be on the safe side, you should check that the md devices’ re-synchronization has finished by running the “cat /proc/mdstat” command before running any business-critical processes on your ODA.
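
The check in step 2 can also be scripted. The following is a minimal Python sketch, assuming the usual /proc/mdstat layout in which a device header such as “md1 : active raid1 ...” is followed by status lines containing “resync = NN%” or “resync=DELAYED”. The function name and the sample text are illustrative, not taken from an actual ODA node.

```python
import re

DEVICE_RE = re.compile(r"^(md\d+)\s*:")
ACTIVITY_RE = re.compile(
    r"\b(resync|recovery|reshape)\s*=\s*(DELAYED|PENDING|[\d.]+%)"
)


def unsynced_devices(mdstat_text):
    """Map md device name -> activity for devices still syncing or queued."""
    pending = {}
    current = None
    for line in mdstat_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        activity = ACTIVITY_RE.search(line)
        if current and activity:
            pending[current] = f"{activity.group(1)}={activity.group(2)}"
    return pending


# Illustrative sample only: md1 is syncing, md2 is queued behind it.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      61376 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      17824704 blocks [2/2] [UU]
      [========>............]  resync = 42.0% (7486375/17824704) finish=5.8min speed=29600K/sec

md2 : active raid1 sdb3[1] sda3[0]
      227537856 blocks [2/2] [UU]
        resync=DELAYED

md3 : active raid1 sdb5[1] sda5[0]
      4194240 blocks [2/2] [UU]

unused devices: <none>
"""

for device, state in unsynced_devices(SAMPLE_MDSTAT).items():
    print(f"{device}: {state}")
```

On a real node, pass open("/proc/mdstat").read() instead of the sample; an empty result means no resync, recovery, or reshape is running or queued.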

Yury

PS “Stay Hungry Stay Foolish” - Steve Jobs

High Performance Tuning Tools

Performance Tuning

Obligatory Melodrama

No matter how much time goes by I still remember it. The day my database was crippled beyond reckoning. That moment when I saw my hopes for a bright and shining future with my database spill through my fingers like so many cracker crumbs falling on a clean and well pressed pair of slacks.

It was the day I was told that we had to ditch AWR/ASH for a downgrade to Standard Edition.

There Will... Won’t Be Blood

You know, it’s funny. I got started on Oracle 7.3 (I know, I’m still a whippersnapper to many of you), back when men were real men, women were real women, and sqlplus / as sysdba was connect internal. We didn’t have no Statspack, we didn’t NEED no Statspack. We ran UTLBSTAT and we liked it. And when we didn’t like it, we ran UTLESTAT.

No, really, it sucked. But at the time, it wasn’t that bad. We seemed to find what we needed (most of the time) in the tools we had, and if we couldn’t find what we needed to do we’d drop back ten and punt (meaning we’d rewrite queries until our fingers bled).

Then Oracle 8 came and things still sucked. But then something magical happened. Oracle added an i. In case you don’t know, that ‘i’ stands for cloud... er, internet. In version 8.1.6 we got Statspack. And suddenly understanding our databases got a little cooler. We could get the big picture, a real top level view of what was going on in the instance as a whole. It gave us hourly snaps that we could use to see how our database progressed throughout the day/week/month/year/ORA-01653. The wait interface was lacking, the details sketchy or missing, but it still worked. It made a lot of things easier to understand.

With Oracle 9i it got a little better. But then Oracle worked their magic again and added a g. In case you don’t know, ‘g’ stands for cloud... er, grid. With Oracle 10g we got AWR and ASH. Whether we paid for it or not. And we got hooked. It wasn’t just a snapshot tool, it’s built into the architecture. It grows, it evolves, it slices, it dices! On sale today, and mind that you don’t disable it because that feature’s protected by license (I kid, I kid, they fixed that). And ASH, don’t even get me started. Rolling session-based performance snapshots? Near realtime performance tracking? Be still my beating heart.

So naturally when you’re losing it, it feels like you’ll never make it in this cruel, cruel world. It’s hard to have something so convenient taken away. But there’s still plenty of options out there for you.

Take in the Views

First of all, Oracle’s own DBA_, V$, and X$ views are becoming more and more detailed and easier to query by the version. Show of hands, how many of you still query v$session_event instead of v$session just to get an event name? There’s a ton of good information to be gleaned from Oracle’s stock views and packages. From DBMS_XPLAN (which gets better every version) to time model statistics and beyond, the metrics are getting better.

Just make sure they’re the metrics you’re looking for.

Oracle’s views are just a small piece of the performance tuning puzzle, however. Don’t forget that there is a world of events, tracing, and OS level analysis you can do. Just because we have these neat high level tools doesn’t mean you should forget the low level treasure troves. Sometimes the holistic tools don’t fully expose the details.

Snap It Up

Seriously guys, Snapper 4 is the coolest thing since Snapper 3. Using Snapper, you can get ASH-like session runtime details with manually controlled snapshots of V$ views by running a single script. Tanel Poder can work a SQL prompt like no other, and Snapper is his magnum opus.

You can download Snapper by visiting his page (as he says, download it, don’t paste it) and run it immediately. With or without ASH, it’s well worth running for in depth details on your instance runtime as a whole or targeted to a single SID/user/whatever.

Sashay Into the Room

Another option for more long-term ASHishness is S-ASH by Kyle Hailey and the ASH Masters crew. This pack of scripts actually lets you create a repository database for your statistics and gather runtime data just like the real thing. You’ll be able to get all your session details along with time model statistics. It works without a Diagnostics Pack license and can even work against Oracle 9i.

And if you’re really missing those stacked area charts where you can visualize that poor sweet CPU horsepower being crushed mercilessly by the evil bad User I/O and Concurrency, you can even use ASHMON (another ASH Masters product) to get your graphs back.

Ignite Your Performance – Free!

Confio Ignite is a highly popular monitoring software with response time analysis to help locate tricksy bottlenecks. But if your company pursestrings are a little tight (after all, cheapskates won’t even buy you Diagnostics Pack) Confio is gracious enough to offer Ignite Free.

While it doesn’t have the full drill-down capabilities of the full version, it IS free and therefore a bargain. It also monitors SQL Server, DB2, Sybase, and has a version for databases on VMWare.

The Web

There is a plethora of advice out there for nearly any version of Oracle you may be using and a decent portion of the problems you may encounter. And if the problems you face can’t be found anywhere on that lovable series of tubes we call the Internet, there are forums and discussion groups and social networks full of people who love to help. And that’s just aces. (ba dum bum bum)

But…

A learned man came to me once. He said, “I know the way, — come.” And I was overjoyed at this. Together we hastened. Soon, too soon, were we where my eyes were useless, and I knew not the ways of my feet. I clung to the hand of my friend; but at last he cried, “I am lost.” - Stephen Crane

Remember that actions have consequences (both good and bad). Be sure you test before you tune, and never ever blindly make changes to a production environment. Why, who knows where the advice has been. It might be dirty.

Good luck, and happy tuning!

The post High Performance Tuning Tools appeared first on Steve Karam :: The Oracle Alchemist.

The sort of latency heat map you don’t want to see!

We’ve had a couple of short lived, but very inconvenient I/O latency issues recently. I’ve been using the awesome Latency Heat Map Visualization by Luca Canali as one of the tools to investigate this.

I’m guessing this isn’t the type of I/O latency heat map most people would want to see from a production system. :)

[Screenshot: OraLatencyMap latency heat map]

 

This is the same system that has been reporting Warning “aiowait timed out x times” in alert.log [ID 222989.1], which only appears if an asynchronous I/O takes longer than 10 minutes…

The pictures look much nicer when things are going wrong! :)

Cheers

Tim…


The sort of latency heat map you don’t want to see! was first posted on May 20, 2013 at 6:13 pm.

Beginnings

“A beginning is the time for taking the most delicate care that the balances are correct.”

It is spring. Time for planting new seeds. I started a new job last week, and it seems that a few of my friends and former colleagues are on their way to new adventures as well. I’m especially excited because I’m starting not just a new job – I will be working on a new product, far younger than Oracle and even MySQL. I am also taking my first tiny steps in the open-source community, something I’ve been looking to do for a while.

I’m itching to share lessons I’ve learned in my previous job, three challenging and rewarding years as a consultant. The time will arrive for those, but now is the time to share what I know about starting new jobs. Lessons that I need to recall, and that my friends who are also in the process of starting a new job may want to hear.

Say hello
I’m usually a very friendly person and after years of attending conferences I’m very comfortable talking to people I’ve never met before. But still, Cloudera has around 200 people in the bay area offices, which means that I had to say “Hello, I’m Gwen Shapira the new Solutions Architect, who are you?” around 200 times. This is not the most comfortable feeling in the world. It’s important to go through the majority of the introductions in the first week or two; later on it becomes a bit more awkward. So in the first week it will certainly seem like you are doing nothing except meeting people, chatting a bit and frantically memorizing names and faces. This is perfectly OK.

Get comfortable being unproductive
The first week in a new job feels remarkably unproductive. This is normal. I’m getting to know people, processes, culture, about 20 new products and 40 new APIs. I have incredibly high expectations of myself, and naturally I’m not as fast installing a Hadoop cluster as I am installing a RAC cluster. It takes me far longer to write Python code than it does to write SQL. My expectations create a lot of pressure: I internally yell at myself for taking an hour or so to load data into Hive when it “should” have taken 5 minutes. But of course, I don’t know how long it “should” take; I have done it only a few times before. I’m learning, and while learning has its own pace, it is an investment and therefore productive.

Have lunch, share drinks
The best way to learn about culture is from people, and the best way to learn about products is from the developers who wrote them and are passionate about how they are used. Conversations at lunch time are better than tackling people in the corridor or interrupting them at their desk. Inviting people for drinks is also a great way to learn about a product. Going to someone’s cube and asking for an in-depth explanation of Hive architecture can be seen as entitled and bothersome. Sending an email to the internal Hive mailing list and saying “I’ll buy beer for anyone who can explain Hive architecture to me” will result in a fun evening.

If it’s not overwhelming, you may be in the wrong job
I’m overwhelmed right now. So many new things to learn. First there are the Hadoop ecosystem products: I know some but far from all of them, and I feel that I need to learn everything in days. Then there is programming. I can code, but I’m not and never have been a proficient programmer. My colleagues are sending out patches left and right. It also seems like everyone around me is a machine learning expert. When did they learn all this? I feel like I will never catch up.

And that is exactly how I like it.

Make as many mistakes as possible
You can learn faster by doing, and you can do faster if you are not afraid of failing and making mistakes. Mistakes are more understandable and forgivable when you are new. I suggest using this window of opportunity to accelerate your learning by trying to do as much as possible. When you make a mistake, smile and say “Sorry about that. I’m still new. Now I know what I should and shouldn’t do.”

Take notes
When you are new, a lot of things will look stupid. Sometimes that is just because they are very different from the way you were used to doing things in a previous job. Don’t give in to the temptation to criticise everything, because you will look like a whiner. No one likes a whiner. But take note of them, because you will get used to them soon and never see things with “beginner mind” again. In a few months take a look at your list; if things still look stupid, it will be time to take on a project or two to fix them.

Contribute
I may be new at this specific job, but I still have a lot to contribute. I try hard to look for opportunities and I keep finding out that I’m more useful than I thought. I participate in discussions in internal mailing lists, I make suggestions, I help colleagues solve problems. I participate in interviews and file tickets when our products don’t work as expected. I don’t wait to be handed work or to be sent to a customer, I look for places where I can be of use.

I don’t change jobs often, so it’s quite possible that I don’t know everything there is to know about starting a new job. If you have tips and suggestions to share with me and my readers, please comment!


Additional plugins are required issue in Firefox with Forms 11g

At the time of writing this post, Oracle Forms 11g installations are configured by default to work with a specific JPI (Java Plug-In) version: 1.6.0_12. You can verify this by checking your formsweb.cfg file:

grep jpi-version formsweb.cfg
#jpi_mimetype=application/x-java-applet;jpi-version=1.6.0_12

Unfortunately, it seems Firefox is very particular when it comes to JPI versions, and it will only run with the

Read More...