There have been several concurrent threads recently, scattered across a number of forums around the net, all concerning the merits of taking an evidence-based approach versus taking a holistic, experienced practitioner’s view of Oracle performance problems.
I’m not going to comment too much in this entry on the subject, but I am considering a slightly longer article on what to do when experts disagree. Mostly I’m going to remark upon Mike Ault’s latest blog entry, which Mark Rittman has pointed out.
Before I do that, I suspect it’s worth laying out a little of the background.
Back in February, Don Burleson posted a ‘challenge’ asking for a solution (a script or scripts) that would predict when tables would benefit from reorganisation. Tom Kyte was asked about this on http://asktom.oracle.com . His reply was, essentially, that the factors listed aren’t deterministic: of the four listed, the first was true for some sorts of activity and false for others; the second was largely irrelevant; the third could be better fixed by a design change than by the sticking plaster of periodic re-orgs; and the fourth displayed an incomplete understanding of the technical workings of Oracle (freelists vs freelist groups). Tom being Tom, he demonstrated all this with code we can run ourselves. All things considered, not a contest worth winning. It is of course possibly impertinent to point out that the reward for this challenge was $250 worth of books, and the right for Don to publish the work in an upcoming, for-sale, book.
Faced with this reply, Don then posted a discussion topic, apparently on whether SQL*Plus scripts could be used to prove anything. Admittedly, this soon became whether SQL*Plus scripts could prove anything at all (it looks like the winner of the challenge will be a full-blown multi-user benchmark). Now it may or may not be the case that this was really the aim of the discussion. Don wrote the following.
I’m NOT going to bash another Oracle author, it’s unprofessional. . . . I was interviewing a DBA yesterday (30 years DBA experience, Certified Oracle Master, ex-Oracle University instructor), and I asked “What do you think about the work of Tom Kyte?” He replied that he suspected that Kyte had very little real-world experience diagnosing and tuning large production databases. (I’ve wondered about that myself, since I cannot find his experience and qualifications anywhere online.)
Quite frankly, the claim to be interviewing someone with 30 years of DBA experience looks like a lie (note that the first relational database was not available until 1977); it looks like good old smear tactics.
So my summary of the background is:
The owner of Burleson Consulting published a challenge asking for scripts to prove something.
Tom Kyte, when asked about it, suggested that the challenge was rather seriously flawed and provided evidence to show why.
The owner of Burleson Consulting then instituted a discussion, on his own boards but also taken up elsewhere, aimed at discrediting evidence based argument in general and Tom Kyte in particular.
Don’t take my word for it though, read the discussions and form your own conclusions.
It’s against this background that I come to Mike’s blog. Mike is, of course, also a Burleson Consulting employee, and he too takes issue with the idea that scripts can be used to show things that are useful from a tuning perspective. First he describes a database health check: they come in and check the database-wide settings and stats, security, and backup and recovery procedures. This can be very useful, especially to shops with limited DBA resources.
Next however he goes on to describe a tuning assignment. Quotation in italics – comments in normal.
We never go onsite and start issuing recommendations without analysis. The last time I saw this technique used was from Oracle support three days ago. Without looking at detailed statistics or analysis they suggested increasing the shared pool (even though all of it wasn’t being used), increasing the database cache size (even though 30% of the buffers were free in V$BH), turn off MTS (we had already tried this and it made no difference) and several other Oracle “Silver Bullets”, none of which made the slightest difference in the problem.
So the quotation describes a problem: support had come in and tried various things with no supporting evidence – the classic silver-bullet approach. It didn’t work.
It turned out the problem was that 10g has a default value of TRUE for the parameter _B_TREE_BITMAP_PLANS which, even though the application had been running for months, suddenly started making a difference. Note there were no bitmaps in the database other than those that Oracle had in their schemas. They offered no proofs, no explanations, just a sigh of relief that this Oracle Silver Bullet fixed the problem. This is a classic example of why “simple”, “repeatable” proofs can be hazardous. By all of the indications there was nothing wrong at the database level. No waits or events showed the problem, even plans for the SQL didn’t change. Any SQL activity resulted in CPU spiking of up to 70-80%. However, the setting of this undocumented parameter had a profound effect on the optimizer and overall database performance. If an Oracle analyst hadn’t “seen this before” and remembered this “Silver Bullet” the client would still be having problems. The troubled queries dropped from taking several minutes to 6 seconds.
Well, it looks like the silver-bullet troubleshooting process simply continued: try switching off _b_tree_bitmap_plans and see what happens. It also rather looks as if Mike hasn’t come across _b_tree_bitmap_plans before; it allows bitmap conversion access paths even when there are no bitmap indexes (and the parameter got this default back in 9i, not 10g). The comment about there being no bitmaps in the database makes no sense otherwise.
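As a sketch of how that hypothesis could have been tested with evidence rather than on faith (assuming a SYSDBA connection, since the x$ksppi / x$ksppcv internal views used here to read hidden parameters are SYS-only, and their exact columns should be checked against your own version):

```sql
-- Read the current setting of the hidden parameter (connect as SYSDBA).
select i.ksppinm  as name,
       v.ksppstvl as value
from   x$ksppi  i,
       x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm = '_b_tree_bitmap_plans';

-- Then toggle it for this session only (hidden parameters need double quotes)
-- and re-time the problem query, before touching anything instance-wide.
alter session set "_b_tree_bitmap_plans" = false;
```

Session level first, timed before and after, is the evidence-based version of the same “silver bullet”.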
There are two interesting observations here: the elapsed response time for specific queries dropped from several minutes to 6 seconds, and the execution plans didn’t change.
I find this interesting for two reasons. First, there is a correct change of focus from database-wide figures to problem queries (and to measurement and evidence). Second, there is a clear implication that parsing was taking several minutes per query: the plans and the waits were the same, so the variable must be the CPU consumed in the parse phase. The second feels like a bug (or perhaps a misdiagnosis of the actual plans in use).
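That parse-time implication is itself checkable from SQL*Plus. As a sketch (assuming the session can see v$mystat and v$statname, which needs the usual dictionary-view privileges), snapshotting the session’s parse statistics around the problem query would separate parse CPU from execution CPU:

```sql
-- Snapshot this session's parse-related statistics...
select n.name, s.value
from   v$statname n,
       v$mystat   s
where  n.statistic# = s.statistic#
and    n.name in ('parse time cpu', 'parse time elapsed', 'parse count (hard)');

-- ... run the slow statement here ...

-- ... then re-run the query above: if most of the elapsed minutes show up as
-- 'parse time cpu', the optimizer, not the execution plan, is the suspect.
```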
When going from 7 to 8 to 8i to 9i to 10g Oracle has been notorious for resetting their undocumented parameters from true to false, false to true, adding new undocumented and documented parameters and of course varying the various number based parameters to get the right “mix”. You have to understand this in order to troubleshoot Oracle performance problems. Even within the same release some of these change based on platform. This is the point I am trying to make about single user, small database proofs, they are great for documenting Oracle behavior on single user, small databases. They would have been completely useless in this case.
But having identified single statements that were taking an excessive length of time to parse, it would seem that single-user SQL*Plus scripts would be just the thing:

alter session set events '10053 trace name context forever, level 1';  -- to trace the parsing (optimizer) activity
alter session set events '10046 trace name context forever, level 12'; -- to get timing, waits and binds

Run the problem statement, turn off the events, and a pound to a penny the impact of excessive consideration (and then rejection) of b-tree-to-bitmap conversion plans would have fallen out of the trace files. The alternative scenario is this: suppose you’d got a different analyst, one who hadn’t seen this before. What would have happened when he ran out of the things he had seen before? What if you hit a new problem? What then?