Archive for the ‘performance’ tag
Proof by extension..
Both an ORACLE-L discussion here and a Stack Exchange post raise essentially the same question which is, paraphrased:
What is the performance overhead of datafile autoextend events.
The conventional wisdom, with which I agree, is that the overhead is insignificant. However the poster in the stack exchange dialog asked how one would determine this empirically. A little bit of googling didn’t find anything so I thought I’d throw out some examples that can (with a little tweaking for file locations and sizes) be run on your system if you are interested. The script I used is reproduced at the end of the post, but is described here for ease of understanding. The tests described were all conducted on a laptop using a single laptop HDD. As you’ll see from the results the actual hardware you have will impact this test significantly.
The script creates 2 tablespaces – one fixed at 1gb in size and one that starts small and is allowed to grow. The size and growth rate of the small tablespace are determined by the size of each extent in a test table that is created in each tablespace. I ran sets of tests to determine the elapsed time taken to add a specific volume of data using different extent sizes for the table,(and hence different numbers of auto-extend operations), the setup is such that the growing tablespace should grow once for each time an extent is allocated to the table in the tablespace. The static tablespace is sized so that it does not need to grow at all by comparison.
I made tests at extent sizes (and autogrowth increments) of 1,10,25,50 and 100mb and data volumes from 100mb to 1000mb. Elapsed times are taken from sqlplus set timing on command. My results are shown in the table below, All elapsed times are in seconds.
| Extent Size (MB) | Data Allocated MB | Elapsed (growth) | Elapsed (Static) | Extend Ops |
| 1 | 100 |
25.78 |
0.680 |
100 |
| 10 | 100 |
5.89 |
0.110 |
10 |
| 25 | 100 |
6.08 |
0.070 |
4 |
| 50 | 100 |
3.95 |
0.050 |
2 |
| 100 | 100 |
3.23 |
0.004 |
1 |
| 1 | 500 |
138.17 |
3.340 |
500 |
| 10 | 500 |
24.33 |
0.450 |
50 |
| 25 | 500 |
20.52 |
0.170 |
20 |
| 50 | 500 |
15.73 |
0.090 |
10 |
| 100 | 500 |
15.55 |
0.060 |
5 |
| 1 | 1000 |
276.68 |
7.510 |
1000 |
| 10 | 1000 |
51.72 |
0.720 |
100 |
| 25 | 1000 |
37.99 |
0.310 |
40 |
| 50 | 1000 |
33.73 |
0.170 |
20 |
| 100 | 1000 |
32.63 |
0.070 |
10 |
I then calculated the elapsed time per extension and the elapsed time per extent allocation (by using the static tablespace timings). The second of these metrics varied a bit when very few extents were being allocated but was remarkably consistent at the 7-8ms mark. This suggests that the impact of allocating a new extent to an object is about equal to the impact of a single random disk I/O (or measurable but negligible). When it came to object extension then the average elapsed time per file system extension varied pretty much linearly with how much space was being allocated per file extension. On my system the average numbers (and the std deviation was very low though my sample size is small) came out as shown below.
| Extension Size (MB) |
Duration per operation (s) |
| 1 |
0.27 |
| 10 |
0.53 |
| 25 |
1.17 |
| 50 |
1.74 |
| 100 |
3.20 |
It appears to me that what happens at file extension is pretty likely to be the same as what happens when a new datafile is added – namely space is allocated in the file system and the blocks zeroed out. The ‘overhead’ therefore will vary according to the capabilities of your I/O subsystem. The data above for me graphs as follows:
Conclusions
There are 2 competing considerations when determining the optimal autoextend size for a datafile, and the balance between the 2 probably varies by I/O subsystem. In the greater cosmic scale of things neither is probably as important as the efficiency of your data import process or the SQL used to populate your tables. These 2 considerations are:
- What are the constraints on elapsed time that any individual transaction cannot exceed.
- Overall how efficient do we want to be.
In my case moving from a 25mb auto-extend size to a 50mb auto-extend size means that overall my dataload is 27% faster, but it does imply an increase in Worst case response time from 1.1 to 1.7s which may not be acceptable in an OLTP environment.
Script
The script I used is reproduced below. It will require editing if you use it for file locations.
set timing off set verify off accept extent_size prompt "Enter extent size in MB [default 1)" default 1 accept growth prompt "Enter desired growth in extents [default 100]" default 100 col min_size new_value min_file_size select (&extent_size. + 1 ) min_size from dual; drop tablespace ts_&extent_size. including contents and datafiles; drop tablespace static_ts including contents and datafiles; create tablespace ts_&extent_size. datafile 'c:\app\oracle\oradata\niall\ts_&extent_size..dbf' size &min_file_size.m autoextend on next &extent_size.m extent management local uniform size &extent_size.m; create tablespace static_ts datafile 'c:\app\oracle\oradata\niall\static_ts.dbf' size 1g autoextend off ; create table growing_tab(n1 number, vc1 varchar2(100)) tablespace ts_&extent_size.; create table static_tab(n1 number, vc1 varchar2(100)) tablespace static_ts; set timing on begin for i in 1..&growth. loop execute immediate 'alter table growing_tab allocate extent'; end loop; end; / begin for i in 1..&growth. loop execute immediate 'alter table static_tab allocate extent'; end loop; end; / drop tablespace ts_&extent_size. including contents and datafiles; drop tablespace static_ts including contents and datafiles;
Possibly Related Posts:
Advice from the Internet
A somewhat recursive post here. There are a number of forums around the internet which Oracle professionals increasingly use to research various issues, discover new features and diagnose problems. One such is the relatively new Database Administrators stack exchange site. For various reasons I came across this thread and in particular a piece of advice on logging long running queries using an autonomous transaction.
Personally, I suggest using autonomous transactions to enable this feature — not on the transaction itself, but as a logging mechanism letting you know what is going on. For example, you could have PROCEDURE LONG_ACTION call PROCEDURE WRITE_LOG_ENTRY (defined as an autonomous transaction) that would write a VARCHAR2 to another table. Autonomous transactions do NOT interfere with your current transaction and so you can see what’s going on via your logging entries regardless of a COMMIT or ROLLBACK in your current transaction. That said, you can’d do that with one massive DML statement; you’d have to use a loop.
Now it seems to me at least that the caveats I have highlighted above are somewhat underplayed. There are 2 issues here:
- The original transaction is affected – in fact it is re-written entirely as a series of single row updates in a loop (a.k.a slow by slow) rather than a single update loop.
- The result of this is that you will get logs, but the runtime will grow potentially orders of magnitude. This is unlikely to be what you want.
So there you are: 2 pieces of advice on the Internet. How to choose between them? I suggest testing them out. The script below tests the scenario above. I have a ‘hairy calculation’ namely a function to tell whether a given positive integer is prime, I want to run it through the first ten thousand positive integers and update a table of primes accordingly. As sieving through primes is computationally intensive I expect this to take a measurable amount of time. Lets see shall we.
/*
Script to demonstrate problems with
http://dba.stackexchange.com/questions/2994/oracle-any-way-to-view-uncommited-changes-to-a-particular-table/3051#3051
*/
-- create a table of all numbers from 1 to 1m
drop table primes purge;
create table primes (real,is_prime)
as select rownum,0 from dual
connect by level <= 10000;
-- create a function to test 'primeness'
create or replace function is_prime(p in number) return number
is
l_limit number;
i number := 3;
retval number := 1; -- returns 0 if false, 1 if true
begin
l_limit := sqrt(p);
if p = 1 then
retval:=1;
end if;
if p = 2 then
retval :=1;
end if;
if mod(p,2) = 0 then
retval := 0;
end if;
while (i <= l_limit) loop
if mod(p,i) = 0 then
retval := 0;
end if;
i := i + 2; -- (no need to test even numbers)
end loop;
return retval;
end;
/
-- Create proposed function
drop table log_entries purge;
CREATE TABLE LOG_ENTRIES(
activity_date date,
log_entry varchar2(2000)
);
create or replace PROCEDURE WRITE_LOG_ENTRY
( str VARCHAR2 )
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
INSERT INTO LOG_ENTRIES VALUES ( SYSDATE, str );
COMMIT;
END;
/
-- now do the tests
create or replace PROCEDURE LONG_ACTION_SUGGESTED IS
c NUMBER;
BEGIN
FOR r IN ( SELECT real FROM PRIMES )
LOOP
c := c + 1;
UPDATE PRIMES z
SET IS_PRIME = IS_PRIME(real)
WHERE z.real = r.real;
IF MOD(c,500) = 0 THEN
WRITE_LOG_ENTRY ( c || ' rows processed.' );
END IF;
END LOOP;
COMMIT;
END;
/
create or replace PROCEDURE LONG_ACTION IS
BEGIN
UPDATE PRIMES
SET IS_PRIME = IS_PRIME(real);
COMMIT;
END;
/
set timing on
prompt ' run the update in sql without logging'
update primes set is_prime = is_prime(real);
commit;
prompt ' run the suggested logging approach'
exec LONG_ACTION_SUGGESTED;
prompt ' run from pl/sql approach'
exec LONG_ACTION;
In this example I time how long my update takes when run from sql, how long when run from pl/sql and how long when modified in the manner suggested by the article. My results are below.
' run the update in sql without logging' 10000 rows updated. Elapsed: 00:00:01.08 Commit complete. Elapsed: 00:00:00.00 ' run the suggested logging approach' PL/SQL procedure successfully completed. Elapsed: 00:00:07.98 ' run from pl/sql approach' PL/SQL procedure successfully completed. Elapsed: 00:00:01.26
In other words what took just a second without my autonomous transaction took 8 seconds with it. The test is easily reconfigurable though – running with 100000 rows in the table produced the following results.
Elapsed: 00:00:00.01 ' run the update in sql without logging' 100000 rows updated. Elapsed: 00:00:21.55 Commit complete. Elapsed: 00:00:00.01 ' run the suggested logging approach' PL/SQL procedure successfully completed. Elapsed: 00:07:26.05 ' run from pl/sql approach' PL/SQL procedure successfully completed. Elapsed: 00:00:20.50
That is we jumped from 20 seconds or so to 7 and half minutes.
So next time you see some advice on the internet, it might well be worth your while testing it out first, especially if the author provides you with the tools to allow you to do this.
Possibly Related Posts:
Distinctly Odd (Update)
Greg Rahn points out that the improved NDV estimation arrives in 11.1 not 11.2 (which my article distinctly odd implies) see http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/ for more on this.
Possibly Related Posts:
Distinctly Odd
I recently got involved with a performance investigation for an Oracle 9.2 database. The process of investigation threw up some interesting information for me regarding the accuracy of statistics collection in Oracle. It also highlights how different defaults in different versions of Oracle can lead to remarkably different statistics and hence execution plans. Finally, it might give you pause when trying to determine what value to use for estimate_percent when gathering object statistics.
Problem
The users of the database in question had observed significantly higher than usual CPU usage following some data maintenance. This was causing significant slowdown for the end users. As this is a 9.2 database the performance tools available to us were statspack reports, the statspack schema and of course the dynamic performance views and our monitoring scripts. Had this been a 10.1 or higher database then the AWR data would enable us to drill directly into the high users of CPU resource to pinpoint the problem SQL. In 9.2 the statspack report does not explicitly list SQL by CPU usage. In many cases, including this one however, the section of the report headed SQL Ordered by Gets is a somewhat useful proxy for high CPU usage statements, especially if the load on the database is generated by conventional SQL (select, insert etc) and not by, for example, calls to Java Stored procedures and so on. This is because a logical i/o is a CPU intensive operation. Sure enough the statspack reports for the period after the maintenance listed some statements in this section that had not previously appeared. One in particular was of the form
Select <column_list> From parent, child Where parent.pk = child.parent_pk And parent.key_column in (:b1,:b2,:b3,:b4);
The bind variables in this case occurred because the parameter CURSOR_SHARING has been set to the value of ‘SIMILAR’. In production this query had the following explain plan
------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 476K| 48M| | 108K (1)| 00:21:40 | |* 1 | HASH JOIN | | 476K| 48M| 4328K| 108K (1)| 00:21:40 | |* 2 | TABLE ACCESS FULL| PARENT | 52750| 3708K| | 6258 (1)| 00:01:16 | | 3 | TABLE ACCESS FULL| CHILD | 13M| 443M| | 72218 (1)| 00:14:27 | ------------------------------------------------------------------------------------------------
Whereas in development, which had not had the data maintenance carried out the plan looked like this
--------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 287 | 33292 | 100 (0)| 00:00:02 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 287 | 33292 | 100 (0)| 00:00:02 | | 3 | INLIST ITERATOR | | | | | | | 4 | TABLE ACCESS BY INDEX ROWID| PARENT | 32 | 2592 | 11 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | IDX_PARENT_KEY | 32 | | 6 (0)| 00:00:01 | |* 6 | INDEX RANGE SCAN | IDX_CHILD_FK | 9 | | 2 (0)| 00:00:01 | | 7 | TABLE ACCESS BY INDEX ROWID | CHILD | 9 | 315 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------------
Clearly the statistics for the two environments are considerably different, not in particular that the production database expects 52,000 rows to match the in list whereas the development database only expects to retrieve 32. This was even more suprising given that the nature of the maintenance was that an archival process had taken place in production leaving it with 1,868,541 rows, just 87% the size of the development table. With 2,138,954 rows
Object Statistics
Oracle describes object statistics like this (I’ve omitted the description of system statistics for this article)
Optimizer statistics describe details about the database and the objects in the database. The query optimizer uses these statistics to choose the best execution plan for each SQL statement.
Optimizer statistics include the following:
- Table statistics
- Number of rows
- Number of blocks
- Average row length
- Column statistics
- Number of distinct values (NDV) in column
- Number of nulls in column
- Data distribution (histogram)
- Extended statistics
- Index statistics
- Number of leaf blocks
- Levels
- Clustering factor
This article is only concerned about the table and column statistics, since it is these that drive the estimates of cardinality in our plans above.
Esimate_percent
There’s a lot of information around in the Oracle community about statistics gathering – most of it based on experience and un-evidenced. So for example at the otn forums here we can read that you never need 100% stats gathering and that 20% is a good number. The oracle applications product suggests (Note 156968.1 – MOS login required) that both sample size and collection frequency should be based on the number of rows in the table as shown below.
| NUM_ROWS | Sample Size | Frequency (weeks) |
| NULL | 30 | Once |
| 10^6 | 100 | 3 |
| 10^7 | 30 | 4 |
| 10^8 | 10 | 5 |
| 10^9 | 3 | 6 |
| >10^9 | 1 | 7 |
This is somewhat more sophisticated but somewhat begs the question “How many Oracle Applications tables have more than a billion rows, and why aren’t they partitioned anyway?” What is commonly agreed is that your statistics don’t have to be up to date, but they do have to represent the data accurately. Most commonly the degree to which the statistics are considered representative focuses upon the NUM_ROWS stat. That is people will often remark along the lines of Dave Ensor of Oracle :
I don’t know much about databases, but I do think I know the following: small tables tend to stay small, large tables tend to stay large, unique indexes have a tendency to remain unique, non-unique indexes tend to stay non-unique
In other words if the CBO knows the size of the table (and the upper and lower bounds of values in it) then the stats are “representative”.
Cardinality Estimates
When it comes to estimating the cardinality of a filter with a single bind variable the formula is
Card = ROWCOUNT/NDV
Where NDV is the number of distinct values for the column in the table. That is the optimizer assume the values are evenly distributed. For an inlist this is multiplied up by the number of members in the inlist. In our case the statement that was performing poorly had 4 members in its in list.
I quickly determined that the rowcount figures for the tables in question were in good agreement with the actual number of rows in the table. That is, by common definition the statistics were representative. This meant that the difference in must lie in the CBOs estimate of the number of distinct values in the column. (there being no histograms here)
Sure enough running a 10053 trace showed that in production Oracle was using a value for NDV for my key column of 145, for development the equivalent figure was 195292. Upon doing some further investigation with the client we found that, since they had purged a significant proportion of the table they had collected new stats using their existing statistics collection method. This it turned out was to run the venerable command
ANALYZE TABLE PARENT ESTIMATE STATISTICS
This had not however resolved the issue and a colleague had then run:
DBMS_STATS.GATHER_TABLE_STATS(owner,table,estimate_percent => dbms_stats.auto_sample_size);
In other words he had followed the recommended statistics gathering policy from Oracle. I proceeded to run the command without the estimate_percent parameter (in 9.2 this forces a compute, in 10.1 and higher the default for this parameter changes). Once a compute had been done the NDV was correctly calculated and the plan returned to normal.
Accuracy of NDV calculations
To further investigate this issue I obtained an export dump of the data. I then ran various common statistics gathering commands (plus the venerable analyse) against the data. In common with Maria Colgan at the Oracle Optimizer group I define the accuracy of NDV as
1 – (|estimated_ndv – actual_ndv|)/actual_ndv
Though as we will see this accuracy measure doesn’t range as stated there from (0-1) but from -? to 1. I have not listed the column names as this is data from a real production database but the data below was gathered using the script shown below (changing the estimate_percent value as shown by the tables below).
drop table col_stats; create table col_stats( col_id number , column_name varchar2(30) , actual_ndv number , dbms_stats_ndv number , analyze_ndv number); insert into col_stats(col_id,column_name) select column_id,column_name from user_tab_columns where table_name='PARENT'; /* analyze */ analyze table PARENT ESTIMATE STATISTICS; update col_stats s set analyze_ndv = (select num_distinct from user_tab_columns c where c.column_id = s.col_id and c.column_name = s.column_name and c.table_name = 'PARENT'); exec dbms_stats.gather_table_stats(user,'PARENT',estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE); update col_stats s set dbms_stats_ndv = (select num_distinct from user_tab_columns c where c.column_id = s.col_id and c.column_name = s.column_name and c.table_name = 'PARENT'); declare l_cnt number; begin for c in (select column_id,column_name from user_tab_columns where table_name = 'PARENT') loop execute immediate 'select count(distinct '||c.column_name||') from PARENT' into l_cnt; update col_stats s set actual_ndv = l_cnt where s.col_id = c.column_id and s.column_name = c.column_name; end loop; end; / commit; select col_id , column_name , actual_ndv , dbms_stats_ndv , analyze_ndv , round(100*(1 - (ABS(dbms_stats_ndv-actual_ndv)/actual_ndv)),2) dbms_stats_accuracy , round(100*(1 - (ABS(analyze_ndv-actual_ndv)/actual_ndv)),2) analyze_accuracy from col_stats where actual_ndv != 0 order by col_id;
Column 16 doesn’t appear since it is entirely null so NDV = 0.
On our 9.2 system the results are:
| COL# | ACTUAL
NDV |
ANALYZE ACCURACY |
AUTO ACCURACY |
10% ACCURACY | 25% ACCURACY |
| 1 | 1868541 | 99.97 | 98.37 | 99.79 | 100.00 |
| 2 | 2 | 100.00 | 100.00 | 100.00 | 100.00 |
| 3 | 238878 | 0.06 | 0.36 | 10.23 | 25.28 |
| 4 | 26900 | 27.65 | 28.72 | 50.47 | 61.03 |
| 5 | 3 | 100.00 | 100.00 | 100.00 | 100.00 |
| 6 | 473637 | 0.09 | 0.41 | 10.27 | 25.32 |
| 7 | 300746 | 0.10 | 0.42 | 10.15 | 25.07 |
| 8 | 26900 | 27.65 | 28.72 | 50.47 | 61.03 |
| 9 | 452 | 8.41 | 24.78 | 81.19 | 94.03 |
| 10 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 11 | 73 | 43.84 | 52.05 | 78.08 | 84.93 |
| 12 | 52511 | 9.94 | 46.64 | 32.34 | 46.66 |
| 13 | 1051205 | 9.08 | 100.00 | 48.64 | 63.67 |
| 14 | 2 | 100.00 | 100.00 | 100.00 | 100.00 |
| 15 | 789 | 0.63 | 13.31 | 27.63 | 45.63 |
| 17 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 18 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 19 | 406 | 19.95 | 33.00 | 68.97 | 83.25 |
We can see that for several of the columns (3,6,7) the NDV calculation is only as accurate as the sample size. For others a relatively high samplesize (more than the 20% commonly quoted) is required to get reasonable accuracy.
I then repeated the test in 10.2
| COL# | ACTUAL
NDV |
ANALYZE
ACCURACY |
AUTO
ACCURACY |
10% Accuracy | 25% Accuracy |
| 1 | 1868541 | 99.87 | 100.00 | 100.00 | 99.82 |
| 2 | 2 | 100.00 | 100.00 | 100.00 | 100.00 |
| 3 | 238878 | 0.06 | 0.39 | 10.17 | 25.44 |
| 4 | 26900 | 31.11 | 29.19 | 50.43 | 60.87 |
| 5 | 3 | 100.00 | 100.00 | 100.00 | 100.00 |
| 6 | 473637 | 0.10 | 0.43 | 10.22 | 25.48 |
| 7 | 300746 | 0.10 | 0.44 | 10.13 | 25.19 |
| 8 | 26900 | 31.11 | 29.19 | 50.43 | 60.87 |
| 9 | 452 | 7.96 | 25.22 | 81.64 | 92.26 |
| 10 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 11 | 73 | 41.10 | 50.68 | 75.34 | 82.19 |
| 12 | 52511 | -135.58 | 46.71 | 32.19 | 46.76 |
| 13 | 1051205 | 25.59 | 100.00 | 48.62 | 63.68 |
| 14 | 2 | 100.00 | 100.00 | 100.00 | 100.00 |
| 15 | 789 | 0.76 | 14.58 | 24.46 | 48.67 |
| 17 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 18 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 19 | 406 | 19.21 | 33.25 | 69.46 | 86.45 |
There are 2 points to note here. First is that the DBMS_STATS accuracy is no better in 10.2.0.4 than it was in 9.2 for the columns that were already problematic. The second is that odd negative value for accuracy for column 12 when ANALYZE … ESTIMATE was run. Here what happened was that the ANALYZE command for some reason estimates over twice as many distinct values for this column as there actually are.
Finally I ran the same test on the same data in 11.2.0.2
| COL# | ACTUAL
NDV |
ANALYZE
ACCURACY |
AUTO
ACCURACY |
10% Accuracy | 25% Accuracy |
| 1 | 1868541 | 97.66 | 100.00 | 99.29 | 99.86 |
| 2 | 2 | 100.00 | 100.00 | 100.00 | 100.00 |
| 3 | 238878 | 0.06 | 98.42 | 10.27 | 25.33 |
| 4 | 26900 | 27.80 | 99.70 | 50.26 | 60.93 |
| 5 | 3 | 100.00 | 100.00 | 100.00 | 100.00 |
| 6 | 473637 | 0.09 | 99.95 | 10.37 | 25.37 |
| 7 | 300746 | 0.10 | 98.39 | 10.25 | 25.09 |
| 8 | 26900 | 27.80 | 99.70 | 50.26 | 60.93 |
| 9 | 452 | 9.07 | 100.00 | 82.74 | 91.37 |
| 10 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 11 | 73 | 34.25 | 100.00 | 79.45 | 83.56 |
| 12 | 52502 | -150.62 | 99.80 | 32.47 | 46.66 |
| 13 | 1050971 | 22.68 | 99.18 | 49.18 | 63.65 |
| 14 | 2 | 100.00 | 100.00 | 100.00 | 100.00 |
| 15 | 789 | 0.89 | 100.00 | 24.33 | 45.88 |
| 17 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 18 | 1 | 100.00 | 100.00 | 100.00 | 100.00 |
| 19 | 406 | 19.70 | 100.00 | 72.17 | 83.00 |
As described in the earlier link to the optimizer team’s blog 11.2 does a very, very good job of getting NDV correct. This is likely to have beneficial effects for a significant number of queries in the wild – and may go some way to reducing the need for histograms (since these historically have a side effect of making the NDV calculation much more accurate as well).
References:
http://www.dbspecialists.com/files/presentations/dbms_stats.html
This paper by Terry Sutton covers this issue for Oracle 9.2 and the then new 10.1.
This Optimizer Team paper covers 11g.
This series by Alberto D’ellera covers cardinality estimates in the presence of histograms.
Possibly Related Posts:
Instrumentation Overhead
One of the nice things about conferences such as ODTUG http://odtugkaleidoscope.com/technicalsessions.html is the networking and ideas exchange that they facilitate. One of the nice things about Twitter is that it allows people who aren’t there to catch up on these two conference aspects. Even @doug_conference who doesn’t use Twitter recognizes this. Anyway, @alexgorbachev commented on Tom Kyte’s presentation at odtug
#odtug quote of the day “overhead of instrumenting your code is negative” paraphrased by me but based on Tom Kyte
Now I wasn’t there, but I have heard Tom say that instrumentation isn’t an overhead because it is information that you require. In that sense I agree with the comment and Alex’s paraphrase. However most people consider overhead to mean something like
Overhead sometimes describes the amount of processing time the installation of a particular feature will add to the amount already required by the program. WhatIs.com
In this sense I disagree with Alex, and it was in this sense that I replied to him.
I disagree, it is positive, but only for code that doesn’t matter.
That is to say where you have uninstrumented code that already completes the required task in an adequate timeframe then instrumenting that code that will add some execution time. This is in the second sense ’overhead’. The reason I say that that code doesn’t matter is that it is code that you shouldn’t be tuning (yet) anyway. If your code achieves it’s task in its expected time frame then it is efficient enough.
The key reason I suspect that Tom said whatever it was that he did is in the, more usually encountered in financial accounting, following definition of overhead
The ongoing administrative expenses of a business which cannot be attributed to any specific business activity, but are still necessary for the business to function InvestorWords.com
In finance terms these tend to be things like rent,insurance and so on. In software terms instrumentation fits this definition beautifully. It doesn’t directly contribute to achieving any specific code outcome. it’s there so that, should it be required, then the desirable goal of tuning specific tasks can be achieved. As in a company without insurance you can operate without it and there is an extra cost to having it. When the time comes however it is invaluable. As with the financial world overheads like instrumentation should be made as efficient as possible, they shouldn’t be skipped.
