Archive for September 24th, 2004
A throwaway comment on Oracle-L to the effect that the CBO will make better choices if you use an artificial value to indicate an unknown value (in this case a current date) led me to knock together this demonstration that this is in general probably not true.The test goes as followscreate a table with a pk and 4 columns – 2 of type date and one each of numeric and character.Then populate these columns with 20000 values that represent date data for (approx) the last 5 years. In the case of the character and numeric columns we use ‘dates’ of the format YYYYMMDD so that they sort properly.
then I have updated the last 30 days worth of history for records that have not yet had a date assigned (think payment date for invoices).the first column is updated to 31st December 9999 (obviously artificial), the second to NULL the last two to ‘ ‘ and 0 respectively. I also attempted to simulate data error/disputed payment or whatever by incorrectly updating 20% of the remaining data to the same value. Finally like a good dba I collect stats.I then explain two queries against the data. The first just tries to select those records with ‘unassigned’ dates – current records. This should return 1279 records out of 20000. The second query covers (and admittedly somewhat artificial) range and should return 104 records. This gives me the following results for the expected cardinalities of the queries
| Query | Actual Cardinality | Artificial Date | Date | Artificial Char | Artificial Number |
| Current | 1279 | 11 | 1179 | 11 | 11 |
| Access Path | IDX | FTS | IDX | IDX | |
| Range Scan | 104 | 21 | 124 | 11 | 20 |
| IDX | IDX | IDX | IDX | ||
| Current – Histograms | 1279 | 1404 | 1331 | 1248 | 1248 |
| FTS | FTS | FTS | FTS | ||
| Range Scan – Histograms | 104 | 103 | 42 | 195 | 195 |
| IDX | IDX | IDX | IDX |
There are also two things to consider. The first is how does the query under test perform, but the second is how would the figures computed above affect the optimizer if the results from this query were to feed another operation in a more complex query plan. There are a number of interesting things that one can draw from this, but I’ll limit myself to the following suggestions.
the full results from my 10g system are here. If you were wondering about the last 2 columns, yes I have seen – indeed use – a third party system that uses characters and numbers to represent dates.