orawin.info

Niall's Oracle Pages – Oracle Opinion since 2004

Archive for May, 2011

Monotonically Increasing Indexes

with 5 comments

Richard Foote has put up another excellent rebuttal of the myth that you should rebuild indexes based on the value of del_lf_rows/lf_rows from an appropriately valideated index structure. Greg asked a question similar to my comment on the OTN forums in this thread. Namely

Is Oracle able to reuse index space when indexed column is like current date (monolitically increasing). So if we delete some old rows (past dates) and inserted new rows are with higer (more recent) dates . Does Oracle reuse space from deleted rows ?

The short answer to Greg is that the Radiohead example Richard uses shows that index space is reused for montonically increasing indexes where the old data is completely deleted. What happens is that the block gets put back on the freelist and is eligible for reuse as a new leaf block. This pattern of insert, process, delete old is pretty common. Sometimes however not *all* the old data gets deleted. In this case Oracle won’t re-use the free space. I have modified Richard’s example so that I delete all the non-prime values below a threshold. Due to the pattern of distribution of the primes this should leave a relatively large number of almost empty index blocks. In this case we see that the old space is not reused until the laggard’s have been finally deleted. NB if you are tempted to use my prime number test, feel free but be aware it degrades as the value of n grows. In other words I only warrant accuracy for positive numbers, not performance. In my case a significant percentage of the available space is left held by these ‘old’ entries. It’s worth remembering however that

  1. the ratio can’t tell you if you have a pattern that matches this usage and
  2. in most cases you’d want to investigate why these values weren’t being deleted anyway.
NIALL @ NIALL1 >@index_deletes
NIALL @ NIALL1 >drop table rush purge;

Table dropped.

Elapsed: 00:00:03.83
NIALL @ NIALL1 >
NIALL @ NIALL1 >create table rush(
  2  	     id number
  3  ,	     code number
  4  ,	     name  varchar2(30)
  5  );

Table created.

Elapsed: 00:00:00.02
NIALL @ NIALL1 >
NIALL @ NIALL1 >insert into rush
  2  select rownum,mod(rownum,100),'MOVING PICTURES'
  3  FROM dual
  4  connect by level <= 1000000; 

1000000 rows created.
Elapsed: 00:00:03.18 

NIALL @ NIALL1 >commit;

Commit complete.

Elapsed: 00:00:00.03
NIALL @ NIALL1 >
NIALL @ NIALL1 >create unique index pk_rush on rush(id);

Index created.

Elapsed: 00:00:05.64
NIALL @ NIALL1 >
NIALL @ NIALL1 >alter table rush
  2  add constraint pk_rush
  3  primary key (id) using index;

Table altered.

Elapsed: 00:00:00.14
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >create or replace function is_prime(p in number) return number
  2  is
  3  	     l_limit number;
  4  	     i	     number := 3;
  5  	     retval  number := 1;  -- returns 0 if false, 1 if true
  6  begin
  7  	     l_limit := sqrt(p);
  8  	     if p = 1 then
  9  		     retval:=1;
 10  	     end if;
 11  	     if p = 2 then
 12  		     retval :=1;
 13  	     end if;
 14
 15  	     if mod(p,2) = 0 then
 16  		     retval := 0;
 17  	     end if;
 18
 19  	     while (i <= l_limit) loop
 20  		if mod(p,i) = 0 then
 21  			     retval := 0;
 22  		end if;
 23  		i := i + 2; -- (no need to test even numbers)
 24  	     end loop;
 25  	     return retval;
 26  end;
 27  /
Function created.
Elapsed: 00:00:00.00
NIALL @ NIALL1 >
NIALL @ NIALL1 >show errors
No errors.
NIALL @ NIALL1 >
NIALL @ NIALL1 >delete from rush
  2  where id < 400000
  3  and is_prime(id) = 0;

 366139 rows deleted.
Elapsed: 00:01:15.54
NIALL @ NIALL1 >
NIALL @ NIALL1 >commit;

Commit complete.

Elapsed: 00:00:00.01
NIALL @ NIALL1 >
NIALL @ NIALL1 >analyze index pk_rush validate structure;

Index analyzed.

Elapsed: 00:00:00.97
NIALL @ NIALL1 >
NIALL @ NIALL1 >select lf_rows,del_lf_rows,round(del_lf_rows/lf_rows,2) del_pct
  2  from index_stats;

   LF_ROWS DEL_LF_ROWS    DEL_PCT
---------- ----------- ----------
   1000000      366139        .37

1 row selected.

Elapsed: 00:00:00.03
NIALL @ NIALL1 >
NIALL @ NIALL1 >begin
  2  for i in 1000001..1400000 loop
  3  insert into rush
  4  values(i,mod(i,100),'GRACE UNDER PRESSURE');
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:47.54
NIALL @ NIALL1 >
NIALL @ NIALL1 >analyze index pk_rush validate structure;

Index analyzed.

Elapsed: 00:00:00.49
NIALL @ NIALL1 >
NIALL @ NIALL1 >select lf_rows,del_lf_rows,round(del_lf_rows/lf_rows,2) del_pct
  2  from index_stats;

   LF_ROWS DEL_LF_ROWS    DEL_PCT
---------- ----------- ----------
   1400000      366139        .26

1 row selected.

Elapsed: 00:00:00.02
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >
NIALL @ NIALL1 >begin
  2  for i in 1400001..2000000 loop
  3  insert into rush
  4  values(i,mod(i,100),'2112');
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:08.89
NIALL @ NIALL1 >
NIALL @ NIALL1 >analyze index pk_rush validate structure;

Index analyzed.

Elapsed: 00:00:00.63
NIALL @ NIALL1 >
NIALL @ NIALL1 >select lf_rows,del_lf_rows,round(del_lf_rows/lf_rows,2) del_pct
  2  from index_stats;

   LF_ROWS DEL_LF_ROWS    DEL_PCT
---------- ----------- ----------
   2000000      366139        .18

1 row selected.

Elapsed: 00:00:00.02
NIALL @ NIALL1 >spoo off

Possibly Related Posts:

Written by Niall Litchfield

May 23rd, 2011 at 10:02 am

Posted in Uncategorized

Tagged with , ,

When is a Health Check not a Health Check

with 3 comments

Thanks to The Human Fly via Twitter @sjaffarhussain I see that Oracle Corporation have a published note on How to Perform a Database Health Check. (Note 122669.1). I read this with some interest as this is something that I do quite frequently as part of my day job. (If you’d like to get me to take a look at one of your databases drop us a line). Here is the contents list of that note.
1. Parameter file
2. Controlfiles
3. Redolog files
4. Archiving
5. Datafiles
5.1 Autoextend
5.2 Location
6. Tablespaces
6.1 SYSTEM Tablespace
6.2 SYSAUX Tablespace (10g Release and above)
6.3 Locally vs Dictionary Managed Tablespaces
6.4 Temporary Tablespace
6.5 Tablespace Fragmentation
7. Objects
7.1 Number of Extents
7.2 Next extent
7.3 Indexes
8. AUTO vs MANUAL undo
8.1 AUTO UNDO
8.2 MANUAL UNDO
9. Memory Management
9.1 Pre Oracle 9i
9.2 Oracle 9i
9.3 Oracle 10g
9.4 Oracle 11g
10. Logging & Tracing
10.1 Alert File
10.2 Max_dump_file_size
10.3 User and core dump size parameters
10.4 Audit files
The paper introduces each of these areas – so for example the section on control files reads (in it’s entirety)
It is highly recommended to have at least two copies of the controlfile. This can be done by mirroring the controlfile, strongly recommended on different physical disks. If a controlfile is lost, due to a disk crash for example, then you can use the mirrored file to startup the database. In this way fast and easy recovery from controlfile loss is obtained.
connect as sysdba
SQL> select status, name from v$controlfile;

STATUS NAME

------- ---------------------------------

/u01/oradata/L102/control01.ctl

/u02/oradata/L102/control02.ctl

The location and the number of controlfiles can be controlled by the
‘control_files’ initialization parameter.

I’m rather a pedant so would rather the document be described as a “Best Practices” or “Guidelines “document than a how to. A How to document would reword the above as

Ensure that the database has at least 2 controlfiles located on different disks by reviewing the ‘control_files’ initialisation parameter.

I.E it told you How to do something, hence the name. But my beef isn’t really that this is yet another Best Practices document, but rather with what it leaves out. This is perhaps best illustrated by referring you to this forum thread. A health check that doesn’t ensure recoverability is no health check.

Possibly Related Posts:

Written by Niall Litchfield

May 16th, 2011 at 8:34 am

Posted in Uncategorized

Tagged with

blogs.oracle.com.

with one comment

As some of you will have noticed blogs.oracle.com has recently moved platform from Movable Type to Roller. This appears to be essentially the old blogs.sun.com platform. I do have some issues with it though – most notably the fact that the blog search now requires an oracle.com SSO account and doesn’t return results that it ought to. I’ve therefore knocked up a quick custom search engine for blogs.oracle.com that you can find at http://orawin.info/blog/search-oracle-blogs or on the local pages link on the right. Unlike Oracle I don’t intend this link to move – and if it does for some as yet unknown reason I’ll put a redirect in place.

Possibly Related Posts:

Written by Niall Litchfield

May 9th, 2011 at 9:33 am

Posted in Uncategorized

Tagged with

Random Thoughts

with one comment

I’ve been considering carefully recently the generation of randomized, representative test data for a test case I’m working on. I will blog the worked test case as well, but in the mean time I thought it worth jotting down a few ideas about generating reasonably representative volumes of test data for Oracle. My problem is that I want to model a particular growth pattern of data, namely that from a social network, and particularly I want to attempt to see how a sensible database design would scale for this network. To that end I want to model the network with a few hundred users – as one might see in a corporation, a few tens of thousands of users – as one might see when the network starts to take off, and finally with hundreds of thousands of users (I don’t have access to a suitable test infrastructure for millions of users). I then want to look at how some of the underlying queries scale. This blog offers some ideas on how to generate reasonable test data at arbitrary scales.

Modelling

A social network is defined (by Wikipedia 20 April 2011) as

A social network service essentially consists of a representation of each user (often a profile), his/her social links, and a variety of additional services

For our model, loosely based on www.digg.com as described here ), we will model 4 entities. These are described below:

Users

These are people that join the service. I have modelled these using a “users” table

Name                                      Null?    Type
----------------------------------------- -------- -----------------------
USERID                                    NOT NULL NUMBER(11)
USERNAME                                           VARCHAR2(15)
REGISTERED                                         DATE
PADDING                                            CHAR(50)

User id is a sequential number, username I can keep short and generate artificially , registered is the date the user account was created and padding is used to keep row lengths longer (to model other data being required in a real design)

Friends

These are the people that users choose to link to, the social links described in the wikipedia definition. This is modelled as shown below

Name                                      Null?    Type
----------------------------------------- -------- -----------------------
ID                                        NOT NULL NUMBER(10)
USERID                                             NUMBER(10)
USERNAME                                           VARCHAR2(15)
FRIENDID                                           NUMBER(10)
FRIENDNAME                                         VARCHAR2(15)
MUTUAL                                             NUMBER(1)
DATE_CREATED                                       DATE

ID is again a meaningless key. Userid is the user creating the link, friendid is the target of the link, mutual we don’t use, and date created the date the link was created. This table is drawn from the same link as above. It’s interesting to note that the datatypes for the id columns differ in length but for our purposes in Oracle that will not matter.

Items

This is a table whose existence I have surmised and is intended to contain all items added to our social network. For this work it represents interesting items added by users. It is defined as

Name                                      Null?    Type
----------------------------------------- -------- -------------------
ITEMID                                    NOT NULL NUMBER(11)
USERID                                             NUMBER(11)
DUGDATE                                            DATE
PADDING                                            CHAR(50)

Again item id is a meaningless primary key, userid is the id of the user who added it. Dugdate a date on which the item was ‘dug’ – we don’t use this – and padding to plump out any extra columns.

Diggs (sic)

This is a table described by digg in the blog article above and which has the following definition

Name                                      Null?    Type
----------------------------------------- -------- -----------------------
ID                                        NOT NULL NUMBER(11)
ITEMID                                             NUMBER(11)
USERID                                             NUMBER(11)
DIGDATE                                            DATE

This table again represents a user interaction, just as friends does, namely the act of “digging” an item added to the site. Again the datatypes have that odd length difference. It’s clear from the blog entry that the id columns are primary key values generated in a meaningless way.

Problem

The problem that I have is that I want to model reasonably representative generated data. For a social network however randomly generated data – using dbms_random.value or dbms_random.normal won’t be representative. We can make the following assumptions about our putative site.

This means that I want to be able to generate my registration dates so that they are distributed exponentially, my friends per user using a power law and the number of diggs per user such that it fits a poisson distribution. In general when considering how to model your systems it’s worth bearing in mind that data often does not follow normal or uniform distribution patterns. Changing the distribution may well cause behavioural changes and oddities – particularly if you are operating on a system prior to 11g that may not estimate cardinality well

Possible Solutions

Ideally of course I’d like to generate all the data in oracle at different scales , I’m going to choose 500, 5000, 50000 and 500000 users. For all but one of the problems described above there is a ready Oracle solution, for the distribution of diggs amongst users a bit of lateral thinking is required. First the easy part, if we assume that the number of friends follows a power law. We can produce a distribution of links for testing using the following (inefficient) code extract.

DECLARE
  l_usercount constant NUMBER := 500000;
  l_itemcount constant NUMBER := 5000000;
  l_username users.username%type;
  l_friendcount NUMBER;
  type friendlist IS TABLE OF pls_integer;
  l_friendlist friendlist;
  l_itemid NUMBER;
  l_userid NUMBER;
BEGIN
    -- Setup friend relationships. Power law.
    FOR i IN 1..l_usercount
    LOOP
      -- initialize array
      l_friendlist := friendlist();
      -- get number of links for this users (1-100 distributed)
      l_friendcount := POWER(100, dbms_random.value) ;

      SELECT
        username
      INTO
        l_username
      FROM
        users
      WHERE
        userid = i;
      -- add friends
      FOR j IN 1..l_friendcount LOOP
          INSERT INTO friends
(userid , username , friendid , friendname , date_created )
          SELECT
            i , l_username , userid , username , sysdate - dbms_random.normal*dbms_random.value(1,365)
          FROM
            users
          WHERE
            userid = l_friendlist(j);
EXCEPTION
        WHEN dup_val_on_index THEN
          NULL;  -- occasionally dbms_random will get the same friend
        END;
      END LOOP;
      COMMIT;
      l_friendlist := NULL;
    END LOOP;
END;
/

For the number of diggs per user and the registration dates I found by far and away the easiest solution was to pre-generate a table of values that followed these distributions using the Excel Analysis Tool-Pak http://office.microsoft.com/en-us/excel-help/about-statistical-analysis-tools-HP005203873.aspx This tool allows you to generate numerical values that follow the described distributions, and excel’s save as csv feature allows you to easily create a file that can be loaded into the database via an external table. In my test setup I called the resultant table random_values and can populate my data using the following pseudo-code. I’ll look to publish the results of my investigations in the next blog entry but 1000 words is enough for a blog, and the principle that your test data should reflect reality however that looks is independent of the application that you are testing.

BEGIN
      FOR i IN 1..l_usercount
      LOOP
        -- dig n items our table assumes 2.8 per user
        SELECT
          poisson
        INTO
          l_digs
        FROM
          random_values
        WHERE
          userid = i;
        FOR n   IN 1..l_digs
        LOOP
          l_itemid := floor(dbms_random.value(1,l_itemcount)); -- what do I dig
          INSERT
          INTO
            diggs
            (
              itemid,
              userid,
              digdate
            )
            VALUES
            (
              l_itemid,
              i,
              sysdate - (dbms_random.normal*dbms_random.value(1,365))
            );
        END LOOP;
      if mod(i,10000) = 0 then commit; end if
      END LOOP;
   commit;
    END ;

Possibly Related Posts:

  • No Related Posts found

Written by Niall Litchfield

May 3rd, 2011 at 2:33 pm