orawin.info

Niall's Oracle Pages – Oracle Opinion since 2004

Archive for December, 2010

Tracking Down Windows Memory Leaks

without comments

This is just a small Windows O/S-related note, covering how to track down the memory usage of a specific Windows service. This all happened on Windows Vista. The same steps are likely to work perfectly well on Windows 7 and both flavours of Server 2008, and I’d expect them to work on earlier versions of Windows as well.

I noticed that my laptop was consuming significant amounts of memory. Well, actually, as this is a 4GB laptop, I didn’t notice until I tried to run two instances of Oracle in two VMs at the same time. Shutting everything down showed me that I still had a problem: there was an instance of svchost consuming the best part of a gigabyte of memory. Task Manager looked like this:

[Screenshot: svchost_memory_usage]

Unfortunately the svchost process that was experiencing this issue was the instance that runs the netsvcs group of network services. On my machine the list of enabled services in that group reads

  • AeLookupSvc
  • Themes
  • IKEEXT
  • AudioSrv
  • Rasman
  • Remoteaccess
  • SENS
  • Wmi
  • wuauserv
  • BITS
  • ShellHWDetection
  • iphlpsvc
  • seclogon
  • MMCSS
  • ProfSvc
  • EapHost
  • winmgmt
  • schedule
  • browser
  • AppMgmt

Now the tasklist utility can report memory usage for a service. However, a quick investigation showed that it in fact reports the memory usage of the host process, so the figure for every one of these services was identical. For example, reporting memory usage looked like this (choosing a different group of services for reasons that will become obvious later):

C:\>tasklist /svc /FI "PID EQ 1524"

Image Name                     PID Services
========================= ======== ============================================
svchost.exe                   1524 EventSystem, FDResPub, LanmanWorkstation,
                                   netprofm, nsi, SSDPSRV, SstpSvc, W32Time,
                                   WebClient

C:\>tasklist /fi "services eq LanmanWorkstation"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
svchost.exe                   1524 Services                   0      9,892 K

C:\>tasklist /fi "services eq EventSystem"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
svchost.exe                   1524 Services                   0      9,892 K
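
Since tasklist can only report memory per process, the one useful thing it does tell you is which services share a PID. Here is a minimal Python sketch of pulling that mapping out of `tasklist /svc` output. It assumes the fixed-width layout shown above, and `parse_tasklist_svc` and `SAMPLE` are hypothetical names of my own, not anything tasklist provides.

```python
def parse_tasklist_svc(text):
    """Map each PID to the list of services hosted in that process."""
    hosted = {}
    widths = None
    current_pid = None
    for line in text.splitlines():
        if line.startswith("="):
            # The ruler line under the header gives the fixed column widths.
            widths = [len(part) for part in line.split()]
            continue
        if widths is None or not line.strip():
            continue  # header line or blank line
        pid_start = widths[0] + 1
        svc_start = pid_start + widths[1] + 1
        pid_field = line[pid_start:svc_start].strip()
        if pid_field:
            # A new process row; wrapped rows leave the PID column empty.
            current_pid = int(pid_field)
            hosted[current_pid] = []
        names = line[svc_start:].split(",")
        hosted[current_pid].extend(n.strip() for n in names if n.strip())
    return hosted


# Sample rebuilt from the output above, padded to the same column widths.
SAMPLE = "\n".join([
    "Image Name".ljust(25) + " " + "PID".rjust(8) + " Services",
    "=" * 25 + " " + "=" * 8 + " " + "=" * 44,
    "svchost.exe".ljust(25) + " " + "1524".rjust(8)
    + " EventSystem, FDResPub, LanmanWorkstation,",
    " " * 35 + "netprofm, nsi, SSDPSRV, SstpSvc, W32Time,",
    " " * 35 + "WebClient",
])

if __name__ == "__main__":
    for pid, services in parse_tasklist_svc(SAMPLE).items():
        print(pid, len(services), "services:", ", ".join(services))
```

Every service listed against a PID shares that process's single memory figure, which is why the per-service filters above return the same number.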

This meant that I had no idea which service was responsible for the memory usage. In addition, many of these services are necessary for Windows to function correctly. Googling memory usage by netsvcs or svchost did not produce significant help, although it did reveal a startling amount of ignorance about what svchost is and does.

Fortunately the SC utility allows modification of service definitions:

C:\>sc config help
DESCRIPTION:
        Modifies a service entry in the registry and Service Database.
USAGE:
        sc  config [service name]

...

OPTIONS:
NOTE: The option name includes the equal sign.
      A space is required between the equal sign and the value.
 type= 
 start= 
 error= 
 binPath= 
 group= 
 tag= 
 depend= 
 obj= 
 DisplayName= 
 password=


C:\>

Setting the type of each service to “own” as shown below (note the space after the = sign)

sc config RASMAN type= own

and then restarting the PC starts each service in its own instance of svchost. After the restart the culprit became clear: the guilty service in my case was iphlpsvc, which is a helper service for IPv6 connectivity. A quick search on this problem revealed the knowledge base article http://support.microsoft.com/kb/983457 and its associated hotfix. In fact the hotfix is only available for the current Vista service pack level (SP2 at the time of writing), and as I don’t require IPv6 networks I have just disabled the relevant service.
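
With twenty services in the group, typing the sc commands by hand gets tedious. Here is a small Python sketch that only builds the command lines, so they can be reviewed before being pasted into an elevated prompt. `isolation_commands` is a hypothetical helper name, and the three services in the example are simply picked from the list above.

```python
def isolation_commands(services, service_type="own"):
    """Build one `sc config` command line per service (nothing is executed)."""
    if service_type not in ("own", "share"):
        raise ValueError("service_type must be 'own' or 'share'")
    # sc wants the space after "type=" and none before the equal sign.
    return [f"sc config {name} type= {service_type}" for name in services]


if __name__ == "__main__":
    # Split a few services out into their own svchost instances...
    for command in isolation_commands(["RASMAN", "iphlpsvc", "BITS"]):
        print(command)
    # ...and build the commands that put them back once the culprit is found.
    for command in isolation_commands(["RASMAN", "iphlpsvc", "BITS"], "share"):
        print(command)
```

Generating the `type= share` commands at the same time makes it easy to revert once the investigation is over.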

In summary, then:
The filter operators of the command line utility tasklist can help in diagnosing some issues, but the memory figures it reports are for the whole host process and so cannot be attributed to individual services in a shared svchost process. The sc utility can be used to modify service behaviour as well as to control, add and delete services.

Written by Niall Litchfield

December 24th, 2010 at 9:54 am

I don’t think there’s a punch-line scheduled, is there?

without comments

I’ve been fighting a really bizarre issue with Oracle Warehouse Builder 11.1.0.7 this last month or so. It looks like it finally got resolved today. The resolution has implications for other people so I’m putting it up here, partly for them and partly so I don’t forget the oddity again.

The Warehouse Builder architecture is covered in the docs here, but I reproduce the relevant diagram below for the purposes of this blog article.

[Diagram: Warehouse Builder architecture]

OWB has a repository containing metadata, plus source and target deployment locations. In addition, a Java application known variously as the OWB runtime, the control centre service, rtp etc. runs on the database server O/S. The database server keeps track of the status of this service using some scheduled jobs. In addition, Oracle provide some scripts in $ORACLE_HOME/owb/rtp/sql to start, stop and diagnose this service.

Our OWB service would not start – and did not produce error logs as expected. The service_doctor script reported:

SQL> @service_doctor

Role set.

All PL/SQL packages and functions are valid
Platform properties have been loaded correctly
Platform location has been seeded correctly
NLS messages have been loaded correctly
>>>>>> The platform service is not available
Service script is accessible to the database server
Connection information stored within the repository is correct

Trying to start the service, meanwhile, hangs for 3 minutes (a timeout in the script, I believe) and then fails:

SQL> @start_service

Role set.

Not Available
Diagnostics:
started service using command "ORACLE_HOME/owb/bin/unix/run_service.sh -automatic 1 ORACLE_HOME OWBSYS HOSTNAME PORT SERVICE_NAME"

PL/SQL procedure successfully completed.

Running the owbcollect script to collect diagnostic details was unenlightening: it indicated that all was well, apart from some invalid objects in the target schemas, and that the jobs had been created. At this point my analyst was running out of hair. There were no logfiles to investigate, and the owbcollect output indicated all was well with the world and looked the same as that from another instance that was running fine. We rechecked JOB_QUEUE_PROCESSES and AQ_TM_PROCESSES and recreated the advanced queues used by the service for good measure.

It was at this point I remembered problems we had had elsewhere with Enterprise Manager Grid Control. Under upgrade circumstances EM GC can fail to create the partitions in which uploaded metrics are stored; this happens when the dbms_jobs that create the partitions are either missing or disabled. The jobs weren’t missing in this case, but they sure weren’t being executed. It turns out that as of 10g there are now 3 ways to disable dbms_job execution:
1) set JOB_QUEUE_PROCESSES to 0
2) for AQ jobs, set AQ_TM_PROCESSES to 0
3) disable DBMS_SCHEDULER.

Item 3 had hit us. Because the new and old job queue systems use the same co-ordinator, disabling the new system also disables the old one. So if you are facing a situation where scheduled jobs aren’t running and all the old-fashioned parameters look OK, take a look at DBA_SCHEDULER_GLOBAL_ATTRIBUTE: if the scheduler is disabled there will be a row in that view indicating it.
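
A check like this is easy to fold into a routine health script. The Python sketch below works with any DB-API cursor. Two things in it are assumptions on my part rather than something shown above: that the telltale row has ATTRIBUTE_NAME = 'SCHEDULER_DISABLED' with a VALUE of 'TRUE', and the helper names `scheduler_disabled` and `demo_cursor`. So that it runs anywhere, the demo uses an in-memory sqlite3 table with the view's name as a stand-in for a real Oracle connection.

```python
import sqlite3

# Assumed attribute name and value; verify against your own database.
CHECK_SQL = (
    "select value from dba_scheduler_global_attribute "
    "where attribute_name = 'SCHEDULER_DISABLED'"
)


def scheduler_disabled(cursor):
    """Return True if the view holds a SCHEDULER_DISABLED row set to TRUE."""
    cursor.execute(CHECK_SQL)
    row = cursor.fetchone()
    return row is not None and str(row[0]).upper() == "TRUE"


def demo_cursor(rows):
    """Stand-in for an Oracle cursor: an in-memory table named like the view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table dba_scheduler_global_attribute "
        "(attribute_name text, value text)"
    )
    conn.executemany(
        "insert into dba_scheduler_global_attribute values (?, ?)", rows
    )
    return conn.cursor()


if __name__ == "__main__":
    print(scheduler_disabled(demo_cursor([("SCHEDULER_DISABLED", "TRUE")])))
    print(scheduler_disabled(demo_cursor([("MAX_JOB_SLAVE_PROCESSES", None)])))
```

Against a real database you would pass a cursor from your Oracle driver of choice instead of `demo_cursor`.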

Written by Niall Litchfield

December 22nd, 2010 at 3:49 pm

Posted in Uncategorized

Yodel-ay

with 5 comments

Online, service matters. In common with many households we have been receiving parcels from online stores over the last little while. Today, unfortunately, we were all out when the parcel delivery from Yodel arrived. I wasn’t familiar with Yodel, but it appears it is one of those rebrandings that corporations decide are a good idea from time to time. I can only assume that the Home Delivery Network decided its name was too descriptive and conventional. Or possibly publicity like this called for a change of name. Anyway, Yodel claim:

Welcome to a new way of thinking. At Yodel we believe there’s a better way. A way that gives greater convenience for your customers and tailored, more efficient services for businesses. Yodel has the scale and capacity to make it happen. It’s your delivery, so it should be your call how it works. Your Yodel experience will be, quite simply, the widest choice and an unrivalled service delivered with a can-do attitude. And a smile

Well they certainly delivered on the first sentence today, if sadly not the rest. Here is the card left for us.

[Photo of the delivery card: over the fence round the side.]

In case you can’t read that, it reads

Over fence, side of house – 2 parcels.

Creative thinking indeed. You can see it now. What do I do? They’re out, it’s parcels at Christmas time. I’d better chuck ’em over the 6 foot fence. Inspired.

Written by Niall Litchfield

December 21st, 2010 at 9:32 pm

Posted in Uncategorized

Recovery Catalog Views

without comments

I’ve recently run into an issue where the recovery catalog views (RC_xxx) in an 11.1.0.7 catalog may contain inaccurate information. We have a client with multiple databases, all of which are backed up using RMAN. Rather than reading the logfile of each and every backup, each and every day, I wrote a small script to query the recovery catalog to obtain the latest backup date (from RC_BACKUP_SET) for each database. The script, if anyone wishes to ‘borrow’ it, is:

select
   distinct -- because sometimes autobackup gets same time
   db_name
,  last_backup
from
(select
   db.name db_name
,  to_char(bs.completion_time,'DD-MON-YYYY HH24:MI') completion_time
,  to_char(max(bs.completion_time) over (partition by bs.db_key),'DD-MON-YYYY HH24:MI') last_backup
from
  rc_database db
, rc_backup_set bs
where bs.db_key = db.db_key
order by 1,2
)
where
completion_time = last_backup;

I was reasonably pleased that this worked for the first few databases, but then puzzled as to why it was apparently showing some databases’ backups never succeeding when the logs showed that they had. We eventually discovered that the RC_BACKUP_SET view does not show all backup sets for 11.1.0.7 databases, at least for us, unless a list backup command is run (the output below shows this happening for us). The databases where this happens are also 11.1.0.7 (the same as the catalog) and are in noarchivelog mode. This latter fact may be relevant.

select db_key from rc_database
where name = 'DB_NAME';

DB_KEY
----------
60900

select max(completion_time) from rc_backup_set
where db_key=60900;

MAX(COMPL
---------
16-DEC-10

SQL> host
[oracle@HOSTNAME ~]$ export ORACLE_SID=DB_NAME

[oracle@HOSTNAME ~]$ . oraenv
ORACLE_SID = [DB_NAME] ?

The Oracle base for ORACLE_HOME=/apps/oracle/product/11.1.0/db_1 is /apps/oracle

[oracle@HOSTNAME ~]$ $ORACLE_HOME/bin/rman target / catalog rman_username/rman_pwd@catalog

Recovery Manager: Release 11.1.0.7.0 - Production on Mon Dec 20 10:15:19 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: DB_NAME (DBID=3038288275)
connected to recovery catalog database

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
245768  B  F  A DISK        19-DEC-10       1       1       YES        TAG20101219T230053
245769  B  F  A DISK        19-DEC-10       1       1       NO         TAG20101219T231009

RMAN> exit

Recovery Manager complete.

[oracle@HOSTNAME ~]$ exit

exit

SQL>;

1 select max(completion_time) from rc_backup_set
2* where db_key=60900

SQL> /

MAX(COMPL
---------
19-DEC-10

SQL>

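Important Update:
It's not an Oracle problem, but a DBA scheduling the wrong backup script: one that doesn't connect to the catalog! Those backups were therefore recorded only in each database's control file, and the catalog picked them up when RMAN next connected to both the target and the catalog, as in the list backup session above.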
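
The daily check this post started from can be taken one step further by flagging databases whose newest backup is too old. Here is a minimal Python sketch, assuming the (db_name, completion_time) pairs have already been fetched from RC_DATABASE and RC_BACKUP_SET. `stale_databases`, the sample rows and the one-day threshold are illustrative assumptions, not part of the script above.

```python
from datetime import datetime, timedelta


def stale_databases(rows, now, max_age=timedelta(days=1)):
    """Return the names of databases whose newest backup is older than max_age.

    rows is an iterable of (db_name, completion_time) pairs, one per backup set.
    """
    latest = {}
    for db_name, completion_time in rows:
        # Keep only the most recent completion time seen for each database.
        if db_name not in latest or completion_time > latest[db_name]:
            latest[db_name] = completion_time
    return sorted(name for name, done in latest.items() if now - done > max_age)


if __name__ == "__main__":
    rows = [
        ("PROD", datetime(2010, 12, 18, 23, 0)),
        ("PROD", datetime(2010, 12, 19, 23, 10)),
        ("TEST", datetime(2010, 12, 16, 23, 5)),
    ]
    print(stale_databases(rows, now=datetime(2010, 12, 20, 10, 15)))
```

As the update above shows, a report like this only knows what the catalog knows: a database backed up without a catalog connection will look stale until the catalog is resynchronised, and a database with no backup sets at all never appears in the input.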

Written by Niall Litchfield

December 20th, 2010 at 10:35 am

Posted in Uncategorized

Geographical Revisionism

with one comment

For me, as for many others, oracle.com is appearing in Chinese. This would appear to be because of a terrible accident which has destroyed Europe and the Middle East. I can still choose my territory, but this is the result:

[Screenshot: Oracle.com Regions]

Ho Hum…

Written by Niall Litchfield

December 15th, 2010 at 3:27 pm

Posted in Uncategorized

Meaningless Support

with 3 comments

Grrrrgh. I can sort of cope with the oracle.com technet homepage randomly appearing in Chinese. That’s just annoying. On the other hand I really do want http://support.oracle.com to actually work reliably and sensibly. Once again it didn’t for me today. Specifically, I was trying to update an SR: I entered the SR update text, pressed “Send” (why not “Update”?) and was greeted with a non-modal and yet non-ignorable error dialog:

Unable to perform “Update SR” operation because of the following error: “BusinessException thrown from Update Step : updateSR() at 2010-12-13 07:34:21.719 CST”.

I of course can do nothing about this, which in turn means there is no point showing me the message in this form. I did retry the submit and got the same result. At this point I logged into http://supporthtml.oracle.com (yes, I needed another round through the Single Sign-On loop; Oracle seem to think SSO means Same Sign On), pasted the update text into my SR and got the SR updated.

All of this got me thinking about Jakob Nielsen’s rules of thumb for user interfaces. I’m afraid it appears the designers at Oracle Support haven’t read them, or else don’t consider them useful. Let’s run through them.

Visibility of system status
The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.

Well here I suppose you could argue that Oracle is at least giving me feedback – although it isn’t actually telling me what is going on – I can at least conclude that my SR never got updated.

Match between system and the real world
The system should speak the users’ language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.

This is just a straight fail. The language is meaningless and certainly doesn’t match users’ usage. In addition, when this occurs the error message appears in a non-intuitive place, out of context of the user action.

User control and freedom
Users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.

Another fail. In general MOS supports neither undo/redo nor even normal navigation keys, e.g. backspace.

Consistency and standards
Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.

Note that I said “in general” above. Sometimes MOS supports the browser standard navigation keys. Sometimes it doesn’t.

Error prevention
Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.

I think this one is actually somewhat unrealistic – I don’t like the error message, but I do recognize that unexpected errors occur.

Recognition rather than recall
Minimize the user’s memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.

Hmmm. Another fail. When logging an SR without a configuration (as many of our clients require), the same information is often asked for repeatedly, sometimes more than once on the same page (e.g. version of software, version of database, version of product).

Flexibility and efficiency of use
Accelerators — unseen by the novice user — may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

Non-existent here.

Aesthetic and minimalist design
Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.

Well it’s certainly true that my dialogue was minimalist…

Help users recognize, diagnose, and recover from errors
Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Not even a “please retry the update” and an error message full of codes. Another Fail.

Help and documentation
Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.

I’ll pass on this one since it’s been a while since I investigated the Flash MOS help.

Overall, no better than 50%, Oracle. Not good enough. Not by a long way. Worth noting as well that the http://supporthtml.oracle.com interface to the same application would probably score at least 75%.

Written by Niall Litchfield

December 13th, 2010 at 1:58 pm

Posted in Uncategorized

UKOUG Presentation

without comments

I had a recent email enquiry after the slides from my talk at the technically excellent UKOUG Conference, since problems with the website are currently preventing downloads. Reproduced below, by way of a heads up, is my email explaining that the slides are now available via the web, including to non-UKOUG members. Please feel free to take a gander, but remember that this was 72 slides plus 2 demos in 45 minutes, complete with me talking. The slides therefore try to illustrate what I was saying rather than reproduce it on screen. For those who were there, I’d love feedback on this presentation style, which is significantly harder work to prepare but, I feel, probably gives a better presentation experience to attendees. (Obviously this was my first attempt, so don’t expect TED or Connor McDonald standard.) I’ll probably use slideshare more in the future unless people hate it.

Thanks for the kind words. I’ve uploaded it to slideshare at http://www.slideshare.net/nlitchfield/oracle-on-windows. Whether it will make any sense I’m not so sure, since it was more of a “talk around the issues whilst rapidly advancing through 72 slides and 2 brief demos in 45 minutes” thing. I rather suspect, given your background, it won’t tell you too much you don’t know either. Core points I wanted to get across:

1) Lots of people run Oracle on Windows and it’s a core product for Oracle.
2) The thread architecture severely hampers scalability on 32-bit Windows in predictable ways (even though it was a sensible choice in 1993).
3) There are things you can do to work around 2, but really just run 64-bit and have done with it.
4) NTFS is a pretty damn good file system and you might even consider compressing datafiles in a test/dev environment.

Anyway, hope you enjoy.

Written by Niall Litchfield

December 6th, 2010 at 6:06 pm

Posted in Uncategorized
