/egilh

Monday, February 28, 2005

Sick hardware on egilh.com

My server has been sympathizing with me lately, getting sicker by the day:

Its ~~cough~~ CPU fan was loud enough to hear from another room, even with two young kids in the house!

The secondary HD on my server had several bad sectors, and Win2k unmounted it whenever it was unable to flush write operations to the disk. I blame the case design: the HDs were too close together, and the second HD, mounted directly above the first one, got way too hot. The HD container I bought worked like a charm, so the HD kept working. Sort of. Rebooting the server each time it lost the HD was a pain, so I decided to fix the HD problem first.

This weekend I scheduled a thorough chkdsk, crossed my fingers and rebooted. The HD is 80GB but less than half full, so I thought half an hour would be enough for the chkdsk. Boy, was I wrong. An hour later I got worried, shut down the server and lugged it to the kids' room, where I could connect a screen and keyboard to see what was happening.

(The server only has a mouse attached, as I do all management via Terminal Server. Can someone please tell me why Win2k refuses to work without a mouse, which you cannot do anything with, yet boots happily without the keyboard you need to log in?)

Since I already had it handy, I decided to fix the fan first. A few weeks earlier I had bought a replacement fan from PrimeStore.it, where I bought the server several years ago. One neat thing is that they, like ePrice.it, offer free shipping if you pick up the goods at one of their warehouses.

I had never swapped a CPU fan before, so I didn't really know what to expect. I sure didn't expect this mess when I took off the heat sink:

Who put a chewing gum on my CPU? Same color, same consistency. Yuck!

Turns out it is a feature, not a bug. The documentation in the box didn't mention anything, but I found a tube of white stuff hidden below the new fan. I thought it was the usual moisture-absorbing packet included in most boxes with electronic devices. I had a look at the fan manufacturer's web site and discovered that the chewing gum is quite important: put a thin layer of the silicone paste on the CPU before replacing the heat sink. I guess it improves heat conduction between the CPU and the heat sink.

As soon as the CPU fan shut up, I could hear that the hard disk was terminally ill. During chkdsk it made nasty clicking noises when it tried to read certain areas of the disk. I managed to find a 200GB disk in a local shop for 100 euro and recovered 99% of my data. I only "lost" one of the CDs I had ripped to MP3. A few minutes later I was able to play my entire CD collection on my Kiss player again :-)

~~At least I have an idea what to use the old hard disk for~~. Change of plans: the disk seems to work OK after the latest format, so it stays attached as a temporary backup disk for stuff that I can afford to lose.

Friday, February 25, 2005

COM+ Call Timers: the unofficial way

Not satisfied with the official way, I went looking for a simpler one. The Component Services Console displays the data, so the information must be there somewhere. It is a waste of time and resources for me to collect the data and calculate the call times when the information I want is already available.

As it turns out the unofficial way is a lot simpler to implement than the official way.

Step 1: include the comsvcs.dll library
// ComSvcs library for internal com+ method call tracking
#import "c:\windows\system32\comsvcs.dll" exclude("IAppDomainHelper")

Step 2: get the statistics

- Get an instance of the internal COM+ tracker: COMSVCSLib::IGetAppDataPtr ptrAppData(_T("{ecabafb9-7f19-11d2-978e-0000f8757e2a}"));
- Get a list of applications and loop through them: ptrAppData->GetApps(&nAppData, &aAppData);
  - Get the application data you are interested in: calls per second, total calls, etc.
  - Get a list of classes in the application and loop through them, extracting the information you want: ptrAppData->GetAppData(oneApp.m_idApp, &nClsIDs, &aClsidData);

Example
This simple routine builds an XML string the brutal way from the COM+ performance information (most error handling code removed for clarity).

Get the statistics from the hidden COM+ interface

void CTracker::getStatistics(BSTR *output)
{
    const unsigned long MAX_APP_DATA = 500;
    const unsigned long PUSH_RATE = 1000;
    unsigned long nAppData = MAX_APP_DATA;
    LPSTR lpString;
    LPWSTR lpwString;
    COMSVCSLib::appData *aAppData;
    CString csStatistics = _T("");

    // Get an instance of the internal COM+ tracker object
    COMSVCSLib::IGetAppDataPtr ptrAppData(_T("{ecabafb9-7f19-11d2-978e-0000f8757e2a}"));
    csStatistics.Append(_T("\r\n"));

    // Step through the list of running applications
    ptrAppData->GetApps(&nAppData, &aAppData);
    for (unsigned long idxApp = 0; idxApp < nAppData; idxApp++)
    {
        unsigned long nClsIDs;
        COMSVCSLib::CLSIDDATA *aClsidData;
        COMSVCSLib::appData oneApp = aAppData[idxApp];
        csStatistics.Append(_T("\r\n"));

        // Application GUID
        UnicodeToAnsi(oneApp.m_szAppGuid, &lpString);
        csStatistics.AppendFormat(_T("%s\r\n"), lpString);
        CoTaskMemFree(lpString);

        // Application information
        csStatistics.AppendFormat(_T("%ld\r\n"), oneApp.m_idApp);
        csStatistics.AppendFormat(_T("%ld\r\n"), oneApp.m_dwAppProcessId);
        COMSVCSLib::APPSTATISTICS appStatistics = oneApp.m_AppStatistics;
        csStatistics.AppendFormat(_T("\r\n"));
        csStatistics.AppendFormat(_T("\t%ld\r\n"), appStatistics.m_cCallsPerSecond);
        csStatistics.AppendFormat(_T("\t%ld\r\n"), appStatistics.m_cTotalCalls);
        csStatistics.AppendFormat(_T("\t%ld\r\n"), appStatistics.m_cTotalClasses);
        csStatistics.AppendFormat(_T("\t%ld\r\n"), appStatistics.m_cTotalInstances);
        csStatistics.AppendFormat(_T("\r\n"));

        // Get class information for this application
        ptrAppData->GetAppData(oneApp.m_idApp, &nClsIDs, &aClsidData);
        csStatistics.AppendFormat(_T("\r\n"));
        for (unsigned long idxClass = 0; idxClass < nClsIDs; idxClass++)
        {
            COMSVCSLib::CLSIDDATA oneClass = aClsidData[idxClass];
            csStatistics.AppendFormat(_T("\r\n"));

            // Get the ProgID for the GUID.
            // This FAILS during recycling on Win2k3
            if (SUCCEEDED(ProgIDFromCLSID(oneClass.m_clsid, &lpwString)))
            {
                UnicodeToAnsi(lpwString, &lpString);
                // ProgIDFromCLSID allocates the string; the caller must free it
                CoTaskMemFree(lpwString);
                csStatistics.AppendFormat(_T("\t%s\r\n"), lpString);
                CoTaskMemFree(lpString);
            }

            // Performance information
            csStatistics.AppendFormat(_T("\t%ld\r\n"), oneClass.m_cBound);
            csStatistics.AppendFormat(_T("\t%ld\r\n"), oneClass.m_cInCall);
            csStatistics.AppendFormat(_T("\t%ld\r\n"), oneClass.m_cPooled);
            csStatistics.AppendFormat(_T("\t%ld\r\n"), oneClass.m_cReferences);
            csStatistics.AppendFormat(_T("\t%ld\r\n"), oneClass.m_dwRespTime);
            csStatistics.AppendFormat(_T("\t%ld\r\n"), oneClass.m_cInCall);
            csStatistics.AppendFormat(_T("\r\n"));
        }
        CoTaskMemFree(aClsidData);
        csStatistics.AppendFormat(_T("\r\n"));
    }
    csStatistics.Append(_T("\r\n"));
    CoTaskMemFree(aAppData);
    *output = csStatistics.AllocSysString();
}

That's it!
Fewer than 100 lines of code to build an XML document with the call times for the running COM+ applications on the system.
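The routine above builds the markup by plain string concatenation. If you take the same brutal approach, it is worth escaping any text values (such as ProgIDs) before writing them. Below is a minimal portable sketch that uses std::ostringstream in place of CString; the helpers xmlEscape and appendElement, and the element names used with them, are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: replace the five XML special characters with entities.
std::string xmlEscape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical helper: write <name>value</name> followed by CRLF,
// mirroring the "\r\n" line endings of the original routine.
template <typename T>
void appendElement(std::ostringstream& xml, const std::string& name, const T& value) {
    std::ostringstream text;
    text << value;  // format numbers and strings alike
    xml << '<' << name << '>' << xmlEscape(text.str()) << "</" << name << ">\r\n";
}
```

For example, appendElement(xml, "totalCalls", appStatistics.m_cTotalCalls) would write one counter per line, with any markup characters in string values escaped.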

This approach has several benefits over the official way:

It is a lot simpler to implement and maintain (100 lines of C++ vs 1,500)
I have found no memory leaks after 9 months in production
Less resource usage. COM+ already maintains the counters, I just get them when I need them.

Keep in mind that this is an undocumented API so it may change one day in the future (Longhorn?).

Thursday, February 24, 2005

COM+ Call Timers: the official way

As the title suggests, there is more than one way to get access to the COM+ call timers. This post gives an overview of how to get the call times using the official Microsoft COM+ Instrumentation APIs.

Why mess with the COM+ call timers?
I had a serious problem with components accessing an Oracle DB using OLEDB. In certain cases the queries never ended. Neither the Oracle OLEDB provider nor the Microsoft one supports query timeouts, and changing the resource constraints in Oracle did not improve the situation. If a query went in tilt, it stayed in tilt -forever-. That is a bad, bad thing in a system with hundreds of calls per second. We tracked the problem down to a specific case: the Oracle stored procedure never returned if the stored procedure "header" was OK but the stored procedure "body" was invalid. The call waited forever for the stored procedure to get recompiled. Normally it recompiled by itself, but there were cases, with linked DBs etc., where the stored procedure body stayed invalid until someone recompiled it by hand.

We tried asynchronous queries, but they did not fix the situation: if the query was stuck, it stayed stuck, and the abort method hung as well. We found one way that did work: run the query on a separate thread and kill the thread if it took too much time. The queries no longer hung forever, but the memory leaks were enormous when we brutally terminated the thread.

The only fix I found was to implement COM+ recycling on steroids. Monitor the call times and shut down (recycle on Win2k3) the com+ package if the call times got too high. The stored procedure was still broken but the rest of the system continued working. No memory leak problems either as the com+ process was shut down in an orderly manner.

How to get the com+ call timers - the official way
The Component Services Console displays all sorts of useful information, like the number of objects in call, average call time etc. But there is no official way to get the COM+ call timers directly. You have to do the dirty work yourself and calculate the average call times.

The COM+ Spy example in the \Samples\Com\Administration\Spy folder of the Platform SDK shows how to get all the COM+ information you desire. Below I show the basic steps, but please download the Platform SDK for a complete working example.

The COM+ internal information is made available via Loosely Coupled Events. You subscribe to one or more COM+ events and implement the corresponding interface to get notified. There is a whole set of COM+ Instrumentation Interfaces. The important ones for COM+ call tracking are:

- IComAppEvents
- IComMethodEvents

They allow you to monitor COM+ application activation/shutdown and method call/return, and they provide all you need to calculate the call time. I used COM+ Spy as a skeleton and chose the following approach for tracking the call times:

Implement a tracking object (CMethodMon in the example below) for each COM+ application you want to monitor. The class:
- Implements the IComMethodEvents interface to track call times
- Implements the IComAppEvents interface to clear the current calls when an application shuts down
- Subscribes to the IComMethodEvents and IComAppEvents events
- Has one timer that fires every X seconds (configurable). The timer:
  - Calculates the highest current call time (highestCallTime below)
  - Takes appropriate action if the call times are too high (shut down the component etc.)
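The timer's decision step can be sketched as a small policy object that is easy to test on its own. This is a hypothetical sketch, not the author's code: the class name RecycleWatchdog, the millisecond threshold and the two std::function callbacks are assumptions. In the component described here, the probe would be something like highestCallTime() and the action would shut down (or, on Win2k3, recycle) the COM+ application.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch of the timer logic: on each tick, read the highest
// current call time and trigger the recycle action when it exceeds a limit.
class RecycleWatchdog {
public:
    RecycleWatchdog(std::int64_t limitMs,
                    std::function<std::int64_t()> highestCallTimeMs,
                    std::function<void()> recycle)
        : limitMs_(limitMs),
          probe_(std::move(highestCallTimeMs)),
          recycle_(std::move(recycle)) {}

    // Call this every X seconds from a timer. Returns true if it recycled.
    bool tick() {
        if (probe_() > limitMs_) {
            recycle_();
            return true;
        }
        return false;
    }

private:
    std::int64_t limitMs_;                   // highest acceptable call time
    std::function<std::int64_t()> probe_;    // e.g. wraps highestCallTime()
    std::function<void()> recycle_;          // e.g. shuts down the COM+ application
};
```

Keeping the decision separate from the COM+ plumbing lets the policy be unit-tested without a running COM+ application.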

IComMethodEvents
This is THE interface for call time tracking. It is called when a method is called or returns and when an exception occurs. You get a lot of information in the COMSVCSEVENTINFO structure like the process ID, current time stamp etc.

Keeping track is quite simple:

Create a stack for each active object (OID stack):
- Push call information onto the OID stack in OnMethodCall. In my case I only care about the call time, so I only push the performance counter.
- Pop from the OID stack in OnMethodReturn.
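The OID-stack bookkeeping can be tried outside COM+ with a portable sketch. This is not the author's CMethodMon: the class name CallTracker, its method names, and the use of std::mutex and std::deque in place of a critical section and the original stack type are assumptions. Time stamps are raw counter ticks, like perfCount.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <mutex>

// Hypothetical portable sketch of the OID-stack bookkeeping (not the COM+ code).
// Each active object id maps to a stack of call-start time stamps in counter ticks.
class CallTracker {
public:
    // Method call: push the start time onto the object's stack (created on first use).
    void onCall(std::uint64_t oid, std::int64_t startTicks) {
        std::lock_guard<std::mutex> lock(mutex_);
        stacks_[oid].push_front(startTicks);
    }

    // Method return: pop the most recent call and drop the stack once it is empty.
    // A return for an unknown oid (e.g. a call that began before monitoring) is ignored.
    void onReturn(std::uint64_t oid) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = stacks_.find(oid);
        if (it == stacks_.end()) return;
        it->second.pop_front();
        if (it->second.empty()) stacks_.erase(it);
    }

    // Application shutdown: forget every pending call.
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        stacks_.clear();
    }

    // Longest pending call in milliseconds. New calls are pushed to the front,
    // so the oldest call of each object sits at the back of its stack.
    std::int64_t highestCallTimeMs(std::int64_t nowTicks, std::int64_t ticksPerSecond) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::int64_t highest = 0;
        for (const auto& entry : stacks_) {
            std::int64_t elapsedTicks = nowTicks - entry.second.back();
            highest = std::max(highest, elapsedTicks * 1000 / ticksPerSecond);
        }
        return highest;
    }

    // Number of objects that currently have at least one call in progress.
    std::size_t activeObjects() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return stacks_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::uint64_t, std::deque<std::int64_t>> stacks_;
};
```

Using find() in onReturn, rather than operator[], avoids inserting an empty entry for an unexpected return, and erasing empty stacks keeps the map limited to objects that are actually in call.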

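#include <list>
#include <map>

// Per-object call stack (only contains the time stamp as that's all I care about in this app).
// The container type is an assumption: any container with push_front/pop_front/back works.
typedef std::list<LONGLONG> TimeStack;

// Map from object ID (oid) to its call stack
typedef std::map<ULONG64, TimeStack *> TimeMap;

TimeMap m_map;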

STDMETHODIMP CMethodMon::OnMethodCall (COMSVCSEVENTINFO * pInfo, ULONG64 oid, REFCLSID cid, REFIID rid, ULONG iMeth)
{
    EnterCriticalSection(&m_csMapLock);
    TimeStack * pStack = m_map[oid];
    if (!pStack)
    {
        pStack = new TimeStack;
        m_map[oid] = pStack;
    }
    pStack->push_front(pInfo->perfCount);
    LeaveCriticalSection(&m_csMapLock);
    return S_OK;
}

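STDMETHODIMP CMethodMon::OnMethodReturn (COMSVCSEVENTINFO * pInfo, ULONG64 oid, REFCLSID cid, REFIID rid, ULONG iMeth, HRESULT hr)
{
    // The map is shared with OnMethodCall and the timer, so take the same lock
    EnterCriticalSection(&m_csMapLock);
    // Use find() rather than operator[] so an unknown oid (e.g. a call that started
    // before monitoring began) does not insert a NULL entry and crash
    TimeMap::iterator iter = m_map.find(oid);
    if (iter != m_map.end())
    {
        TimeStack * pStack = iter->second;
        pStack->pop_front();
        // Remove the entry for the oid if its call stack is empty
        if (pStack->empty())
        {
            delete pStack;
            m_map.erase(iter);
        }
    }
    LeaveCriticalSection(&m_csMapLock);
    return S_OK;
}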

IComAppEvents
Only the OnAppShutdown event is of interest in the IComAppEvents:

STDMETHODIMP CMethodMon::OnAppShutdown(COMSVCSEVENTINFO * pInfo, GUID guidApp)
{
    ClearCallTimes();
    return S_OK;
}

void CMethodMon::ClearCallTimes()
{
    EnterCriticalSection(&m_csMapLock);
    // Clear all the call times: free every per-object stack, then empty the map
    TimeMap::iterator iter;
    for (iter = m_map.begin(); iter != m_map.end(); iter++)
    {
        delete iter->second;
    }
    m_map.clear();
    LeaveCriticalSection(&m_csMapLock);
}

Calculating the highest call time is straightforward:

unsigned long CMethodMon::highestCallTime(void)
{
    unsigned long lHighestCallTime = 0;
    LONGLONG PerformanceFrequency;
    QueryPerformanceFrequency((LARGE_INTEGER *) &PerformanceFrequency);
    LONGLONG lNow;
    QueryPerformanceCounter((LARGE_INTEGER *) &lNow);

    EnterCriticalSection(&m_csMapLock);
    // Step through all the oids currently in call
    TimeMap::iterator iter;
    for (iter = m_map.begin(); iter != m_map.end(); iter++)
    {
        TimeStack *pStack = iter->second;
        // Get the LAST element in the call stack as it is the oldest element
        if (pStack && !pStack->empty())
        {
            LONGLONG lOldest = pStack->back();
            unsigned long lThisCallTime =
                (unsigned long)((1000 * (lNow - lOldest)) / PerformanceFrequency);
            if (lThisCallTime > lHighestCallTime)
            {
                lHighestCallTime = lThisCallTime;
            }
        }
    }
    LeaveCriticalSection(&m_csMapLock);
    return lHighestCallTime;
}

There are known issues with the implementation above. Some error handling code has been removed and it does not take into account the fact that the PerformanceCounter will wrap sooner or later.
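One way to handle the counter wrap, sketched under the assumption that the counter behaves as a 64-bit value that wraps modulo 2^64: subtract in unsigned arithmetic, which stays correct across a single wrap, and split the tick-to-millisecond conversion so the multiplication by 1000 is applied only to the sub-second remainder. The helper name elapsedMs is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: milliseconds between two counter readings.
// Unsigned subtraction is modulo 2^64, so the result is correct even if
// the counter wrapped once between 'start' and 'now'.
std::uint64_t elapsedMs(std::uint64_t start, std::uint64_t now, std::uint64_t ticksPerSecond) {
    std::uint64_t ticks = now - start;
    // Whole seconds and remainder, so that ticks * 1000 is never computed directly.
    std::uint64_t wholeSeconds = ticks / ticksPerSecond;
    std::uint64_t remainder = ticks % ticksPerSecond;
    return wholeSeconds * 1000 + remainder * 1000 / ticksPerSecond;
}
```

The LONGLONG readings from QueryPerformanceCounter can be cast to std::uint64_t before the call; the conversion is defined modulo 2^64, so the subtraction still yields the right tick count.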

Issues with the official approach
The component gets called for each COM+ call (in the applications you monitor). Interesting in itself, but you have to be 110% sure you don't introduce deadlocks, memory leaks or anything else that affects the stability of the system.

I used the official approach for more than a year. It worked like a charm for weeks on end, but at random times I would get an enormous memory leak (>1GB) in the Windows RPC subsystem on Win2k (I have not tried it on Win2k3). The leak may have been due to another process making heavy use of COM+ notifications, a bug in my code or something else. It only happened when the machine was under heavy stress (100% CPU and more than 1000 notifications per second for a few hours), so my -wild- guess is that other tasks got higher priority and the notifications kept queuing up until the system went in tilt.

The solution is not trivial to implement and requires the developer to know C++, ATL and the COM admin APIs as well as Loosely Coupled Events. Finding skilled C++ COM+ programmers is hard, so I decided to look for an easier-to-maintain solution. I will post the unofficial way I found tomorrow. It is a lot simpler and cuts the source code from 1,500 lines of C++ to fewer than 150.

Wednesday, February 23, 2005

RootkitRevealer

SysInternals does it again and delivers another impressive tool: RootkitRevealer

RootkitRevealer is an advanced root kit detection utility. It runs on Windows NT 4 and higher and its output lists Registry and file system API discrepancies that may indicate the presence of a user-mode or kernel-mode rootkit. RootkitRevealer successfully detects all persistent rootkits published at www.rootkit.com, including AFX, Vanquish and HackerDefender (note: RootkitRevealer is not intended to detect memory-based rootkits like Fu that don't survive reboots).

Via [Sysinternals]

Tuesday, February 22, 2005

Visual Studio 2005 Device Command Shell [v0.98]

The Device Command Shell (DCS) is a very useful tool if you do Pocket PC, aka Smart Device, development using Visual Studio 2005.

DCS is a Command Window addin that gives you command line access to the most common operations during PPC development: copy files, edit the registry, list/start/stop processes, install DLLs etc. The list of commands continues to grow with each release:
CE Batch
CE Certs
CE Config
CE Connect
CE Copy
CE CreateRegKey
CE CreateRegKeyValue
CE Del
CE DeleteRegKey
CE DeleteRegKeyValue
CE Depends
CE Get
CE GetRegKey
CE Help
CE Install
CE IP
CE List
CE MD
CE Ping
CE Query
CE RD
CE RegSvr (new in 0.98)
CE SetPolicy
CE SetProxy
CE SetRegKey
CE Start
CE Stop
CE TList

You can download the Device Command Shell and documentation from my GotDotNet workspace. Set up the aliases above by performing these steps after installing DCS:

Bring up the command window: go to the "View" menu, select "Other Windows", then "Command Window".

Enter the following command: DeviceCommandShell.Connect.DeviceCommandShell alias

I have one wish for the DCS team: I hope they create a version that works with VS2003 as well. This thing is just too useful to have to wait for VS2005!

Via [Visual Studio For Devices]

Monday, February 21, 2005

Community Server 1.0 released

Community Server 1.0 was released this weekend. The source code is not available yet but I have downloaded the binaries so I can try the various components: blog, forum and photo gallery.

I will post my experience with the .TEXT migration as soon as the migration tool/scripts for Community Server 1.0 have been released.

Via [Duncan Mackenzie]

Sunday, February 20, 2005

Citibank fights phishing the wrong way

Citibank must be one of the most common targets of phishing scams around. I have lost track of the fake mails I have received and forwarded to Citibank security.

The Citibank on screen keyboard described by BetaNews smells like a publicity stunt to show that they take security seriously and that they are doing something. Or are they really clueless enough to think that this online keyboard will improve security?

It is true that some basic keyboard loggers do not work with an on screen keyboard, but it is a lot less secure than a normal password field:

It is easier to see which password the user enters, as you just have to follow the mouse on screen as it clicks the characters one by one. Discovering my password by shoulder surfing is a lot more difficult when I type it, as I touch type pretty fast.

It limits which characters can be entered. There is no Shift key, so you are stuck with uppercase letters only and a very limited set of special characters. I am not paranoid enough to use AltGr to enter random characters, but I do use a mix of upper case, lower case, numbers and extended European characters.

This keyboard does not prevent phishing. The JavaScript keyboard will show up on the phishing sites and the phishers will continue to get the clear text username and password like they have in the past.

You cannot type in the password field, but all is not lost, as they did not disable paste. Good password managers, like Password Safe and the one I'm working on in my spare time, continue to work, as they let you paste the password without ever displaying it on the screen.

I feel a lot safer with other banks that offer some sort of two factor authentication:

- Password plus a random code from a pre-generated list issued by the bank
- Password plus an SMS
- SecurID

My credit card company sends me a free SMS alert when someone charges my card which makes me feel pretty much in control.

Click here to see the ~~folly~~ on screen keyboard at work