Debugging trick for .NET Windows Service


I was recently reading a blog post about debugging windows service. I don’t want to duplicate the same, IMO it was a very good post. One of things the post had mentioned about updating the registry key “ServicesPipeTimeout” and to increase timeout ,so that SCM does not kill the debugger and service. The one issue with changing this registry key is ,the system has to be rebooted for this to take effect and, this change is applicable across for all services. To restart a server in a production environment is never easy, because of other applications.

The same problem that was mentioned in the post ,can be dealt as Post-Mortem debugging. What I mean by that is ,taking a memory dump when there is an unhandled exception in the process. This can be automated by using a config file for cdb/windbg.exe .   In this config file we write actions for events, like on the first chance of .NET Exception log the call stack , on the second chance generate a full memory dump. The debugger commands are pretty arcane and there isn’t editor with Intellisense, so what is the solution to this.

Adplus is the answer. Adplus.vbs is a vb script file (until the recent windows debugger release) that generates this necessary config file , which is much easier to edit than to create. So here is the command to generate memory dump on Unhandled .NET Exception

sxe -c2 @".echo ---;.echo --- 2nd chance NET_CLR exception ----;.echo ---------------------------------------------------------------;.echo;.echo Occurrence happened at: ;.time;.echo;.echo Faulting stack below ---;~#kvn250;.echo;.dump -u /ma /c 2nd_chance_NET_CLR_exception_in_app_running_on_server D:\debuggers\app_running_on_server_2nd_chance_NET_CLR__full.dmp;!elog_str ADPlus detected a 2nd chance NET_CLR exception in process app_running_on_server and has taken the following actions at the time of the crash: Log Time Stack FullDump EventLog. The output directory is D:\debuggers\Debug-Crash; GN" clr

So we can still trouble shoot windows service start-up without changing the registry entry and not getting terminated by SCM. So here is my cdb config file that I use for debugging. Like I said I did not hand create it , rather I update the one which the adplus created. This script will start a remote server for the debugger clients to connect , similar to the one that was mentioned in the blog post, loads sos when clr gets loaded and few more things. This is script nothing but a simple automation for mundane things inside the debugger.
The config file has to be passed a parameter to the cdb.exe . Here is the sample gflags debugger text that I use

d:\Program Files\Debugging Tools for Windows (x64)\cdb.exe -cf  d:\Tools\debug.cfg

And a usual mistake people do within the script it forgetting the escape character and here is example how to use one

sxe -c “.load c:\\Windows\\Microsoft.NET\\Framework64\\v2.0.50727\\SOS;GN ” ld:mscorjit

Exploring undocumented SOS function in Windbg – .NET 4.0


With the new Microsoft .NET Framework 4.0 ,comes a new sos , which adds bunch of features. In this post I will be exploring one of the undocumented function , which is !HandleCLRN. Brian has also mentioned about these undocumented functions.

I guessed it by the name ,it would have to be something about clr notification exception. I knew these exceptions are raised when there is an assembly load.  Here is the simple console application that I am going to be using for exploring this.

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1 {
class Program {
static void Main(string[] args) {
Console.WriteLine("Debugging");
Console.Read();
}
}
}

Started the exe within windbg. Issued the command

sxe ld:clrjit

To load the sos after clr is loaded. Then loaded sos by

.loadby sos clr

Note  there is change in the syntax for loading sos in .NET 4.0. Prior to this, clr was part of mscorwks.dll and now it is clr.dll.

Next step is to have a break-point on the Main Method.

!bpmd ConsoleApplication1.exe ConsoleApplication1.Program.Main

And here was the output

0:000> !bpmd ConsoleApplication1.exe ConsoleApplication1.Program.Main

Found 1 methods in module 009b2e8c…

MethodDesc = 009b388c

Adding pending breakpoints…

My assumption was ,sos would have signup with debugger for an exception, for the above command , and I was correct and here is my output from sx command

0:000> sx

ct – Create thread – ignore

et – Exit thread – ignore

cpr – Create process – ignore

epr – Exit process – break

ld – Load module – break

(only break for clrjit)

ud – Unload module – ignore

ser – System error – ignore

ibp – Initial breakpoint – break

iml – Initial module load – ignore

out – Debuggee output – output

av – Access violation – break – not handled

asrt – Assertion failure – break – not handled

aph – Application hang – break – not handled

bpe – Break instruction exception – break

bpec – Break instruction exception continue – handled

eh – C++ EH exception – second-chance break – not handled

clr – CLR exception – break – handled

Command: “”

Second command: “”

clrn – CLR notification exception – break – handled

Command: “!HandleCLRN”

cce – Control-Break exception – break

cc – Control-Break exception continue – handled

cce – Control-C exception – break

cc – Control-C exception continue – handled

dm – Data misaligned – break – not handled

dbce – Debugger command exception – ignore – handled

gp – Guard page violation – break – not handled

ii – Illegal instruction – second-chance break – not handled

ip – In-page I/O error – break – not handled

dz – Integer divide-by-zero – break – not handled

iov – Integer overflow – break – not handled

ch – Invalid handle – break

hc – Invalid handle continue – not handled

lsq – Invalid lock sequence – break – not handled

isc – Invalid system call – break – not handled

3c – Port disconnected – second-chance break – not handled

svh – Service hang – break – not handled

sse – Single step exception – break

ssec – Single step exception continue – handled

sbo – Stack buffer overflow – break – not handled

sov – Stack overflow – break – not handled

vs – Verifier stop – break – not handled

vcpp – Visual C++ exception – ignore – handled

wkd – Wake debugger – break – not handled

wob – WOW64 breakpoint – break – handled

wos – WOW64 single step exception – break – handled

* – Other exception – second-chance break – not handled

Notice that for clrn exception , there is the new command called!HandleCLRN

Changed the sxe for clrn to kb, to get the call stack

sxe -c "kb" clrn

And here is the output from the clrn exception

0:000> g

(5f5b8.5f5bc): CLR notification exception – code e0444143 (first chance)

RetAddr           : Args to Child                                                           : Call Site

000007fe`e860da53 : 00000000`01ed4c60 000007fe`e81940a9 00000000`001be560 000007fe`e85ea423 : KERNELBASE!RaiseException+0×39

000007fe`e84f08e3 : 00000000`00000000 000007fe`e85fb772 00000000`001be560 00000000`001be7b9 : clr!DACNotifyExceptionHelper+0×43

000007fe`e84f1d2b : 000007ff`000440e0 00000000`004ae620 00000000`001be948 000007ff`00000000 : clr!DACNotify::DoJITNotification+0x2b

000007fe`e824e3ce : 000007ff`000440e0 000007ff`00160120 000007ff`000440e0 00000000`00000001 : clr!DACNotifyCompilationFinished+0x6d

000007fe`e824e164 : 000007ff`000440e0 00000000`00000001 00000000`00000000 00000000`001be800 : clr!MethodDesc::MakeJitWorker+0×256

000007fe`e81cdaf3 : 000007ff`000440e0 00000000`00000000 00000000`00000000 00000000`00000000 : clr!MethodDesc::DoPrestub+0×522

000007fe`e8192547 : 00000000`00000000 00000000`004ae620 00000000`001be948 00000000`001beb20 : clr!PreStubWorker+0x1df

000007fe`e81ccc74 : 00000000`021c2258 00000000`001bebb0 00000000`001bea10 000007fe`e8194ab5 : clr!ThePreStubAMD64+0×87

000007fe`e81ccd89 : 000007ff`000440e0 00000000`00000001 00000000`00000000 00000000`00000000 : clr!CallDescrWorker+0×84

000007fe`e81cce05 : 00000000`001beb18 00000000`00000001 00000000`001beb20 00000000`001bed00 : clr!CallDescrWorkerWithHandler+0xa9

000007fe`e8352820 : 00000000`00000000 00000000`001bed08 00000000`00000001 000007fe`e6f23502 : clr!MethodDesc::CallDescr+0x2a1

000007fe`e8352958 : 00000000`004ae620 00000000`004ae620 00000000`00000000 00000000`00000000 : clr!ClassLoader::RunMain+0×228

000007fe`e8352e0a : 00000000`001bf500 00000000`00000200 00000000`01ebd660 00000000`00000200 : clr!Assembly::ExecuteMainMethod+0xac

000007fe`e8353036 : 00000000`00000000 00000000`002b0000 00000000`00000000 00000000`00000000 : clr!SystemDomain::ExecuteMainMethod+0×452

000007fe`e8352f53 : 00000000`002b0000 00000000`00000000 00000000`00000000 00000000`00000000 : clr!ExecuteEXE+0×43

000007fe`e82cf7c1 : 00000000`004ae620 ffffffff`ffffffff 00000000`00000000 00000000`00000000 : clr!CorExeMainInternal+0xc4

000007fe`f17c4989 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`001bf818 : clr!CorExeMain+0×15

000007fe`fb675b21 : 000007fe`e82cf7ac 000007fe`f17c4940 00000000`00000000 00000000`00000000 : mscoreei!CorExeMain+0×41

00000000`7712f56d : 000007fe`f17c0000 00000000`00000000 00000000`00000000 00000000`00000000 : MSCOREE!CorExeMain_Exported+0×57

00000000`77363281 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0xd

First chance exceptions are reported before any exception handling.

This exception may be expected and handled.

KERNELBASE!RaiseException+0×39:

000007fe`fd5faa7d 4881c4c8000000  add     rsp,0C8h

Got it ,the !HandleCLRN is used by sos to set break-points on methods that have not yet being jitted.

sos would have to hook with clr for getting JIT notification. Without this, sos will not be able to set break-points on function names using !bpmd ,that have not yet jitted. When clr jits the code, sos is notified because sos has signed for jit callbacks using !HandleCLRN ,after which sos will be able to set “bp”. This is evident from the above call stack.

So my next question was so how sos handling prior to !HandleCLRN. Tried the same in .NET 3.5  and the undocumented parameter it was using is   “!bpmd -notification”.

Get function return values and profiling information in .NET using Windbg


There have been times when we would like to the know function return value ,debug within the function and get profiling information like the duration of a function. VS.NET provides all these features, but in production environment we won’t have VS.NET. This is where Windbg comes in handy.  Here is the sample code that I am going to use.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Windows.Forms;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1(){
InitializeComponent();
}
private void Button1Click(object sender, EventArgs e){
Func<string> x = () => “Lambda “;
var result = x();
Console.WriteLine(result);
}
private void Button2Click(object sender, EventArgs e){
Console.WriteLine(longRunningMethod() );
}
string Test(){
return “Cool trick ” + Test1();
}
string Test1(){
return “foo”;
}
string longRunningMethod() {
Thread.Sleep(2000);
return “slowone”;
}
}
}

So the first step is to figure out what would be value of “result” within the Button1Click. To do this I would have to get the return value of “x” lambda.

Launched the exe , attached to Windbg. Loaded SOS. Look for methodtable

0:004> .loadby sos mscorwks
0:004> !dumpheap -type Form1
Address               MT     Size
0000000002861fa8 000007ff00026af0      472
total 1 objects
Statistics:
MT    Count    TotalSize Class Name
000007ff00026af0        1          472 WindowsFormsApplication1.Form1
Total 1 objects

Look for the method Button1

0:004> .shell -ci “!dumpmt -md 000007ff00026af0 ” findstr Button1
000007ff0002c050 000007ff00026a18      JIT WindowsFormsApplication1.Form1.Button1Click(System.Object, System.EventArgs)
000007ff00170a10 000007ff00026a68      JIT WindowsFormsApplication1.Form1.<Button1Click>b__0()
.shell: Process exited

The above result gave address for “Button1Click” as well as the lambda that we have inside the Button1Click called “b__0”. But in real-world we could probably have more than one lambda within a function, and it is hard to guess which one we should debug.Here are the steps to verify and extract name of the lambda. Lets disassemble the Button1Click by looking at it IL. So to get the IL we would need the MethodDesc which is  “000007ff00026a18 “ this information available from our previous command.

0:004> !dumpil 000007ff00026a18
ilAddr = 000000000136228c
IL_0000: nop
IL_0001: ldsfld WindowsFormsApplication1.Form1::CS$<>9__CachedAnonymousMethodDelegate1
IL_0006: brtrue.s IL_001b
IL_0008: ldnull
IL_0009: ldftn WindowsFormsApplication1.Form1::<Button1Click>b__0
IL_000f: newobj class [System.Core]System.Func`1<string>::.ctor
IL_0014: stsfld WindowsFormsApplication1.Form1::CS$<>9__CachedAnonymousMethodDelegate1
IL_0019: br.s IL_001b
IL_001b: ldsfld WindowsFormsApplication1.Form1::CS$<>9__CachedAnonymousMethodDelegate1
IL_0020: stloc.0
IL_0021: ldloc.0
IL_0022: callvirt class [System.Core]System.Func`1<string>::Invoke
IL_0027: ldarg.0
IL_0028: call WindowsFormsApplication1.Form1::Test
IL_002d: call System.String::Concat
IL_0032: stloc.1
IL_0033: ldloc.1
IL_0034: call System.Console::WriteLine
IL_0039: nop
IL_003a: ret

From the IL we can figure out the method name of the lambda is “b__0”. Now that we have the name, we will put a break-point on the function b__0 to get the return value

bp /1 000007ff00170a10 ” bp poi(@rsp) \” du @rax+10;g \”; g “

In the above break-point I have used to “/1” to create a one-shot break-point. This means create the BP, and once when it hits remove the BP.   The reason for one-shot Break-Point is we are not interested in the start of the function, we are interested only on the return.

Here is the trick ,so when the initial Break-Point hits on this address “000007ff00170a10” it executes what ever is there within quotes. And the command I have used ,is to create another break-point. So I am creating a dynamic break-point “bp poi(@rsp)”. The function return address is stored in the @rsp register for x64 and @esp for x86. So with @rsp having the address of the function return, I have managed to set bp with on lambda function return.

The next step is to get the return value from the function. The function return value is stored in the address register rax in x64 and eax in x86 . The “du @rax+10” du command is for dumping string at the register @rax and “+10” is the actual location of the string inside the managed object.

So we have managed to get the output of the lambda function

0:004> g
00000000`028a9a38  “Lambda “

Using the same trick we can even profile methods, like getting function execution time within Windbg. Here I am going to profile the method “longRunningMethod”

0:004> .shell -ci “!dumpmt -md 000007ff00026af0 ” findstr long
000007ff0002c070 000007ff00026a58      JIT WindowsFormsApplication1.Form1.longRunningMethod()
.shell: Process exited

I get the function address in the above command. The next step is to set the Break-Point.

bp  000007ff0002c070 “.printf \” longrunningmethod \”; .echotime ;bp poi(@rsp) \”  .echotime;g \”; g “

In the above command within function start break-point, I use the command .printf to print the “longrunningmethod” and then .echotime to print out the start time of the function, and like I did before I am dynamically creating another break-point (bp poi(@rsp)),at the return of function ,on which I am also printing out the system time using .echotime.

0:004> g
longrunningmethod Debugger (not debuggee) time: Sun Feb 21 16:47:35.117 2010 (GMT-5)
Debugger (not debuggee) time: Sun Feb 21 16:47:37.122 2010 (GMT-5)

So in the above output the function duration is about 2 seconds as we expected it to be (in the code i have sleep for 2000 milliseconds). This is almost like having log statement on the function start and the function end.

Identifying High CPU in GC (.NET) because of LOH – using Windbg


I am sure most of us are aware that one of the common reasons for High CPU usage .NET is because of, percentage time spent on GC is high. There are lot of write up about this. Tess has amazing blog post  on this specifically, which explains in detail how to identify the symptoms. But one thing that I want share was the experience I had ,where in i could identify the real call stack which was causing allocation on LOH by have a break-point on the CLR Garbage collector itself. I am going to assume that you are aware of CLR GC and LOH and basic debugging using windbg.

FYI the Perfmon counter like “% Time spent in GC”, “Large Object Heap size” can identify we have an issue with allocation in LOH. But this only indicates there is a problem in High allocations that’s causing  GC to work hard and in turn causing high CPU usage . But does not point where the exact problem is .If there is a large code base and all of sudden if there is an issue like this it is hard to pinpoint where the root cause of the problem.

I am going to walk through an example on a button click the code allocates objects on LOH and identify it using windbg.

using System;
usingSystem.Collections.Generic;
usingSystem.Linq;
usingSystem.Windows.Forms;

namespaceWindowsFormsApplication1
{
public partial classForm1: Form
{
publicList<byte[]> buffer = newList<byte[]>();

publicForm1()
{
InitializeComponent();
}

private voidButton1Click(objectsender, EventArgs e)
{
AllocateinLOH();
}

voidAllocateinLOH()
{
var b = new byte[85001];
b.ToList().ForEach(
(x) => x = new byte());

buffer.Add(new byte[85001]);
}

}
}

In the above code AllocateinLOH will cause the heap allocations to LOH because the size is more than 85000 bytes. The next step is to launch the application and attach it to the debugger. To figure out the cause I could have got multiple memory dumps  and checked at every call stack when GC was happening . That is the harder way and we could be spending lot of time on that.

To get to solve the problem i always like to approach from the bottom of the stack. I was certain there should be a function within the CLR / GC which should be doing this and I was certain it has to be within MSCORWKS.dll. The next step was examine symbols using “x” command and here is the output

0:000> x mscorwks!wks::gc_heap::*

000007fe`e90d1340 mscorwks!WKS::gc_heap::alloc_allocated = <no type information>
000007fe`e89fd380 mscorwks!WKS::gc_heap::limit_from_size = <no type information>
000007fe`e8907e20 mscorwks!WKS::gc_heap::fix_older_allocation_area = <no type information>
000007fe`e90d1310 mscorwks!WKS::gc_heap::max_free_space_items = <no type information>
000007fe`e90cba38 mscorwks!WKS::gc_heap::gc_low = <no type information>
000007fe`e8dae150 mscorwks!WKS::gc_heap::grow_brick_card_tables = <no type information>
000007fe`e8a12d40 mscorwks!WKS::gc_heap::fix_generation_bounds = <no type information>
000007fe`e8a13040 mscorwks!WKS::gc_heap::fix_large_allocation_area = <no type information>
000007fe`e8a2c680 mscorwks!WKS::gc_heap::allocate_large_object = <no type information>
000007fe`e90c0478 mscorwks!WKS::gc_heap::slow = <no type information>
000007fe`e8ed8ec0 mscorwks!WKS::gc_heap::c_promote_callback = <no type information>

The above results are only partial because it was not necessary to get all the methods. The command “x mscorwks!wks::gc_heap::*” causes the debugger to get all the functions that are within the mscorwks dll with class name gc_heap. I knew it is gc_heap because prior to this I issued the command “x mscorwks!*gc*”. Because we don’t have private symbols for mscorwks we cannot see the type information. But we don’t need that for our purpose. The public symbols has FPO information for us to have break-point.

Form the above result the method name should be “allocate_large_object”

The next step was to put a break-point on the method

bp mscorwks!wks::gc_heap::allocate_large_object “!CLRStack”

And then click button and here is the callstack output

0:004> g
OS Thread Id: 0xf6f4 (0)
Child-SP         RetAddr          Call Site
000000000027e000 000007ff00190684 WindowsFormsApplication1.Form1.AllocateinLOH()
000000000027e060 000007feeab00555 WindowsFormsApplication1.Form1.Button1Click(System.Object, System.EventArgs)
000000000027e090 000007feeb1f5873 System.Windows.Forms.Control.OnClick(System.EventArgs)
000000000027e0d0 000007feeb1aadf7 System.Windows.Forms.Button.OnMouseUp(System.Windows.Forms.MouseEventArgs)
000000000027e130 000007feeb702b7d System.Windows.Forms.Control.WmMouseUp(System.Windows.Forms.Message ByRef, System.Windows.Forms.MouseButtons, Int32)
000000000027e200 000007feeab49b5a System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)
000000000027e3b0 000007feeab49954 System.Windows.Forms.ButtonBase.WndProc(System.Windows.Forms.Message ByRef)
000000000027e430 000007feeab53aa6 System.Windows.Forms.Button.WndProc(System.Windows.Forms.Message ByRef)
000000000027e460 000007feeab53945 System.Windows.Forms.Control+ControlNativeWindow.WndProc(System.Windows.Forms.Message ByRef)
000000000027e4b0 000007feeab52244 System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
000000000027e560 000007fee8aab07a DomainBoundILStubClass.IL_STUB(Int64, Int32, Int64, Int64)
000000000027e810 000007feeab6e883 DomainBoundILStubClass.IL_STUB(MSG ByRef)
000000000027e990 000007feeab6e0f8 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.

UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
000000000027ebe0 000007feeab6db65 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
000000000027ed30 000007ff00190171 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
000000000027ed90 000007fee8aad502 WindowsFormsApplication1.Program.Main()
mscorwks!WKS::gc_heap::allocate_large_object:
000007fe`e8a2c680 4053            push    rbx

Bingo now at the bottom of the stack we can see the “allocate_large_object” and on the top of the stack it is the managed code that we wrote “AllocateinLOH”.  Now we have solved the reason behind the High CPU usage in GC because of LOH.

The “allocate_large_object” is not documented by Microsoft as a public API and I don’t know whether the name would be same going forward from .NET framework 4.0.  This holds good until .NET framework 3.5. The idea behind this is just digging in to the framework can give us some information which has saved us valuable time and effort.

Function hit count using Pseudo-Register in Windbg


What if we want to know the number of times a function was invoked. We can have “.echo” or “.printf” on break-point of a function and count the output manually. The better way to do this is using pseudo-registers.

In my previous post I had mentioned about alias inside the debugger. The debugger also provides User defined Pseudo-Registers for scripting inside the debugger. We can use them to manipulate values within our scripts. There are 20 of them from $t0,$t1.. $t19. These are pre-defined names that we cannot change.
The syntax to assign value to the register is “r”

r @$t0 = 0

FYI the default value is 0.

bp 000007ff0003c050  ”  r @$t0 =@$t0 + 1;gc “

The above command will increment the $t0 register by one more, every time the break-point is hit. And gc is to go to next conditional break-point. To get the output of the register we can use the expression evaluator command “?”

0:004> ? @$t0
Evaluate expression: 5 = 00000000`00000005

There quite a few places we can use these inside the debugger. Example we could calculate the total size of multiple instances of an object type inside the debugger. That is we could get !objsize of each instance for type DataTable and then total it, to get the total memory consumption of datatable within our process.

Conditional Breakpoint in .NET using Windbg


With my few years of production debugging .NET code ,one thing that has really helped me a lot is Windbg. Lot us of know that using sos, sosex and Windbg we should be able to troubleshoot most of the .NET Code. But certain tips / tricks makes us productive in those crucial moments. I am assuming that you are aware of basic usage of sos and windbg.

We know by using !bpmd command we can stick in a break-point on a method. But the issue is we would want to break in to the debugger only on a certain condition, very similar to VS.NET break-point condition.

Here is the sample code that i am going to be using to set the conditional break-point. This is a simple Winforms app.

  1: using System;
  2: using System.Windows.Forms;
  3:
  4: namespace WindowsFormsApplication2
  5: {
  6:     public partial class Form1 : Form
  7:     {
  8:         public int Foo;
  9:
 10:         public Form1()
 11:         {
 12:             InitializeComponent();
 13:         }
 14:
 15:         private void Button1Click(object sender, EventArgs e)
 16:         {
 17:             Test(textBox1.Text);
 18:         }
 19:
 20:         void Test(string s)
 21:         {
 22:             Console.WriteLine(s);
 23:         }
 24:
 25:     }
 26: }
 27:

In this sample code I would like to put a conditional break-point on the Test method. After attaching the application to windbg ,look for the object Form1 in the heap.

0:000> !dumpheap -type WindowsFormsApplication2.Form1
Address MT Size
0000000002c92548 000007ff004b7b78 480
total 1 objects
Statistics:
MT Count TotalSize Class Name
000007ff004b7b78 1 480 WindowsFormsApplication2.Form1
Total 1 objects

The next step was to get the address of the function Test.

0:000> .shell -ci "!dumpmt -md 000007ff004b7b78" FIND "Test"
000007ff004a3508 000007ff004b7af0 JIT WindowsFormsApplication2.Form1.Test(System.String)
.shell: Process exited

The reason to get address of the function is to use the native “bp” command to set the break-point. FYI the sos also uses only the built in bp command for setting the break-point. The difference is condition that we can pass to the break-point. In the above I use the .shell command to look for Test function address instead of manually looking for the Test function. The .shell command comes in very handy.

In this exercise I would like to break into the debugger only if the argument “s” matches certain condition. I set the bp on the method test using the command “bp 000007ff004a3508”. And here is the result when the break-point hits.

0:000> g
Breakpoint 0 hit
rax=0000000002d9fcd8 rbx=0000000000000000 rcx=0000000002c92548
rdx=0000000002d9fcd8 rsi=0000000000000001 rdi=0000000000000000
rip=000007ff004a3508 rsp=000000000028d3e8 rbp=000000000028d590
r8=000000000028cde0 r9=000007feed0b14c0 r10=000007feff469f20
r11=000007ff00060120 r12=00000000003a9460 r13=0000000000000202
r14=000000001b3e23b8 r15=0000000000030672
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
000007ff`004a3508 e9f35a2c00 jmp 000007ff`00769000

FYI I am using a 64-bit machine and that’s the reason my pointers are much bigger than usual x86. We are interested on the argument “s”  that is passed to the method which is @rdx register “rdx=0000000002d7ed00”. To verify that we setting the break-point on the correct argument we can test it by using command

0:000> .printf "%mu",@rdx+10
testa

Not many of them are aware of how to get just the string from string object , instead of the all additions from !dumpobj. The above command would get just the string . The command .printf contains “%mu” because it is null terminated unicode string and @rdx is the register which contains the argument “s”. The “@rdx+10” is the actual location of the string in memory and for the x86 it would be “@rdx+c” for the actual string. Now that we are sure the @rdx is the register we can build condition for the argument. Here it is

bp 000007ff004a3508 ".block {as /mu ${/v:cmp} @rdx+10; .if ( $spat( \"${cmp}\", \"*test*\" ) ) { !clrstack; } .else { gc }}"

And here is the detailed explanation of the above condition within quotes. The .block command is used for alias evaluation. Alias is like variables within windbg. The “as /mu ${v:cmp} @rdx+10” command creates an string alias by name  of cmp which contains the value of argument “s”. This condition would be evaluated only when the functions first line of code is executed so @rdx will always have the value that is passed to the function. The next

“.if ( $spat( \"${cmp}\", \"*test*\" )  ) { !clrstack; } .else { gc }}”

command is real crux where the code compares the alias cmp with “*test*”. Notice i am using a built-in function called $spat which is nothing but a string pattern function. So from the above condition i am instructing the break-point to give a callstack if the argument “s” has something like “*test*” . If not the command “gc” means go to the next conditional break-point,similar  F5 in VS.NET. So, for example if the function is called 40 times and of which we are interested only once when it is  something like “*test*” ,with the existing  !bpmd we would wasted our time 39 times.

The “\” is a escape character in windbg ,i am using it because i would have to a string compare.

Identify and Patch .NET Code using Windbg


 

The last week was really an interesting one with debugging production code. I was debugging a Winforms application which was using .NET framework 3.5 version. The real problem was with the latest release of the code, there was bug which caused certain elements on the UI not to be displayed. This is was High priority bug and very important to the business.

The code that was causing this bug was an integer variable inside a class.

using System;
using System.Windows.Forms;

namespace WindowsFormsApplication2
{
    public partial class Form1 : Form
    {
        public int Foo;
        public Form1()
        {
            InitializeComponent();
        }
        private void Button1Click(object sender, EventArgs e)
        {
            if (Foo == 100) 
                this.label1.Visible = true;
        }
    }
}

The code was something like this. If Foo was equal 100 then the label was set to be visible. Reflector came in handy to disassemble  the code.With the latest code release the  “if (Foo == 100) “ condition was introduced.

The next step was to verify and validate that this condition was the reason for this bug. Though this looks very simple because i have shown a very contrived example which does not involve all the dependencies of the real world business application.

Fired up windbg looked up the heap for object type Form1 using dumpheap.

0:004> !dumpheap -type Form1

         Address               MT     Size

0000000002cb2548 000007ff004b7b68      480    
total 1 objects

Statistics:

              MT    Count    TotalSize Class Name

000007ff004b7b68        1          480 WindowsFormsApplication2.Form1

Total 1 objects

The next was figure of the offset of Foo. So used the !do command on Form1 instance. !do 0000000002cb2548 and here is the partial output of the command   

000007ff00107050  4001e9b     1780        System.Object  0   static 0000000002cb2d48 EVENT_MAXIMIZEDBOUNDSCHANGED

0000000000000000  4000002      1b8                       0 instance 0000000000000000 components

000007ff00694f40  4000003      1c0 …dows.Forms.Button  0 instance 0000000002cdc040 button1

000007ff006962c8  4000004      1c8 …ndows.Forms.Label  0 instance 0000000002cdc2f8 label1

000007ff002683d8  4000005      1d0         System.Int32  1 instance                0 Foo

So from the result i could identify that the Foo variable was on the 1d0 offset of the Form1 object.

After couple of test case runs i dumped the object and here was the output

000007ff00694f40  4000003      1c0 …dows.Forms.Button  0 instance 0000000002cdc040 button1

000007ff006962c8  4000004      1c8 …ndows.Forms.Label  0 instance 0000000002cdc2f8 label1

000007ff002683d8  4000005      1d0         System.Int32  1 instance               23 Foo

So the Foo’s value was now 23. I am sure most of us are used to updating the variable’s value in VS.NET using immediate window. I did something similar to that but instead used windbg.

In Windbg numbers are hex values, so to set the value as 100 it would have to be 64. You can fire up calc to figure this out or you use the command in windbg ?64

Evaluate expression: 100 = 00000000`00000064

FYI “?” expression evaluator in windbg. Now the final step of updating the Foo in memory. The command to do that is

ed 0000000002cb2548+1d0 64

“e” command is enter values in memory. “e” has many flavors like eu,ed,ea. And ed command is for updating  Double-word values. So ed is to update double-word value and the memory location is 0000000002cb2548+1d0 which is the Form1 memory location  along with Foo is offset. 

Voila here is the output of !dumpobj after updating the memory

000007ff002683d8  4000005      1d0         System.Int32  1 instance              100 Foo

Now we could make an emergency patch and be certain the patch would work with the fix already tested in production using the debugger. Having windbg in your toolbox is always saves a lot of time.  

Home cooked web monitoring use Rx


One of the key things in software is to write succinct , declarative and asynchronous code. The answer to that is Reactive Extensions for .NET 

I have been digging into Rx for sometime now. Though there isn’t much of documentation I have been kind of successful in getting certain things done with it.

One of requirement that came up from our Operations was to monitor some websites and notify someone if the site was not accessible. But the added constraints to this were check for the site status only at a certain interval and report failure only if it was more than certain percentage within a certain duration.

So it was like check the site status every 5 seconds , buffer the results for a minute and in the buffered response if it had more 3  failures then tweet someone about it.

And here is the code to solve the problem

 (from time in Observable.Interval(TimeSpan.FromSeconds(5))
 let req = WebRequest.Create("http://www.nonexisting.com")
 from res in Observable.
    FromAsyncPattern<WebResponse>(
        req.BeginGetResponse, req.EndGetResponse)()
 .Materialize()select res).Buffer(new TimeSpan(0, 0, 1, 0)).
    Select(failed => 
        failed.Where(
        n => n.Kind == NotificationKind.OnError)).
    Where(failed => failed.Count() > 3).
    Subscribe(x => Tweet("Nonexisting.com Failed thrice"));
Console.Read();

The “from time in Observable.Interval(TimeSpan.FromSeconds(5)” is for ensuring that an Observable is generated every 5 seconds to check the status of the website. In the next line the code creates a web request.

Using the FromAsyncPattern I was able to reduce all the plumbing code to handle async i/o calls for the web request.  The materialize is for the sequence to continue even if there is an exception. Here is an good write up on Materialize. And the rest is just the usual Linq where the code filters Notification type of error.

This was a fun exercise. I will continue to explore Rx and blog about it.

Resharper Template for F#


I have been hacking F# lately and I am big fan of ReSharper. So why not create a few Live Templates for the things i do daily in F#.

Download R# live template for F#. I am sure there are few more things that can be part of this template. I have posted the code in MSDN Code gallery so that others can find and contribute.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: