Get Managed and Native Call Stacks from Event Tracing for Windows (ETW)

Here is the TOC of series of posts on ETW

With the latest release of PerfMonitor from the CLR team now we can have Managed call stacks along with native code. How cool is that?  To get managed / native call stacks we need minimum of Vista, this is not available in XP.

Here is the sample source code I am going to be using to demonstrate the managed and the native call-stacks

using System;
using System.Linq;
namespace ConsoleApplication9 {
class Program {
private static void Main() {
Console.Read();
AllocateLOH();
}
private static void AllocateLOH() {
for (int i = 0; i < 10; i++) {
var x = new object[85000];
x.Count();
}
}
}
}

This is a very simple app that allocates large chunks of memory, which will get allocated in the LOH and frequent allocations in LOH causes high CPU. To identify this we would have to attach it to windbg and put a break-point on “WKS::gc_heap::allocate_large_object” and here I have blogged about the same

But with ETW we will not need Windbg, the ETW can do stack walks for the function calls and this makes it really great. To enable this feature first thing is to download the source and compile the code from codeplex . FYI if you are using x64 th tool would not work. It would work on wow64 , so compile it for x86.

After which run the command


PerfMonitor crawlableCLRStacks

This will take a while to complete, it is essentially setting the ngened images to debug attributes so we can get better stack walks. The next command is to start etw trace


PerfMonitor.exe  /lineNumbers /stacks /keyAny:0x1 start

So we are essentially instructing the tool to  get GC Information (/keyAny:0x1) along with stackwalk . The tool provides help which really useful.

After which I start the application and let the complete and then I stop the trace.


Perfmonitor.exe stop

And then dump the trace to a xml file


perfmonitor.exe print

This will dump out a huge file but what we are interested in Large Object Allocations on GC call stacks


<Event MSec=  "6940.8972" PID="365044" PName="ConsoleApplication9" TID="364352" EventName="GCAllocationTick" AllocationAmount="0x00053040" AllocationKind="Large" ClrInstanceID="15">

<StackTrace>

<CodeAddress Address="0x66669909" FullMethodName="ETW::SamplingLog::SendStackTrace" ModuleName="clr"/>

<CodeAddress Address="0x66669a8d" FullMethodName="EtwCallout" ModuleName="clr"/>

<CodeAddress Address="0x6669d870" FullMethodName="CoMofTemplate_qqh" ModuleName="clr"/>

<CodeAddress Address="0x666d5d44" ModuleName="clr"/>

<CodeAddress Address="0x6653ff5b" FullMethodName="WKS::gc_heap::allocate_more_space" ModuleName="clr"/>

<CodeAddress Address="0x66475d43" FullMethodName="WKS::gc_heap::allocate_large_object" ModuleName="clr"/>

<CodeAddress Address="0x666d61d8" FullMethodName="WKS::GCHeap::Alloc" ModuleName="clr"/>

<CodeAddress Address="0x666a39a7" ModuleName="clr"/>

<CodeAddress Address="0x664ef3d8" FullMethodName="AllocateArrayEx" ModuleName="clr"/>

<CodeAddress Address="0x664ef464" ModuleName="clr"/>

<CodeAddress Address="0x1f00ee" FullMethodName="ConsoleApplication9.Program.AllocateLOH()"/>

<CodeAddress Address="0x1f008e"/>

<CodeAddress Address="0x663f213b" FullMethodName="CallDescrWorker" ModuleName="clr"/>

<CodeAddress Address="0x66412c2f" FullMethodName="CallDescrWorkerWithHandler" ModuleName="clr"/>

<CodeAddress Address="0x66412dcc" FullMethodName="MethodDesc::CallDescr" ModuleName="clr"/>

<CodeAddress Address="0x66412e01" FullMethodName="MethodDesc::CallTargetWorker" ModuleName="clr"/>

<CodeAddress Address="0x66412e21" FullMethodName="MethodDescCallSite::Call_RetArgSlot" ModuleName="clr"/>

<CodeAddress Address="0x664c8872" FullMethodName="ClassLoader::RunMain" ModuleName="clr"/>

<CodeAddress Address="0x664c8980" FullMethodName="Assembly::ExecuteMainMethod" ModuleName="clr"/>

<CodeAddress Address="0x664c8794" FullMethodName="SystemDomain::ExecuteMainMethod" ModuleName="clr"/>

<CodeAddress Address="0x664ed6f5" FullMethodName="ExecuteEXE" ModuleName="clr"/>

<CodeAddress Address="0x664ed5f7" FullMethodName="_CorExeMainInternal" ModuleName="clr"/>

<CodeAddress Address="0x664e6db8" FullMethodName="_CorExeMain" ModuleName="clr"/>

<CodeAddress Address="0x704561f0" ModuleName="mscoreei"/>

<CodeAddress Address="0x706f7f16" FullMethodName="ShellShim__CorExeMain" ModuleName="mscoree"/>

<CodeAddress Address="0x706f4de3" FullMethodName="_CorExeMain_Exported" ModuleName="mscoree"/>

<CodeAddress Address="0x77bc9d72" FullMethodName="__RtlUserThreadStart" ModuleName="ntdll"/>

<CodeAddress Address="0x77bc9d45" FullMethodName="_RtlUserThreadStart" ModuleName="ntdll"/>

</StackTrace>

</Event

Here is a simple linq statement to extract the above information

 void Main()
 {
 var events = from e in XDocument.Load(@"c:\temp\test.xml").Descendants("Events").Descendants("Event").Take(5)
 where e.Attribute("EventName").Value == "GCAllocationTick" && e.Attribute("AllocationKind").Value == "Large"
 select new {Amount = Dec(e.Attribute("AllocationAmount").Value) ,
 Stack = e.Descendants("StackTrace").Descendants().Attributes("FullMethodName")};

var f = from e in events
 let stack = e.Stack.Select (s => s.Value )
 select new {AllocatedAmount = e.Amount,callstack = stack};
 }
 public static  int Dec(string hex)
 {
 var x1 = hex.Remove(0,3);
 return int.Parse(x1,System.Globalization.NumberStyles.AllowHexSpecifier);
 }
 

And here is the outcome from the linq code

IEnumerable<> (2 items)

AllocatedAmount callstack
340032
IEnumerable<String> (23 items)
ETW::SamplingLog::SendStackTrace
EtwCallout
CoMofTemplate_qqh
WKS::gc_heap::allocate_more_space
WKS::gc_heap::allocate_large_object
WKS::GCHeap::Alloc
AllocateArrayEx
ConsoleApplication9.Program.AllocateLOH()
CallDescrWorker
CallDescrWorkerWithHandler
MethodDesc::CallDescr
MethodDesc::CallTargetWorker
MethodDescCallSite::Call_RetArgSlot
ClassLoader::RunMain
Assembly::ExecuteMainMethod
SystemDomain::ExecuteMainMethod
ExecuteEXE
_CorExeMainInternal
_CorExeMain
ShellShim__CorExeMain
_CorExeMain_Exported
__RtlUserThreadStart
_RtlUserThreadStart
340032
IEnumerable<String> (23 items)
ETW::SamplingLog::SendStackTrace
EtwCallout
CoMofTemplate_qqh
WKS::gc_heap::allocate_more_space
WKS::gc_heap::allocate_large_object
WKS::GCHeap::Alloc
AllocateArrayEx
ConsoleApplication9.Program.AllocateLOH()
CallDescrWorker
CallDescrWorkerWithHandler
MethodDesc::CallDescr
MethodDesc::CallTargetWorker
MethodDescCallSite::Call_RetArgSlot
ClassLoader::RunMain
Assembly::ExecuteMainMethod
SystemDomain::ExecuteMainMethod
ExecuteEXE
_CorExeMainInternal
_CorExeMain
ShellShim__CorExeMain
_CorExeMain_Exported
__RtlUserThreadStart
_RtlUserThreadStart
680064

Wow now we can see every callstack from ntdll to CLR along with our code in callstack. How cool it is to see managed callstack and native in one single block. This wasn’t possible until now. Thanks to the CLR team. Just like I mentioned in my previous blog post we wouldn’t have to use Windbg to get this information. This is only tip of the iceberg , we can do lot of cool things with this.  I am sure Microsoft is working hard to integrate this within XPERF , but I am going to take a stab and see how I can display this data in a much more graphical way in my future posts. So next time someone complains of Performance issue in managed code you can probably ask them to just get on trace file , which will include everything to figure compared to getting  few memory dumps , perfmon counters , webserver log .

About Naveen
Write code.

6 Responses to Get Managed and Native Call Stacks from Event Tracing for Windows (ETW)

  1. Marc Sherman says:

    Thanks for this article. Looking for forward to experimenting with this at work.

    Marc

    • Naveen says:

      Thanks. Let me know if you run into any issues.

  2. Ran Wahle says:

    When running the perfMonitor crawlableCLRStacks
    I get an error message of: Unrecognized operation crawlableCLRStacks.

    How can I get the managed callstacks with the new version on PerfMonitor?

    • Naveen says:

      The latest perfmonitor does not have this option.

      You don’t have to run the crawlableCLRStacks as per say. It is required only if you want better call-stacks from BCL.

      FYI the crawlableCLRStacks will update the ngen images with the /debug option. Which is similar to “ngen update /debug”

      HTH

  3. Alois Kraus says:

    The latest PerfMonitor (v1.5) download does not seem to walk stacks anymore. From the summary I do not see the managed call stack for every GC run. Was this removed or do I need different command line options to enable this again?

  4. I have a tendency to go along with every thing that
    was in fact put into writing in “Get Managed and Native Call
    Stacks from Event Tracing for Windows (ETW) Naveen’s Blog”. Thanks a lot for all the actual tips.Thanks a lot-Nydia

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: