Get Managed and Native Call Stacks from Event Tracing for Windows (ETW)
Here is the TOC of series of posts on ETW
With the latest release of PerfMonitor from the CLR team now we can have Managed call stacks along with native code. How cool is that? To get managed / native call stacks we need minimum of Vista, this is not available in XP.
Here is the sample source code I am going to be using to demonstrate the managed and the native call-stacks
using System;
using System.Linq;
namespace ConsoleApplication9 {
class Program {
private static void Main() {
Console.Read();
AllocateLOH();
}
private static void AllocateLOH() {
for (int i = 0; i < 10; i++) {
var x = new object[85000];
x.Count();
}
}
}
}
This is a very simple app that allocates large chunks of memory, which will get allocated in the LOH and frequent allocations in LOH causes high CPU. To identify this we would have to attach it to windbg and put a break-point on “WKS::gc_heap::allocate_large_object” and here I have blogged about the same
But with ETW we will not need Windbg, the ETW can do stack walks for the function calls and this makes it really great. To enable this feature first thing is to download the source and compile the code from codeplex . FYI if you are using x64 th tool would not work. It would work on wow64 , so compile it for x86.
After which run the command
PerfMonitor crawlableCLRStacks
This will take a while to complete, it is essentially setting the ngened images to debug attributes so we can get better stack walks. The next command is to start etw trace
PerfMonitor.exe /lineNumbers /stacks /keyAny:0x1 start
So we are essentially instructing the tool to get GC Information (/keyAny:0×1) along with stackwalk . The tool provides help which really useful.
After which I start the application and let the complete and then I stop the trace.
Perfmonitor.exe stop
And then dump the trace to a xml file
perfmonitor.exe print
This will dump out a huge file but what we are interested in Large Object Allocations on GC call stacks
<Event MSec= "6940.8972" PID="365044" PName="ConsoleApplication9" TID="364352" EventName="GCAllocationTick" AllocationAmount="0x00053040" AllocationKind="Large" ClrInstanceID="15"> <StackTrace> <CodeAddress Address="0x66669909" FullMethodName="ETW::SamplingLog::SendStackTrace" ModuleName="clr"/> <CodeAddress Address="0x66669a8d" FullMethodName="EtwCallout" ModuleName="clr"/> <CodeAddress Address="0x6669d870" FullMethodName="CoMofTemplate_qqh" ModuleName="clr"/> <CodeAddress Address="0x666d5d44" ModuleName="clr"/> <CodeAddress Address="0x6653ff5b" FullMethodName="WKS::gc_heap::allocate_more_space" ModuleName="clr"/> <CodeAddress Address="0x66475d43" FullMethodName="WKS::gc_heap::allocate_large_object" ModuleName="clr"/> <CodeAddress Address="0x666d61d8" FullMethodName="WKS::GCHeap::Alloc" ModuleName="clr"/> <CodeAddress Address="0x666a39a7" ModuleName="clr"/> <CodeAddress Address="0x664ef3d8" FullMethodName="AllocateArrayEx" ModuleName="clr"/> <CodeAddress Address="0x664ef464" ModuleName="clr"/> <CodeAddress Address="0x1f00ee" FullMethodName="ConsoleApplication9.Program.AllocateLOH()"/> <CodeAddress Address="0x1f008e"/> <CodeAddress Address="0x663f213b" FullMethodName="CallDescrWorker" ModuleName="clr"/> <CodeAddress Address="0x66412c2f" FullMethodName="CallDescrWorkerWithHandler" ModuleName="clr"/> <CodeAddress Address="0x66412dcc" FullMethodName="MethodDesc::CallDescr" ModuleName="clr"/> <CodeAddress Address="0x66412e01" FullMethodName="MethodDesc::CallTargetWorker" ModuleName="clr"/> <CodeAddress Address="0x66412e21" FullMethodName="MethodDescCallSite::Call_RetArgSlot" ModuleName="clr"/> <CodeAddress Address="0x664c8872" FullMethodName="ClassLoader::RunMain" ModuleName="clr"/> <CodeAddress Address="0x664c8980" FullMethodName="Assembly::ExecuteMainMethod" ModuleName="clr"/> <CodeAddress Address="0x664c8794" FullMethodName="SystemDomain::ExecuteMainMethod" ModuleName="clr"/> <CodeAddress Address="0x664ed6f5" FullMethodName="ExecuteEXE" ModuleName="clr"/> <CodeAddress Address="0x664ed5f7" FullMethodName="_CorExeMainInternal" ModuleName="clr"/> <CodeAddress Address="0x664e6db8" FullMethodName="_CorExeMain" ModuleName="clr"/> <CodeAddress Address="0x704561f0" ModuleName="mscoreei"/> <CodeAddress Address="0x706f7f16" FullMethodName="ShellShim__CorExeMain" ModuleName="mscoree"/> <CodeAddress Address="0x706f4de3" FullMethodName="_CorExeMain_Exported" ModuleName="mscoree"/> <CodeAddress Address="0x77bc9d72" FullMethodName="__RtlUserThreadStart" ModuleName="ntdll"/> <CodeAddress Address="0x77bc9d45" FullMethodName="_RtlUserThreadStart" ModuleName="ntdll"/> </StackTrace> </Event
Here is a simple linq statement to extract the above information
void Main()
{
var events = from e in XDocument.Load(@"c:\temp\test.xml").Descendants("Events").Descendants("Event").Take(5)
where e.Attribute("EventName").Value == "GCAllocationTick" && e.Attribute("AllocationKind").Value == "Large"
select new {Amount = Dec(e.Attribute("AllocationAmount").Value) ,
Stack = e.Descendants("StackTrace").Descendants().Attributes("FullMethodName")};
var f = from e in events
let stack = e.Stack.Select (s => s.Value )
select new {AllocatedAmount = e.Amount,callstack = stack};
}
public static int Dec(string hex)
{
var x1 = hex.Remove(0,3);
return int.Parse(x1,System.Globalization.NumberStyles.AllowHexSpecifier);
}
And here is the outcome from the linq code
IEnumerable<> (2 items)
| AllocatedAmount | callstack | ||||||||||||||||||||||||
| 340032 |
|
||||||||||||||||||||||||
| 340032 |
|
||||||||||||||||||||||||
| 680064 |
Wow now we can see every callstack from ntdll to CLR along with our code in callstack. How cool it is to see managed callstack and native in one single block. This wasn’t possible until now. Thanks to the CLR team. Just like I mentioned in my previous blog post we wouldn’t have to use Windbg to get this information. This is only tip of the iceberg , we can do lot of cool things with this. I am sure Microsoft is working hard to integrate this within XPERF , but I am going to take a stab and see how I can display this data in a much more graphical way in my future posts. So next time someone complains of Performance issue in managed code you can probably ask them to just get on trace file , which will include everything to figure compared to getting few memory dumps , perfmon counters , webserver log .
Thanks for this article. Looking for forward to experimenting with this at work.
Marc
Marc Sherman
April 3, 2010 at 10:25 am
Thanks. Let me know if you run into any issues.
Naveen
April 3, 2010 at 10:32 am
When running the perfMonitor crawlableCLRStacks
I get an error message of: Unrecognized operation crawlableCLRStacks.
How can I get the managed callstacks with the new version on PerfMonitor?
Ran Wahle
March 10, 2011 at 2:31 am
The latest perfmonitor does not have this option.
You don’t have to run the crawlableCLRStacks as per say. It is required only if you want better call-stacks from BCL.
FYI the crawlableCLRStacks will update the ngen images with the /debug option. Which is similar to “ngen update /debug”
HTH
Naveen
March 10, 2011 at 9:38 am
The latest PerfMonitor (v1.5) download does not seem to walk stacks anymore. From the summary I do not see the managed call stack for every GC run. Was this removed or do I need different command line options to enable this again?
Alois Kraus
June 5, 2012 at 7:28 am
I have a tendency to go along with every thing that
was in fact put into writing in “Get Managed and Native Call
Stacks from Event Tracing for Windows (ETW) Naveen’s Blog”. Thanks a lot for all the actual tips.Thanks a lot-Nydia
http://tinyurl.com/vilabrien27967
January 29, 2013 at 6:15 pm