VS 2010 Beta 2 Concurrency Visualizer Profiling In Depth First Look
One of the most important technologies used at Microsoft is Event Tracing for Windows (ETW), especially inside Windows 7 and Server 2008 R2. There's a small problem with ETW. It's beyond wonderful for the Windows team at Microsoft, but for just about everyone else, it's not so hot. Don't get me wrong, ETW is an amazing technology but the problem is that the tool support is geared at the operating system developers, not you and me.
Every time I've talked to a developer on Windows team about a performance problem they always beg for an ETW log gathered through a tool called Windows Performance Analysis Tool, which most of us know as Xperf. It's a wonderful tool for overall operating system performance but it makes me chuckle. When you're debugging with Visual Studio, your application is the white box and the operating system is the black box. With Xperf, the operating system is the white box and your application is the darkest and deepest of black holes. This is especially true if using Xperf to look at a managed application. Relating anything back to your application code is excruciatingly difficult even on a very good day and impossible every other time.
What we need is a tool that lets us gather all this amazing information from ETW and relate it directly to our code. That tool is Concurrency Visualizer with Visual Studio 2010 Beta 2. This tool has undergone a wonderful set of changes from Beta 1 and it's all for the better for anyone doing .NET or native development.
Like Concurrency Resource profiling, turning on Concurrency Visualizer profiling is very easy. You can do it through the command line with VSPerfCmd.exe or through the Performance Explorer, General property page, by selecting Concurrency and checking "Visualize the behavior of a multithreaded application."
Collecting Concurrency Visualizer data requires the collector, Visual Studio or the command line tools, to be running with administrator rights. Additionally, when collecting on an x64 platform, following message box appears when you start collecting data.
The link goes to this page, shows how to set the DisablePageExecutive registry key. By setting this key, you're keeping all drivers and system code in memory, which is necessary to give you complete stack walks into kernel mode when the Concurrency Visualizer profiler collects data. The choice is yours because the tool will collect useful data even with it off. I've been running the Concurrency Visualizer tool without setting the registry key as I don't care about the kernel side of call stacks, and have not had any problems. I need to do more exploring and comparisons with this switch on and off to see how important it is.
To show the power of the Concurrency Visualizer profiler, I'm going to run my Paraffin program to build a WiX file for the Visual Studio 2010 Beta 2 DVD. Paraffin does a lot of I/O so it's a good example of a representative program you'll be running under the Concurrency Visualizer profiler.
After you collect your data, there are three views for your program. The first is the overall CPU utilization.
While you get to see your own process CPU utilization in this view, the important data is that you see other processes as well as system processes. What's fascinating to me about this run is that I see a lot of idle time, but Paraffin's CPU utilization comes in bits and spurts. All Paraffin does is dig through a directory, gets all the files in the directory, and build an XML file of that data. Seeing that it runs and pauses a lot makes me wonder what's really going on.
The third view, Cores, shows how your application executed on the different CPUs on the system. As it says at the top of the view, if your process threads are moving between cores, you may have a performance problem. Look at the view below and ask yourself if this application has a performance problem.
The second view, Threads, is where you're going to spend all your time. The upper part of the view shows the "swim channels" for each disk and thread in the application. The lower left hand corner contains the legend for what each color means, which is obviously extremely important. The lower right portion contains various informational panes, which are described in the tabs.
Before we jump into poking deep on this run, today Concurrency Visualizer shows all the threads in the application, even the "Debugger Helper" which is one from the profiling API used to help gather the data. For Beta 2, you can right click on a thread and hide it from the display. I hope that in the RTM, they automatically hide threads like "Debugger Helper" just to avoid any confusion.
There are two ways to determine what a thread does. The first way is to hover your mouse over the thread name to see the thread startup function/method. The second is since Concurrency Visualizer will show you the call stack of the selected location, I clicked on sections of the "Worker Thread" to get an idea what they were. In the screen shot below, clicking around the sections shows the calls stack in the Current stack tab.
Based on the call stack with WKS::GCHeap::FinalizerThreadWorker from CLR.DLL, you can probably be certain that you're looking at a finalizer thread. What I'd love to see in the future is if the Concurrency Visualizer would either analyze and identify the threads automatically or allow me to rename the thread so you wouldn't have to remember which thread ID is what thread.
For the run I'm showing above, I was running Visual Studio 2010 in a booted VHD. As the Concurrency Visualizer shows all disk access, I want to filter out any activity on "Disk 0," the physical hard disk in the system, so I can concentrate on just the disk access Paraffin truly made in the VHD. Like the threads, to remove a disk "swim channel" right click in the Name area and select Hide.
Additionally, threads 1260, 4424, and 3248 are all pink and from the Visible Timeline Profile, are doing nothing but synchronizing. That means they are waiting on handles or other synchronization events in kernel mode and not doing anything. I'm going to hide those as well so I can focus on my Main Thread and the disk I/O for Paraffin. Below is resulting view.
By the way, a great trick to focus on just those threads with execution time is to set the Sort by dropdown to Execution. Then SHIFT+CLICK to select all the threads sitting in Synchronization time, right click and select Hide. Now you've got just the important threads to worry about.
Before you read the text after the screen shot, look carefully at what the Concurrency Visualizer is reporting. You can see everything you need to see in this screen shot.
You're not cheating and reading down here are you? If you are, move your eyes back to the above screen shot. There's some amazing data in there!
As I described, Paraffin is a tool that you use to recursively scan a directory tree for files and builds up an XML tree of those files, which, in the big scheme of things, is a very simple tool. However, how much time did Paraffin actually execute its own code during that run? A whopping 17%. What did it spend the bulk of its time doing? Paging. 77%, to be exact. Over three fourths of the executing time was spent waiting on the disk.
When I first looked at the output in Concurrency Visualizer, I thought there was no way that was true. It's just reading directories and files for goodness sake! I rebooted and ran again and repeated. (To duplicate the cold boot run) Guess what? It really is around 75% of the time is spent paging, not executing. Concurrency Visualizer gave me a huge WOW moment. I would have never expected it to be that high, but the tool does not lie. I found it fascinating that my first run of Concurrency Visualizer showed me an insight into my program that I never had seen before. Especially considering that I hadn't even started doing any digging with the rest of the tool!
I can say with certainty that 99.9% of the performance problems I've worked on have been I/O bound problems. Up until Concurrency Visualizer, I've always had to look at the problem with multiple tools with the profiler for the code and either Perfmon or Xperf for the I/O side. The trouble before was matching up the two views of what was going on. That's no longer a problem with Concurrency Visualizer.
In Beta 2, Concurrency Visualizer only shows the disk I/O on the swim channel view. I was curious how one would see network I/O so I profiled CMD.EXE and executed a recursive directory search on a network share. Below is the Threads view for the main thread.
All the purple indicates the raw I/O, which was 55% of the zoomed in selection. The yellow indicates preemption, which means the scheduler pulled the thread off the CPU because other code had to run. In this case, it was kernel mode code to do the network I/O. I'm extremely excited by the fact I can get a true picture of I/O directly mapped to my application's execution. This is going to help everyone out immensely.
While it's great to see the I/O mixed with execution, you can drill down in Concurrency Visualizer to see what your application is doing during the run. One of the key things ETW offers is blazingly fast information recording with stack traces when collecting data. I'll turn back to the Paraffin run from earlier and look at some of the action. At about one second into the run there is a bunch of green (the color of execution) that indicates a period of execution took place and I want to see what happened there. I zoomed in by selecting the area, clicked on a block, and the view looked like the following.
In the call stack window, you see the black items are from my code. The grayed items are from areas where you don't have source code. One thing I would love to see in a future version of Concurrency Visualizer is a toggleable "Just My Code" feature like the Sampling and Instrumentation Profiler has.
As you're looking at the Concurrency Visualizer, it's easy to be overwhelmed by data. The team has implemented a feature that always makes me smile: Demystify. If you are unsure what you're looking at or what it means, click the Demystify in the upper right corner of the window, or in the Hints tab, followed by a clicking on the item you need enlightened. That will jump you to the help where you'll get a detailed explanation for the item. It's hugely helpful and I feel whoever thought up the name to put on the button deserves a raise. Unfortunately, for Beta 2, the Demystify button isn't working, but the team put the equivalent of the Demystify up at their blog so you can learn about the tool. All the Concurrency Visualizer documentation is up on MSDN as well.
The team has also written an excellent piece of documentation you need to read that will give a ton of help on how to look for problems using Concurrency Visualizer. Click on the Hints tab and click the link to browse a gallery of common concurrency patterns. (Again, the help is broken in Beta 2, so see this in the documentation). The team has written all sorts of bad applications so you can see what they look like in the Concurrency Visualizer tool. Half of diagnosing a problem is knowing what to look for and the rogues gallery of bad visualizations will greatly speed up your analysis.
The last piece of analysis I want to discuss with the Paraffin program run is the Profiler Report tab. As you start zooming in on the threads in the swim channel view, the legend in the bottom left corner changes to be the "Visible Timeline Profile." That shows you the breakdown of the zoomed section. In the following screen shot, I wanted to dig down deeper into what was contributing to the Paging in that section of the run.
It may be a little hard to see, but I selected Paging and made the Profile Report tab active. In this mode you'll see all the call stacks samples for the visible section. I expanded the stacks out until I was at the terminal stack calls, which are all to Directory.GetFiles. This is a great view to see everything going on with your code for the various types of activity.
Where I can see that it was the calls to Directory.GetFiles causing the paging, which makes sense, many times you'll be sitting there looking at a run wishing you could see what actual files or directories were being accessed. The Concurrency Visualizer team has you covered. By clicking on the File Operations in the Visible Timeline Profile, you get to see some seriously interesting information.
What you get are the file and directory accesses, the number of reads, writes, bytes, and the latency for those reads/writes for the zoomed section. That's some amazing detail! Many of the performance problems I've worked on for clients revolve around startup times. In all those cases trying to figure what files were being accessed at startup was nearly impossible. With Concurrency Visualization, it's now trivial to see exactly what the process accessed, but most importantly, the latency on every access. As I said at the beginning of this article, ETW had a ton of great information in it but getting at that information was never worth the effort. With Concurrency Visualizer, it's trivial!
With a name like Concurrency Visualizer, you're probably wondering where the discussion of concurrency is. Using the same program I wrote to show the Concurrency Resource Profiler, which does sync block and mutex acquisitions on two threads, I can show you the concurrency analysis. In the following screen shot, I zoomed in on one of the spots where one thread was waiting on a mutex and was able to acquire it from the other thread.
The selected area, in bright red, is where thread 3728 is blocked waiting on the mutex. You can see the call stack is showing my code has called a WaitHandle.WaitOne method blocked on the mutex. If you look closely at the thread swim channels, you see a line drawn from the end of thread 3728's synchronization section to where thread 2800 is green, or executing. The Current stack tab tells us that thread 3728 is unblocked by 2800. To prove that here's the call stack for thread 2800.
Looking at the stack, my code called Mutex.ReleaseMutex, which allows another thread to grab it. In this case, it was 3728. The call stacks are captured on the ETW events, which is why thread 2800 has the link line in the middle of its execution block. What you're seeing here is exactly how your application executed. Having spent years trying to visualize how my application works on paper it's very nice to have the real visualization right in front of you.
It's always wonderful to see new insights to your application and Concurrency Visualizer shows you views you've never seen before. I know many of you are going to be surprised by your I/O and threading! The great news is that Concurrency Visualizer, which is part of the Visual Studio 2010 Premium and Ultimate SKUs, is available to everyone today because it could care less what version of the CLR or MFC or anything your application is using. All data analysis takes place through ETW so you can tell Concurrency Visualizer to run against your binaries today. Please give the Concurrency Visualizer team your feedback when you do. Also, see the team's blog and Hazim Shafi's blog for more info. Now get out there and fix your problems with Concurrency Visualizer!