[Home]
[Search]
[Contents]
Trace Dynamic Profiling
Page swapping can cripple an otherwise fast program. Normally,
code is written in such a manner as to group together functions
that are conceptually related, not those that call each other.
To minimize page swapping, functions must be grouped together that
call each other a lot at runtime.
Grouping functions that call each other has other benefits, even on
machines that have sufficient memory to run the application without
swapping:
- Called functions will be more likely to be in the memory
cache already.
- When Windows loads a program or
DLL, it doesn't actually read it into memory. It simply
maps an address range onto the file. Then, when the
program executes, pages of the program are swapped
into memory as they are referenced. Grouping functions
that call each other will minimize the amount of the
program that actually has to be read in from disk, thus
minimizing load time.
A compiler can determine the static calling relationships between
functions in a program, but this has severe limitations. The compiler
has no way to determine at compile time which connections are called
relatively often. There is also no way to determine function calling
relationships via virtual function dispatch, or other dispatch via
function pointers.
What is necessary is to gather this information at runtime,
preferably while running a typical application of the program.
Such dynamic profiling information can then be processed to yield
an order in which functions should be placed by the linker.
Dynamic Profiling With DMC++
This capability is now built in to Digital Mars C++. It is currently
only implemented for Win32 systems, but the results should be
directly applicable to a 16 bit version of your program.
Setting Up
It's enabled by
two switches working together: -gt and -Nc.
-gt by itself inserts profiling code into every function, and when
the program terminates, it prints out a log showing for each
function:
- What other functions call it and how often (called the "fan in")
- What functions it calls and how often (the "fan out")
Throwing -gt will also predefine the macro
#define _DEBUG_TRACE 1
If -Nc (make all functions COMDATs) is thrown as well, profiling
code is only inserted into global COMDAT functions, since it is
only global COMDAT functions that the linker can adjust the link
order on.
If there are many static functions, and those are to be profiled
and grouped as well, throw the -gg (make static functions global)
also. This assumes, of course, that all static function names in
the -gg modules are distinct from one another and all global names.
The program is linked normally.
Gathering the Data
Run the program in the manner it is typically used. The trace code
will accumulate the information, and when the program is terminated,
the information will be written to files 'trace.log' and 'trace.def'.
trace.log is a readable text file with a list of all the traced
functions, along with who calls them (fan in) and who they call (fan
out), and each time this occurs.
Following this is listed timing data. The data are:
Num Calls: Number of times the function was called.
Tree Time: Time used by function plus the tree times of all the
functions it calls.
Func Time: Time used by a function excluding the time
used by the functions it calls (times for functions
it calls that are not compiled with -gt, like
RTL functions, are included in Func Time).
Per Call: (Func Time) / (Num Calls)
If multiple runs are done with the program, the profiling data is
merged and summed with any previously existing trace.log file.
The final output is the 'trace.def' file, which is meant to be
included in the linker module definition file. It contains the list
of the functions in the order the linker should link them.
Modifying Dynamic Profiling Behavior
The behavior of the profiler can be modified with several
functions that can be explicitly called in the program. They are
prototyped in trace.h. They are implemented aa macros that expand to
nothing if -gt is not thrown.
void trace_term()
Normally, this function is called when the program exits as part of
the static destructor list. If this list is not called, or you wish
to stop gathering data before the end of the program (for instance,
to optimize for fast load), then insert a call to trace_term() at
the point where it is to stop.
int trace_setdeffilename(const char *name)
Set file name to use for .def file instead of trace.def.
Returns: 0 for success, non-zero for failure
int trace_setlogfilename(const char *name)
Set file name to use for .def file instead of trace.log.
Returns: 0 for success, non-zero for failure
Optimizing Program Load Time
To optimize the program for fast loading, rather than fast running,
gather the profile data only as far as the program is loaded. Then,
call trace_term() to terminate the data gathering and write the
output files.
Notes
- The library does not have profiling instrumentation in it, so the
library functions are not grouped by the linker, nor is any profiling
information collected for library functions. The extra code linked in
to do the profiling makes use of the library functions, so if the
library functions are instrumented, some infinite loops can occur.
So don't recompile the library with -gt.
- Functions declared with __declspec(naked) never get profiling
instrumentation added to them.
- The profile data gathering code will get confused if non-local jumps
occur when things like setjmp/longjmp, Windows NT structured
exceptions, or C++ exceptions are used. If this profiling tool is
well used by customers, we'll address these shortcomings.
- The timing is based on the Win32 performance timing API, which is
accurate to about one microsecond. Unfortunately, any paging or any
other tasks running can erratically change the timings. There is no
real way around this besides just running the tests several times
and averaging the results.
- The overhead of the instrumentation code can distort the
measurements when the function times are very short. The profiler
tries to compensate for this, but the overhead times are not always
consistent.
- Profiling of multithreaded apps is not supported.
- Multiple EXEs and DLLs can be profiled simultaneously, but the log
files and def files must all be set to different names, or else
they will step on each other.