Document:      memspeed.txt
File Group:    Classic Benchmarks
Creation Date: 15 May 1997
Revision Date:
Title:         Memory Data Transfer Rate Tests
Keywords:      BENCHMARK PERFORMANCE MEMORY CACHE

Abstract:      This document contains details of pre-compiled C programs
               that measure memory and cache data transfer rates.
               Windows 95/NT, OS/2 and DOS versions are supplied. The
               main purpose is to identify performance peculiarities
               when running the Classic Benchmarks. Some Operating
               System overheads are also identified.

Note:          The programs are still under test and should be treated
               as beta test software. Please submit feedback directly
               to Roy, or as a posting in Section 12.

               The programs have been tested on various PCs including
               486, Pentium and Pentium Pro systems. They have also
               been run via Windows 95 and NT 4.0. The DOS versions
               have also been run directly in the associated DOS mode
               and when booted as DOS only with MS DOS 6.2. The
               programs use more than 6 megabytes of memory.

Contributor:   Roy_Longbottom@compuserve.com


MEMORY DATA TRANSFER RATE TESTS

0. SUMMARY

The program runs three different sequences of operations on arrays of
64 bit double precision floating point numbers, 32 bit single precision
numbers and 32 bit integers. The memory loading speed is calculated in
terms of millions of bytes per second. Ten sets of measurements are
made using between 4000 and 2048000 memory bytes to produce speed
ratings via data from different levels of cache and from RAM. Running
time is normally about ten minutes to produce 90 results.

Pre-compiled versions MDTRNT95.EXE, MDTROS2.EXE and MDTRDOS.EXE are
provided for Windows 95 or NT, OS/2 and MS DOS. Windows 3.1 users can
run the DOS version. Results are displayed as the program is running
and saved in file XFERMBPS.TXT, which should appear in the same
directory as the EXE files. The DOS version requires DOS4GW.EXE, the
protected mode run-time program.

Before running, all other applications should be closed.
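As a rough illustration of the figures quoted in the summary, the
following C sketch lists the ten memory sizes and counts the results.
It is not taken from the benchmark source. The function names are
hypothetical, and it assumes that each size is double the previous one,
which is consistent with ten steps from 4000 to 2048000 bytes.

```c
#include <assert.h>

/* Hypothetical helper, not from the benchmark source.
   Assumption: the ten block sizes double at each step, starting at
   4000 bytes, so the last one is 4000 * 2^9 = 2048000 bytes. */
static void fill_sizes(unsigned long sizes[10])
{
    unsigned long bytes = 4000UL;
    int i;

    assert(sizes != 0);
    for (i = 0; i < 10; i++) {
        sizes[i] = bytes;
        bytes *= 2UL;
    }
}

/* Three data types (double, single, integer), three operation
   sequences and ten memory sizes give the number of results. */
static int result_count(void)
{
    return 3 * 3 * 10;
}
```

With these assumptions, fill_sizes() produces 4000 as the first entry
and 2048000 as the last, and result_count() gives the 90 results
mentioned above.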
To run, click on the appropriate EXE icon or enter the program name at
the command prompt. The most consistent speed measurements are likely
to be produced when the PC is booted directly with DOS.

Results should be sent to Roy_Longbottom@compuserve.com and details of
the system under test should be included. The configuration details can
be provided via program SYSTEMxx.EXE as supplied with the Classic
Benchmarks.

1. INTRODUCTION

On examining results from the Classic Benchmarks, performance
comparisons between various PCs are sometimes different to those
obtained by other standard benchmarks that measure CPU speeds. The
memory data transfer rate test has been produced to help to identify
reasons for the differences.

The program measures performance in terms of millions of bytes per
second (not megabytes per second, where mega is usually defined as
1024 * 1024), using memory blocks of increasing size taken from defined
arrays. The concept is based on the wonderful CACHECHK program.
However, using a high level language, there is no control over absolute
memory addressing. Also, with an optimising compiler, it is not
normally possible to make measurements where only memory reading is
carried out: if results are not used outside the timing loop, the
compiler will probably not include the code. For example, on finding a
loop with a = x[loop_variable], the compiler is likely to ignore all
the assignments except the last one.

The measurements made are for 64 bit double precision floating point
numbers, 32 bit single precision and 32 bit integers, with the
following assignments and calculations in inner timing loops:

   sum = sum + x[m] * y[m]     (+ y[m] for integers, as this is faster)
   x[m] = x[m] + y[m]
   x[m] = y[m]

Pre-compiled versions, via Watcom C/C++ 10.5, MDTRNT95.EXE, MDTROS2.EXE
and MDTRDOS.EXE are available for Windows 95 or NT, OS/2 and MS DOS.
Windows 3.1 users can run the DOS version. The DOS version requires
DOS/4GW, the protected mode run-time program - Copyright (c) Rational
Systems Inc.

2. PROGRAM DETAILS

The program declares the following arrays. With double precision
numbers being eight bytes and the others four, memory used is
approximately 6 * 256000 * 4 or 6,144,000 bytes. With the Watcom
compiler, the arrays are inserted in the stack at the end of the data
segment and aligned on 16 byte boundaries.

   main()
   {
      double xd[128008];
      double yd[128008];
      float  xs[256008];
      float  ys[256008];
      long   xi[256008];
      long   yi[256008];

Nine separate measurements of millions of bytes per second are made,
each using ten values of the parameter (the maximum of loop variable m):
250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000 and 128000. Each
measurement uses two arrays, so double precision tests occupy 4000,
8000, 16000, 32000, 64000, 128000, 256000, 512000, 1024000 and 2048000
bytes. The maximum value of m is doubled for single precision and
integer measurements to use the same memory capacity. The sizes chosen
are such that slightly less than a given sized cache will be occupied.

Each measurement is self calibrating, via an outer loop, to run for
approximately 5 seconds on Pentium CPUs or faster. This is to produce a
timing resolution within 1% using the standard PC timer. Some
measurements can take more than 10 seconds on 80486DX2/66 based PCs.
With 90 measurements, calibration and other overheads, total running
time is about 10 to 11 minutes on any 80486DX2/66 system or faster.

Each timing loop is partially unrolled with a loop increment of four
and four sequential statements. This is to reduce looping overheads and
maximise memory transfers. The data transfer rates calculated are based
on the data read. The last six measurements also write to memory.

3.1 ARRAY AND ARITHMETIC TO VARIABLE

The double and single precision tests carry out the following
calculations (using the appropriate arrays). The multiply and add
produces a significantly higher data throughput than two adds.
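The self calibrating outer loop, the unrolling by four and the
calculation of millions of bytes per second described in section 2 can
be sketched in C as shown here. This is an illustrative sketch only and
not the benchmark's own code: the function names are hypothetical, the
scheme of doubling the pass count until the target time is reached is
an assumption, and the standard clock() function stands in for the PC
timer.

```c
#include <assert.h>
#include <time.h>

/* Inner loop unrolled by four: sum = sum + x[m] * y[m].
   n is assumed to be a multiple of four, as all sizes in the text are. */
static double sum_products(const double *x, const double *y, long n,
                           double sum)
{
    long m;

    assert(n % 4 == 0);
    for (m = 0; m < n; m += 4) {
        sum = sum + x[m]     * y[m];
        sum = sum + x[m + 1] * y[m + 1];
        sum = sum + x[m + 2] * y[m + 2];
        sum = sum + x[m + 3] * y[m + 3];
    }
    return sum;
}

/* Millions of bytes per second (10^6, not 1024 * 1024). */
static double millions_of_bytes_per_sec(double bytes_read, double seconds)
{
    return bytes_read / seconds / 1.0e6;
}

/* Hypothetical calibration scheme: double the number of passes until
   one timed run lasts at least target_secs, then report its rate.
   The rate is based on the data read from both arrays. */
static double measure_rate(const double *x, const double *y, long n,
                           double target_secs, double *result)
{
    long passes = 1;

    for (;;) {
        double sum = 0.0;
        double secs;
        long p;
        clock_t start = clock();

        /* Each pass carries the running sum forward, so no pass can
           be dropped by the compiler as redundant. */
        for (p = 0; p < passes; p++)
            sum = sum_products(x, y, n, sum);
        secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        *result = sum;   /* the sum is used outside the timing loop */
        if (secs >= target_secs)
            return millions_of_bytes_per_sec(
                (double)passes * (double)n * 2.0 * sizeof(double), secs);
        passes *= 2;
    }
}
```

For the smallest double precision test, n would be 250 * ... no larger
than the declared arrays; for example n = 128000 with a target of 5.0
seconds corresponds to the 2048000 byte measurement. Passing the
running sum back through result reflects the point made in the
introduction that results must be used outside the timing loop.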
The integer test is of the form sumi = sumi + xi[m+] + yi[m+], as
multiplication produces lower throughput with integers.

   for (m=0; m