Document:      diskrand.txt
File Group:    Classic Benchmarks
Creation Date: 4 August 1997
Revision Date:
Title:         Disk and Transaction Random Access Test
Keywords:      BENCHMARK  PERFORMANCE  DISK  TRANSACTION PROCESSING
               MULTITASKING  RESPONSE TIME

Abstract: This document contains details of pre-compiled C programs that
measure disk random access times. It also has facilities to emulate
transactions accessing a database via multiple users, including parameters
for randomised think times, CPU loading and memory occupancy besides
average disk I/Os per transaction. Time stamped logs are produced for
later analysis. Windows 95/NT, OS/2 and DOS versions are supplied.

Note: The program is still under test and should be treated as beta test
software. Please submit feedback directly to Roy, or as a posting in
Section 12.

                               WARNING

The programs have been tested on various PCs including 486, Pentium and
Pentium Pro systems. They have also been run via Windows 95 and NT 4.0.
The DOS versions have also been run directly in the associated DOS mode
and when booted as DOS only with MS DOS 6.2. However, it is inevitable
that they will not run properly in all circumstances or occasionally may
cause certain systems to hang. Likely reasons for failure include hardware
or software incompatibility but particularly misuse. Before running the
programs, ensure that sufficient free disk space is available. Filling the
disk to the last byte can have serious consequences.

                        RUN AT YOUR OWN RISK

Contributor: Roy_Longbottom@compuserve.com


               DISK AND TRANSACTION RANDOM ACCESS TEST

0. SUMMARY

RANDDOS.EXE, RANDNT95.EXE and RANDOS2.EXE are compilations of a disk
performance test to run via DOS, Windows 95 or NT and OS/2 respectively.
RANDDOS requires DOS4GW.EXE, the protected mode run-time program. The
program requires run time parameters to Write, Read and Erase in a command
line, BAT or CMD file.

The first facility provided generates a text file of a specified size then
carries out three read passes, each of at least 5 seconds or 200 random
reads. A typical command to generate a 40 MB file, with three read passes
of 10 seconds and a defined log file, is:

  RANDNT95 Write DataFile.txt 40 30 LogFile.txt

In this mode, the log file gives details of average disk service times in
milliseconds.

The second mode of operation is for emulating transaction processing like
activities, including user number, number of transactions, disk accesses
per transaction and think time. The user number is required for
multitasking tests to ensure that different random accesses are selected.
Think time controls transactions per second rates by imposing idle time
between transactions. A typical command line for user 101 to run 50
transactions with 20 I/Os each and an average think time of 15 seconds is:

  RANDNT95 Read DataFile.txt 101 50 LogFile.txt 20 15

Think time is randomised as standard. Advanced facilities allow the user
number based random number seed to be modified by the time of day, to
avoid in-cache data on repeating a test (Xread), the number of I/Os to be
randomised via a load factor (Yread), or both (Zread). Further features
allow additional memory to be occupied by the program or extra CPU
loading, with parameters for Kbytes and a number effectively representing
CPU seconds on an 80486DX2/66. The CPU time used is also randomised by the
load factor.

For this mode of operation, the log file gives details of user number,
time of day that transactions finish, think time used, measured response
time and random load factor (when used).
The last facility is to delete the file, with example command Erase
DataFile.txt.

The programs can be used for testing multitasking capabilities. When
running the test on a well used system, the data file may be fragmented
over a large area of the disk and produce worse performance than might be
expected for the given file size. If sufficient capacity is available,
writing and reading a second file may produce better results.

0.1 FAST START

Unzip the programs into a new directory. Result logs will be written in
this directory when the programs are run. Note free disk space. DO NOT RUN
WITH FILE SIZES GREATER THAN THE FREE SPACE. The program can be run on
compressed disks but some care is required on interpretation of results.

Windows varieties - close all other applications - display the new
directory - edit the BAT or CMD file to run, or open an appropriate window
and execute with a command line. Examples below are for a 40 MB file.

DOS - reboot in direct DOS mode - change directory to that defined - enter
the following at the command prompt:

  RANDDOS Write DataFile.txt 40 30 LogFile.txt

Then

  RANDDOS Erase DataFile.txt

Batch test

Construct a BAT file (CMD file OS/2) with commands such as described above
or, where appropriate, type in as command lines:

  RANDNT95 Write DataFile.txt 40 30 LogFile.txt
  RANDNT95 Read  DataFile.txt 101 50 LogFile.txt 20 15
  RANDNT95 Erase DataFile.txt

Click on the BAT file icon or execute the BAT file from a command line
prompt.

Multitask

Run the above batch test without the erase. Construct a BAT file (CMD file
OS/2) with commands such as:

  Start RANDNT95 Read DataFile.txt 101 50 LogFile.txt 20 5
  Start RANDNT95 Read DataFile.txt 102 50 LogFile.txt 20 5
  Start RANDNT95 Read DataFile.txt 103 50 LogFile.txt 20 5
  Start RANDNT95 Read DataFile.txt 104 50 LogFile.txt 20 5

This represents four concurrent users, each generating 50 transactions of
20 I/Os at an average of 5 second intervals between them. For a first
approximation of the running time, ignore the loading on the PC - 50
transactions x 5 = 250 seconds, total 200 transactions at 200/250 = 0.8
transactions per second, total of 4 x 50 x 20 = 4000 I/Os in 250 seconds
or 16 I/Os per second. With one I/O per 62.5 milliseconds, this test is
unlikely to indicate problems due to overloading.

Click on the BAT file icon or execute the BAT file from a command line
prompt. On repeating the test, change the user ID numbers to avoid
in-cache data. Reduce the 5 seconds think time or increase the number of
I/Os per transaction from 20 to increase loading on the PC. Further start
lines can be included for additional concurrent users, but DO NOT START
TOO MANY TASKS - at some point the system is likely to crash. For more
advanced tests with additional CPU time and memory demands or more
randomised activities, see below.

1. INTRODUCTION

The program was compiled from RAND.CPP by a Watcom C/C++ 10.5 compiler. It
can be run from a local disk or a network drive. No interactive parameter
entry is provided. Run time parameters must be supplied either via a typed
in command or a BAT file (CMD file OS/2). Three major parameter commands
are available - Write, Read and Erase.

Write generates a text file of the size and with the name given in the
command line. Data transfer rate in megabytes per second is measured on
writing. The program then carries out three read passes, each comprising
at least 200 reads with the record number produced by a random number
generator. On each pass, the average service time per I/O in milliseconds
is calculated.
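The read pass logic is straightforward and, for readers who want to
experiment, can be approximated with a few lines of C. The following is
only a minimal sketch of one timed pass, not the actual RAND.CPP source.
It assumes the 4096 byte records and the fseek/fgets access method
described in section 2; the file name, record count and seed shown are
illustrative and clock() is simply used as a convenient timer.

  /* Sketch of one timed random read pass - illustrative, not RAND.CPP */

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main(void)
  {
      char    buffer[4096];
      long    records = 10240;          /* e.g. a 40 MB file of 4 KB records */
      long    reads   = 800;
      long    i;
      clock_t start, finish;
      FILE   *f = fopen("DataFile.txt", "r");

      if (f == NULL) return 1;
      srand(101);                       /* seed, as a user number would be */
      start = clock();
      for (i = 0; i < reads; i++)
      {
          long rec = rand() % records;          /* random record number */
          fseek(f, rec * 4096L, SEEK_SET);      /* position to record   */
          fgets(buffer, sizeof buffer, f);      /* read the record text */
      }
      finish = clock();
      printf("%ld reads at %.2f ms\n", reads,
             (finish - start) * 1000.0 / CLOCKS_PER_SEC / reads);
      fclose(f);
      return 0;
  }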
Read provides facilities for transaction processing emulation. A think
time parameter is included in the command line to pause program execution
and control throughput, in terms of transactions per second, on a
randomised basis. The next important parameter is a user ID number that
defines the seed for the random number generator to ensure that different
users generate varying sequences of I/O accesses on multitasking. Other
parameters define the file to read, the number of transactions to run and
the number of disk records to read per transaction. Actual records to read
are generated randomly over the whole file. On Read, response time is
measured for each transaction. This, the user number, time of day and
think time used are provided as output.

Read has further optional parameters to specify additional memory and CPU
time to use. The memory parameter specifies kilobytes. Run time memory
capacity demands are generated by the C malloc function and this can take
quite a long time. The CPU parameter is such that a value of 1 will
generate an extra second of CPU activity on an 80486DX2/66 or about 0.25
seconds on a 100 MHz Pentium.

Parameters for both Write and Read must include a log file name. On
multitasking, limited testing has shown that the same log file can be
specified for different users (but different names can also be used).

Advanced parameter commands can also be used on reading. These are Xread,
Yread and Zread. Xread modifies the user number random number seed
according to the time of day of the test. This is to avoid the same
records being read if a test is repeated, where it is likely that the data
will be in the memory based file cache and no reads from disk will take
place. Yread randomises the number of disk I/Os and optional extra CPU
time for each transaction. Randomisation is on a negative exponential
basis, the usual assumption used for estimating service times on
theoretical considerations. This produces a skewed distribution with many
short service times and a number of long ones (5% greater than 3 times the
average). Zread executes the program using both Xread and Yread functions.

Finally, the Erase parameter command can be used to delete the named data
file.

Versions provided are RANDNT95.EXE for Windows 95 and NT, RANDOS2.EXE for
OS/2 and RANDDOS.EXE for MS DOS. The latter needs DOS4GW.EXE, the
protected mode run-time program.

2. DATA

The program writes records of 4096 bytes, or 256 per specified megabyte.
The data for each record comprises the record number, the text "Random
disk test", then filled with " - this is data". On reading, the system
software appears to calculate which physical disk sectors to read, rather
than reading the file sequentially. The C fseek command specifies the
address of the first byte to be read and fgets defines the number of bytes
to read.

The Read command parameter can also specify a different text file, such as
one produced by DISKMBPS, the serial transfer rate measurement program.
Although the data file is opened for reading text, the program also
appears to read binary files, but some misoperation or false results might
be expected. The read file name can also include drive and directory path
details for reading from network drives or a CD ROM. The data and log
files will be generated in the directory containing the EXE code.

3. EXAMPLE SCREEN DISPLAY AND LOG FILE OUTPUT

Information displayed as the program is running is essentially the same as
that sent to the log file. If the program runs successfully, any run time
window used should close.
The program has some built in checking facilities that display a message
without closing the window. The following is an example of output using a
command line of:

  RANDNT95 Write RandData.txt 40 15 LogFile.txt

  Disk test RAND in C++ Win95/NT Version 1 Mon Jul 7 12:53:44 1997
  File written 40 Mbytes, 10240 records of 4096 bytes
  17.45 seconds for writing file at 2.29 MB/s      12:54:01
  Random reading records of 4096 bytes
  6.18 seconds for 800 records at 7.72 ms          12:54:11
  6.19 seconds for 800 records at 7.74 ms          12:54:17
  6.20 seconds for 800 records at 7.75 ms          12:54:23

Using RANDNT95 Read RandData.txt 23 20 LogFile.txt 50 1 produces the
following sort of output:

  Disk test RAND in C++ Win95/NT Version 1 Mon Jul 7 12:54:26 1997
  User 23, 20 transactions, 50 I/O/transaction, 1 think secs
  Read RandData.txt, 40 MB, 10214 records, 0 CP load
  Random number seed 23

  23 12:54:29 think 2 response 0.390 secs. load 0.000
  23 12:54:29 think 0 response 0.530 secs. load 0.000
  23 12:54:32 think 2 response 0.370 secs. load 0.000
  23 12:54:34 think 2 response 0.430 secs. load 0.000
  23 12:54:34 think 0 response 0.420 secs. load 0.000
  23 12:54:36 think 1 response 0.410 secs. load 0.000
  23 12:54:38 think 2 response 0.410 secs. load 0.000
  23 12:54:40 think 1 response 0.350 secs. load 0.000
  23 12:54:41 think 1 response 0.490 secs. load 0.000
  23 12:54:42 think 0 response 0.480 secs. load 0.000
  23 12:54:42 think 0 response 0.420 secs. load 0.000
  23 12:54:44 think 1 response 0.540 secs. load 0.000
  23 12:54:46 think 2 response 0.430 secs. load 0.000
  23 12:54:47 think 1 response 0.390 secs. load 0.000
  23 12:54:49 think 1 response 0.350 secs. load 0.000
  23 12:54:49 think 0 response 0.410 secs. load 0.000
  23 12:54:50 think 0 response 0.450 secs. load 0.000
  23 12:54:51 think 1 response 0.420 secs. load 0.000
  23 12:54:54 think 2 response 0.480 secs. load 0.000
  23 12:54:55 think 1 response 0.410 secs. load 0.000

Using RANDNT95 ZRead RandData.txt 23 20 LogFile.txt 50 1 2000 2 provides
the following results. This includes additional memory of 2000 Kbytes
(note the time taken) and a CPU load of 2. The test was run on a 200 MHz
Pentium Pro where the load of 2 equates to about 0.2 CPU seconds.

  Disk test RAND in C++ Win95/NT Version 1 Mon Jul 7 14:01:30 1997
  23 14:01:42 start/allocate memory 12.550 secs. 2000 Kbytes
  User 23, 20 transactions, 50 I/O/transaction, 1 think secs
  Read RandData.txt, 40 MB, 10214 records, 2 CP load
  Random number seed 513

  23 14:01:43 think 1 response 0.310 secs. load 0.333
  23 14:01:44 think 1 response 0.060 secs. load 0.082
  23 14:01:48 think 1 response 2.140 secs. load 2.402
  23 14:01:49 think 1 response 0.680 secs. load 0.766
  23 14:01:55 think 2 response 3.250 secs. load 3.812
  23 14:01:56 think 0 response 0.970 secs. load 1.183
  23 14:01:56 think 0 response 0.150 secs. load 0.186
  23 14:01:59 think 2 response 1.200 secs. load 1.440
  23 14:02:00 think 0 response 1.350 secs. load 1.685
  23 14:02:02 think 1 response 0.300 secs. load 0.389
  23 14:02:04 think 2 response 0.920 secs. load 1.157
  23 14:02:06 think 2 response 0.010 secs. load 0.028
  23 14:02:08 think 1 response 0.420 secs. load 0.483
  23 14:02:09 think 0 response 1.520 secs. load 1.761
  23 14:02:10 think 0 response 0.070 secs. load 0.110
  23 14:02:11 think 1 response 0.480 secs. load 0.600
  23 14:02:12 think 1 response 0.180 secs. load 0.223
  23 14:02:13 think 1 response 0.300 secs. load 0.314
  23 14:02:15 think 0 response 1.340 secs. load 1.601
  23 14:02:16 think 0 response 0.790 secs. load 0.916
Note that in running so few transactions with randomisation, there can be
a significant variance on average response and think times.

4. INPUT PARAMETERS

The program requires run time parameters. If no parameters are included
when the program is executed, a display is given identifying the main
commands and format required. Full commands and formats are:

  RANDXXXX Write DF MB RS LF
  RANDXXXX Read  DF UN NT LF IO TT [KM CP]
  RANDXXXX Xread DF UN NT LF IO TT [KM CP]
  RANDXXXX Yread DF UN NT LF IO TT [KM CP]
  RANDXXXX Zread DF UN NT LF IO TT [KM CP]
  RANDXXXX Erase DF

  RANDXXXX   - program name RANDNT95, RANDDOS or RANDOS2
  Write or W - generates a data file and measures read average service times
  Read or R  - reads the file measuring response time for a number of I/Os
  Xread or X - reads the file but uses new random numbers for each run
  Yread or Y - reads the file but randomises I/O and CPU loading
  Zread or Z - provides both Xread and Yread facilities
  Erase or E - erases the file

  DF - data file drive, path and name
  LF - log file drive, path and name. If the name 2logs is specified,
       files Multi1.txt and Multi2.txt are used. The first contains
       details of run profiles used and the second transaction response
       times. These files will be in the same directory as the
       RANDXXXX.EXE file.
  MB - megabytes to write
  RS - reading seconds, minimum 15 for 5 seconds per pass
  UN - user number for multitasking, used as seed for random numbers
  NT - number of transactions
  IO - number of disk records read per transaction
  TT - average think time in seconds - this can be zero
  KM - optional extra memory required in Kbytes
  CP - optional CPU loading factor, 1 = 1 80486DX2/66 CPU second

All parameters must be provided for Write and Read, except for the
optional items. The following are examples of commands to run via
RANDNT95.EXE but would be the same for other versions. The commands can be
typed directly at a command prompt or included in a BAT file (CMD file
OS/2).

  RANDNT95 Write DataFile.txt 60 30 Log.txt
  RANDNT95 Read  Datafile.txt 101 100 Log.txt 30 0
  RANDNT95 Read  Datafile.txt 101 100 Log.txt 30 0
  RANDNT95 Read  Datafile.txt 102 100 Log.txt 30 0
  RANDNT95 XRead Datafile.txt 102 100 Log.txt 30 0
  RANDNT95 Erase DataFile.txt

The first line generates a 60 MB file then has three read passes of at
least 10 seconds each. The second command executes 100 transactions
comprising 30 disk reads with no think time delays. The random number seed
is 101. The third is a repeat of the second test. If the memory has
sufficient capacity, the data will still be in the file cache and measured
response times will be much faster. The fourth run is again the same but
the different user number will generate a new sequence of random accesses.
The last read uses the X… command to generate different random numbers for
the same user ID used on the previous test. The final command erases the
data file. All results will be saved in Log.txt.

The following examples show alternative reading functions and assume that
the file has been written:

  RANDNT95 Read DataFile.txt 101 50 Log.txt 1 2 1024 20

This executes 50 transactions with an average 2 second think time delay in
between. Only one disk I/O is used but the high CP factor should ensure
that the transactions are CPU limited. The program also collects an extra
megabyte of memory.

  RANDNT95 YRead DataFile.txt 101 50 Log.txt 25 15 1 6

This example runs 50 transactions with an average of 25 I/Os and 6 CPU
units, both being randomised (negative exponential). A nominal amount of
extra memory is allocated. An average think time of 15 seconds is
specified. This produces somewhat less than 4 transactions per minute, the
sort of peak activity that might be generated by a user on a transaction
processing system.
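The negative exponential randomisation referred to above is a standard
technique and, for anyone wanting to check the shape of the think times or
load factors produced, it can be reproduced with a short piece of C. The
following is only a sketch of the usual inversion method, not the code
used in RAND.CPP; the seed and the mean values shown are illustrative.

  /* Sketch of negative exponential randomisation by inversion. With this
     distribution about 5% of values exceed 3 times the mean, matching the
     95 percentile comments elsewhere in this document. */

  #include <stdio.h>
  #include <stdlib.h>
  #include <math.h>

  static double neg_exp(double mean)
  {
      /* uniform value strictly between 0 and 1, then invert */
      double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
      return -mean * log(u);
  }

  int main(void)
  {
      int i;
      srand(101);                       /* user number as the seed */
      for (i = 0; i < 5; i++)
          printf("think %4.1f secs  load factor %5.2f\n",
                 neg_exp(5.0), neg_exp(1.0));
      return 0;
  }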
5. TESTING VIA A REMOTE CLIENT

With the simple file server facilities provided by operating systems, the
program can be run on remote clients while accessing data on the server.
In this case, there may be no point in including parameters for extra
memory and CPU time as they will only affect the client system.

If the program is resident on the server, the following command sequence
can be used to run it on the client using the server disk, assuming the
data file is present and the drive letter is F:

  F:
  CD \TEST\
  RANDNT95 Read RandData.txt 23 20 LogFile.txt 50 1

If the program is on the client, the following can be used to read an
existing file on the server:

  RANDNT95 Read F:\TEST\RandData.txt 23 20 F:\TEST\LogFile.txt 50 1

Executing the following would download the program and run using the
client disk (if the data file had been produced via Write):

  F:\TEST\RANDNT95 Read RandData.txt 23 20 LogFile.txt 50 1

One important consideration on running remote tests, particularly ones of
short duration, is to ensure that the time of day is the same on all PCs.
If clocks are not synchronised, it can be difficult to identify the
periods to be used for analysing response times.

6. MULTITASKING

Using Windows NT (and OS/2), multitasking can be initiated by opening a
series of command windows, typing the Read command in each but not
pressing Enter until all windows are ready to go. A better approach is to
use BAT files (CMD OS/2). The following example would run six versions of
the program, effectively emulating 1 heavy and 5 light users, with
randomised I/Os and CPU time - note the different user ID numbers (this
works with Windows 95 and NT):

  Start RANDNT95 YRead RandData.txt 101 100 LogFile.txt  50 15 10 2
  Start RANDNT95 YRead RandData.txt 102 100 LogFile.txt  50 15 10 2
  Start RANDNT95 YRead RandData.txt 103 100 LogFile.txt  50 15 10 2
  Start RANDNT95 YRead RandData.txt 104 100 LogFile.txt  50 15 10 2
  Start RANDNT95 YRead RandData.txt 105 100 LogFile.txt  50 15 10 2
  Start RANDNT95 YRead RandData.txt 106  20 LogFile.txt 250 15 10 5

The batch, with appropriate file paths (and different user IDs if run at
the same time), can be initiated from a client processor (probably with no
parameters for memory and CPU loading).

7. SPEED EXPECTATIONS

Average service time for random access on a disk comprises the sum of the
following:

Average head movement time - this is normally specified for disk drives
but represents the average over all tracks. Typical specifications are in
the range 8 to 16 milliseconds. For files occupying, say, 10% of the
tracks, the time might be half the values quoted.

Latency - an average delay of half a revolution waiting for the data to
reach the read head. Most disks in use rotate at 3600 RPM, giving a
latency of 8.33 ms. Current disk drives are available that rotate at 4500
RPM (6.67 ms), 5400 RPM (5.55 ms), 7200 RPM (4.17 ms) and 10033 RPM (2.99
ms). High RPM disks often have the faster head movement times.

Data transfer time - many disks used to transfer data at 1890 Kbytes per
second, based on 63 x 512 byte sectors per track and a rotation speed of
3600 RPM. This would require 2.1 ms for the 4 KB blocks used in the
program. Current disks with faster rotation time are also likely to have a
greater number of sectors per track, reducing the transfer time to well
under 1 ms.

There may also be a small amount of CPU time after the transfer for data
handling.
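These components can be combined into a rough service time estimate. The
following sketch simply adds them up for one 4 KB read; the seek, RPM and
transfer rate values are illustrative and correspond to the worked example
for the Quantum Fireball given below, with any non-overlapped CPU time
added on top.

  /* Rough service time arithmetic for one random 4 KB read - illustrative */

  #include <stdio.h>

  int main(void)
  {
      double seek_ms     = 4.0;                    /* file covers few cylinders  */
      double rpm         = 4500.0;
      double latency_ms  = 0.5 * 60000.0 / rpm;    /* half a revolution, 6.67 ms */
      double rate_kb_sec = 6300.0;                 /* media transfer rate        */
      double transfer_ms = 4.0 * 1000.0 / rate_kb_sec;  /* 4 KB block, 0.63 ms   */

      printf("estimated service time %.1f ms\n",
             seek_ms + latency_ms + transfer_ms);
      return 0;
  }

With these figures the estimate comes out at 11.3 ms, as in the worked
example below.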
All modern disk drives have a cache in the disk controller. When a read
request is issued, the cache will be filled with sequential data. The
first part of the data will also be transferred to memory. Following read
requests can transfer data from the disk controller cache at high speed
but, if the file size is large compared to that of the cache, there will
be a low probability of the data being cached with random access. The
software may have read ahead facilities which, again, may not be of any
benefit with random access.

On the other hand, the memory based file cache can give rise to
significant improvements in disk service times with random access (and may
lead to misleading results if a test is repeated that accesses the same
records). It all depends on the number of records that are actually
accessed which, as is the case with this program, is not necessarily the
whole volume of data.

A high cache hit rate can be apparent on reading during the Write phase of
this program. On a PC using NT with 32 MB of memory, and writing a 40 MB
file, the file cache grows to 16 MB. Thus, a high proportion of required
records would be in the cache, reducing measured service times. Examining
record numbers produced by the random number generator shows that on
reading 2000, about 200 are repeated. The cache size was 8 MB+. With a
larger number of reads and a greater cache, the hit rate (for a 40 MB
file) would be higher than this 10% figure. The file cache size can be
restricted, if required, by using the Read memory parameter. Running the
DOS version, with no SmartDrive cache, will provide a more appropriate
measurement of hardware speed.

Results given above were produced on a 200 MHz Pentium Pro with 32 MB
memory, a Quantum Fireball TM2110A 2.1 GB disk and running NT 4.0. The
disk has a 128 KB cache and rotates at 4500 RPM (75 RPS or 6.67 ms
latency). It has between 104 and 232 sectors per track, or 52 to 116 KB.
This leads to transfer rates between 3900 and 8700 KB per second. For head
movement time, it has a specification of 3 ms minimum, 18 ms maximum,
(18+3)/2 or 10.5 ms average.

The disk is about half full so the transfer rate would be around 6300 KB
per second, or around 0.63 ms for a 4 KB transfer. The 40 MB file would
only occupy some 2% of the disk's cylinders, so a head movement time no
greater than 4 ms might be expected. This would give a service time of
6.67 + 0.63 + 4 = 11.3 ms, plus any additional non-overlapped CPU time.

A series of measurements were made using the performance monitor to
determine CPU and disk utilisation. These suggested that head movement
time was about 5 ms on a 40 MB file, up to 8 ms at 400 MB. CPU time on the
Pentium Pro was around 2 ms per I/O, of which 1.4 ms was non-overlapped
(0.6 ms overlapped with data transfer). With data in the file cache, the
CPU load was 1.2 ms. Running the DOS version appears to generate slightly
more CPU time.

Results of RANDDOS, when booted directly via MS DOS 6.2, gave the
following disk service times on reading different sized files:

  File MB   millisecs

      1       11.0
      2       11.4
      5       12.4
     10       13.1
     40       14.3
    100       15.6
    200       16.5

At the smaller file sizes, a reasonable proportion of reads will not
result in head movement, for example 30% at 1 MB. Calculations suggest
that latency + transfer time + non-overlapped CPU time is slightly greater
than 9 milliseconds.
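The repeated record figure quoted earlier, and the file cache hit rates
quoted in the paragraphs that follow, can be checked approximately with
some simple arithmetic. The following is only a rough sketch assuming
records are chosen uniformly at random and that, in the long run, the
cache holds a fixed fraction of the file; the 13 MB cache figure is the
one reported later for these tests.

  /* Rough arithmetic for repeated records and file cache hit rates,
     assuming uniformly random record selection - illustrative only */

  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
      double records  = 10240.0;     /* 40 MB file of 4 KB records */
      double reads    = 2000.0;
      double cache_mb = 13.0;
      double file_mb  = 40.0;
      double distinct = records * (1.0 - pow(1.0 - 1.0 / records, reads));

      printf("expected repeats in %.0f reads: about %.0f\n",
             reads, reads - distinct);
      printf("long run hit rate with %.0f MB cache: about %.0f%%\n",
             cache_mb, 100.0 * cache_mb / file_mb);
      return 0;
  }

This gives a little under 200 repeats for 2000 reads of the 40 MB file,
and about a 33% hit rate for a 13 MB cache, broadly in line with the
figures quoted in this section.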
On restricting accesses to a very small range of records (special version
of the program), the cache on the disk drive has some effect. With 8
records (32 KB), measured service time was 2.5 ms.

On running RANDNT95 via NT 4.0 on this PC with 32 MB memory, the memory
based file cache can grow to 20 MB to give a 50% cache hit rate on reading
in the Write program mode with a 40 MB file. At a 50% hit rate, an
estimation of service time is 0.5 x 13.7 + 0.5 x 1.2 = 7.45 ms, similar to
the results given earlier.

Running independently from the write phase, with the cache size increasing
as transactions are executed, produces decreasing service times. A good
test is to produce two files and read them with the following commands to
ensure that no data is in the cache at the start. Using 1000 I/Os per
transaction provides response times that can be interpreted as
milliseconds per access (zero think time is also used).

  RANDNT95 Read RandData.txt 99 10 LogFile.txt 1000 0
  RANDNT95 Read RandDat2.txt 91 10 LogFile.txt 1000 0

  User 99, 10 transactions, 1000 I/O/transaction, 0 think secs
  Read RandData.txt, 40 MB, 10214 records, 0 CP load
  Random number seed 99

  99 16:26:55 think 0 response 14.080 secs. load 0.000
  99 16:27:09 think 0 response 13.110 secs. load 0.000
  99 16:27:21 think 0 response 12.030 secs. load 0.000
  99 16:27:32 think 0 response 11.790 secs. load 0.000
  99 16:27:43 think 0 response 11.120 secs. load 0.000
  99 16:27:54 think 0 response 10.520 secs. load 0.000
  99 16:28:04 think 0 response 10.020 secs. load 0.000
  99 16:28:14 think 0 response 10.130 secs. load 0.000
  99 16:28:24 think 0 response 9.960 secs. load 0.000
  99 16:28:34 think 0 response 9.470 secs. load 0.000

  User 91, 10 transactions, 1000 I/O/transaction, 0 think secs
  Read RandDat2.txt, 100 MB, 25541 records, 0 CP load
  Random number seed 91

  91 16:28:52 think 0 response 16.560 secs. load 0.000
  91 16:29:07 think 0 response 15.440 secs. load 0.000
  91 16:29:21 think 0 response 13.840 secs. load 0.000
  91 16:29:35 think 0 response 13.600 secs. load 0.000
  91 16:29:48 think 0 response 12.830 secs. load 0.000
  91 16:30:00 think 0 response 12.590 secs. load 0.000
  91 16:30:13 think 0 response 12.580 secs. load 0.000
  91 16:30:25 think 0 response 12.550 secs. load 0.000
  91 16:30:38 think 0 response 12.690 secs. load 0.000
  91 16:30:50 think 0 response 12.540 secs. load 0.000

After 10 transactions on the 40 MB file, the cache hit rate has been
estimated as ending at 33%. With the 100 MB file the hit rate ends at 16%.

8. CPU TIME USED

It is well known that CPU demands when using IDE based disk drives can be
excessive compared with SCSI disks. With Windows NT 4.0 running on the
Pentium Pro, the situation is not that bad. The two milliseconds per I/O
CPU time mentioned above is derived from a CPU utilisation of around 15%
(compared with 89% disk utilisation).

Using Windows 95, CPU utilisation is 100% continuously when running tests
with zero think time on a 100 MHz Pentium. CPU time per I/O ranges from 16
ms when reading from disk to 2.4 ms when the data is cached. In this case,
CPU time appears to be used when the disk heads are moving, unlike the
test with NT. However, as suggested by the following results, the OS may
be looping (executing instructions) rather than being in an idle mode
whilst waiting for an interrupt.

9. SUGGESTED STANDARD THROUGHPUT TESTS

Numerous different tests can be constructed using this program, including
setting the parameters to represent a specific user workload.
However, to provide a basis for comparison, it is useful to define some
standard tests and to give a range of representative results. Most of the
tests shown are of short duration and, with randomisation, cache hit
variations and so on, produce results that are not necessarily accurate
representations. For more accuracy, performance measurement should be
based on transactions involving at least 1000 disk I/Os.

The following configurations were used for the tests; RPM and average head
movement times are shown for the disks:

  PPro - Dell XPS Pro, 200 MHz P6, 32 MB memory, 256K cache, Quantum
         Fireball 2.1 GB disk (4500 RPM, 10.5 ms) with 128K cache, 8 speed
         CD ROM ATAPI interface, Windows NT 4.0 Workstation.

  P100 - Escom 100 MHz Pentium, 16 MB memory, 256K cache, Seagate ST5850A
         850 MB disk (3600 RPM, 11 ms) 256K cache, 4 speed CD ROM,
         Windows 95.

  486  - Escom 80486DX2 66 MHz, 20 MB RAM, 128K cache, Seagate ST3250A
         disk (3600 RPM, 14 ms ?), Windows 95.

The following tests were run with zero think times to provide maximum
utilisation on the PCs. The profile used, of 20 I/Os and 1 CPU unit (1
second CPU time on the 486/66) per transaction, is based on jobs used when
benchmarking a series of UNIX systems in 1992. The results tables include
the following:

  Elap secs   - elapsed seconds for the whole batch to complete
  TPS         - measured transactions per second
  CPU %Ut     - CPU utilisation measured by the performance monitor
  CP sec/tran - CPU seconds per transaction. Where this is not shown, it
                can be calculated from CPU %Ut / 100 / TPS
  Min         - minimum measured response time
  Avg         - average measured response time
  Max         - maximum measured response time
  95%         - 95 percentile (5% are longer than the time shown). This is
                included as it is often used when specifying requirements
                for an RDBMS system. With a negative exponential
                distribution, the 95 percentile is three times the average
                response time.

The results were analysed by copying the log files into a spreadsheet.

9.1 DOS I/O Service Time - 40 MB file

Command - randdos Write RandData.txt 40 15 DosLog.txt

          Write    Service
          MB/sec   Time ms

  486      0.33     29.7
  P100     0.90     18.4
  PPro     2.40     14.3

9.2 Batch 1 to 4 Tasks CPU Load, Nearly No I/O, No Think Time

Command - (4 task example), 100 transactions per task with 1 I/O, 10 KB
extra memory and 1 CPU unit.

  start randnt95 Read RandData.txt 101 100 Multi.txt 1 0 10 1
  start randnt95 Read RandData.txt 102 100 Multi.txt 1 0 10 1
  start randnt95 Read RandData.txt 103 100 Multi.txt 1 0 10 1
  start randnt95 Read RandData.txt 104 100 Multi.txt 1 0 10 1

This is essentially a CPU limited test. On the slower processors, average
response times should be proportional to the number of tasks and TPS
throughput should be constant. Using faster CPUs, the one I/O per
transaction can be of significance. In this case, due to overlapped CPU
and I/O transfers, throughput can increase slightly and response times may
not increase in proportion to the number of tasks. As the multiple streams
do not start and finish at the same time, some transactions are executed
with less concurrency (see minimum times), distorting the results. Other
variations are due to timer resolution, system overheads and system
software activity.
  PC    No of   Elap          CPU   CP sec   Response Times
        Tasks   secs   TPS    %Ut   /tran    Min    Avg    Max

  486     1      101   0.99   100    1.01    0.96   1.00   1.47
          2      206   0.97   100    1.03    0.99   2.01   3.99
          4      419   0.95   100    1.05    0.95   3.88   6.98

  P100    1       29   3.45   100    0.29    0.25   0.27   0.45
          2       55   3.64   100    0.28    0.24   0.48   0.65
          4      112   3.57   100    0.28    0.24   0.88   1.57

  PPro    1       11   9.09   100    0.11    0.09   0.11   0.12
          2       21   9.52   100    0.11    0.11   0.18   0.24
          4       40  10.00   100    0.10    0.15   0.34   0.46

9.3 Batch 1 to 4 Tasks No CPU Load, I/O Load, No Think Time

Command - First read 10 transactions with 1000 I/Os (about 40 MB) to set
up the file cache, then (4 task example) 100 transactions per task with 20
I/Os each.

  rem fill cache
  randnt95 Read RandData.txt 99 10 Multi.txt 1000 0
  start randnt95 Read RandData.txt 101 100 Multi.txt 20 0
  start randnt95 Read RandData.txt 102 100 Multi.txt 20 0
  start randnt95 Read RandData.txt 103 100 Multi.txt 20 0
  start randnt95 Read RandData.txt 104 100 Multi.txt 20 0

This is an I/O test with no additional CPU load other than program loop
overheads and I/O driving. It produces the opposite effect to the last
test, in that faster processors are likely to have constant throughput
(virtually 100% disk utilisation) and linear increases in response times
in proportion to the number of tasks. With the Windows 95 based 486 and
P100, the non linear response times and non constant throughput suggest
that the 100% CPU utilisation is artificial. The CPUs are not fully
saturated with a single task.

  PC    No of   Elap          CPU    Response Times
        Tasks   secs   TPS    %Ut    Min    Avg    95%    Max

  486     1       66   1.52   100    0.48   0.64   0.81   1.18
          2       83   2.41   100    0.44   0.78   1.23   1.69
          4      153   2.61   100    0.48   1.26   2.10   3.20

  P100    1       32   3.13   100    0.23   0.31   0.37   0.42
          2       48   4.17   100    0.25   0.48   0.59   0.87
          4      104   3.85   100    0.41   0.96   1.35   1.75

  PPro    1       18   5.56    20    0.12   0.18   0.24   0.28
          2       32   6.25    25    0.16   0.32   0.43   0.51
          4       61   6.56    24    0.20   0.60   0.87   1.06

9.4 Batch 1 to 4 Tasks CPU Load, I/O Load, No Think Time

Command - First read 10 transactions with 1000 I/Os (about 40 MB) to set
up the file cache, then (4 task example) 100 transactions per task with 20
I/Os each and 1 CPU unit. This is the same load as the last two tests
combined.

  randnt95 Read RandData.txt 99 10 Multi.txt 1000 0
  start randnt95 Read RandData.txt 101 100 Multi.txt 20 0 10 1
  start randnt95 Read RandData.txt 102 100 Multi.txt 20 0 10 1
  start randnt95 Read RandData.txt 103 100 Multi.txt 20 0 10 1
  start randnt95 Read RandData.txt 104 100 Multi.txt 20 0 10 1

See the comments on the last two examples. Based on these, the time for
each task is likely to be approximately as follows:

                          Non-overlapped I/O seconds with
          CPU seconds     some overlapped CPU time

  486         1.00            0.50
  P100        0.26            0.30
  PPro        0.10            0.17 (0.04 CPU)

  PC    No of   Elap          CPU    Response Times
        Tasks   secs   TPS    %Ut    Min    Avg    95%    Max

  486     1      152   0.66   100    1.33   1.50   1.64   3.26
          2      255   0.78   100    2.21   2.53   2.88   3.35
          4      519   0.77   100    2.09   5.02   3.35   8.81

  P100    1       57   1.75   100    0.46   0.56   0.65   0.83
          2       92   2.17   100    0.68   0.91   1.04   1.42
          4      160   2.50   100    0.92   1.47   1.85   2.26

  PPro    1       27   3.70    49    0.20   0.27   0.35   0.67
          2       43   4.65    66    0.25   0.42   0.55   0.69
          4       76   5.26    73    0.24   0.73   0.99   1.19

9.5 4 Tasks, Increasing Memory, CPU Load, I/O Load, No Think Time

Command - First read 10 transactions with 1000 I/Os (about 40 MB) to set
up the file cache, then 100 transactions per task with 20 I/Os each and 1
CPU unit, with memory demands on the 32 MB Pentium Pro (NT) increasing
from 1024 KB to 6144 KB per task (last example shown). The second PC is
the 16 MB Pentium (Win 95) with extra memory of 1024 KB and 2048 KB per
task.
  randnt95 Read RandData.txt 99 10 Multi.txt 1000 0
  start randnt95 Read RandData.txt 101 100 Multi.txt 20 0 6144 1
  start randnt95 Read RandData.txt 102 100 Multi.txt 20 0 6144 1
  start randnt95 Read RandData.txt 103 100 Multi.txt 20 0 6144 1
  start randnt95 Read RandData.txt 104 100 Multi.txt 20 0 6144 1

The first effect of the additional memory demands is to reduce the file
cache size, causing an increase in response times and a reduction in
transactions per second. When the memory is fully used, the file cache is
reduced to 4 MB and tasks are affected by paging out and in, producing a
few extended response times. Increasing task memory demands further starts
to cause a thrashing situation, where a program is paged out before it has
been completely paged in.

  PC    Extra MB   Elap          CPU    Response Times
        Total      secs   TPS    %Ut    Min    Avg    95%     Max

  PPro      4        84   4.76    74    0.47   0.80   1.06    1.46
            8        94   4.26    72    0.57   0.92   1.18    1.49
           12       101   3.96    72    0.60   0.99   1.23    1.58
           16       119   3.36    69    0.63   1.14   1.44    1.91
           20       184   2.17    55    0.36   1.60   2.18   57.13
           24       317   1.26    36    0.40   2.97   2.12  114.67

  P100      4       190   2.11   100    0.59   1.74   2.21    2.79
            8       268   1.49   100    0.55   2.53   2.59   29.96

10. TRANSACTION PROCESSING EMULATION

The difference between these tests and those above is that think time is
used, representing idle time between transactions. Think time is
randomised by the program. In most of the tests, an average think time of
five seconds is used, representing up to 12 transactions per minute per
user. With real world applications and the sort of transaction represented
by the tests, a rate of two to four transactions per minute per user would
be more representative. The five seconds was used due to memory
limitations in running a large number of tasks concurrently (and there
must be a limit on the number of windows that can be opened).

Besides random think time, the Yread command is used to randomise CPU time
and the number of I/Os used. As indicated earlier, the random numbers are
based on a negative exponential distribution, where the 95 percentile
response time is about three times the average. Loading effects increase
this ratio slightly.

10.1 Multiple Tasks, Random Service and Think Times

Command - First read 10 transactions with 1000 I/Os (about 40 MB) to set
up the file cache, then 100 transactions per task with 20 I/Os each, 5
seconds think time and 1 CPU unit, using Yread to randomise the CPU and
I/O load.
  randnt95 Read RandData.txt 99 10 Multi1.txt 1000 0
  start randnt95 YRead RandData.txt 101 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 102 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 103 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 104 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 105 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 106 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 107 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 108 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 109 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 110 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 111 100 2logs 20 5 10 1
  start randnt95 YRead RandData.txt 112 100 2logs 20 5 10 1

Further start lines continued the user numbers - Pentium to 124, Pentium
Pro to 136.

  PC    No of   Elap          CPU    Response Times
        Tasks   secs   TPS    %Ut    Min    Avg     95%     Max

  486      4     933   0.43    82    0.01   3.71   11.98   20.81
           5    1018   0.49    89    0.01   4.55   13.57   37.51

  P100     6     613   0.98    60    0.00   0.82    2.76    6.33
          12     691   1.74    88    0.00   1.39    4.75   12.12
          18     827   2.18    98    0.00   2.29    7.58   23.27
          24    1059   2.27   100    0.00   4.25   14.01   36.24

  PPro    18     608   2.96    42    0.00   0.53    1.67    4.85
          24     623   3.85    54    0.00   0.67    2.22    9.69
          30     660   4.64    63    0.00   0.91    3.04    8.94
          36     690   5.16    73    0.00   1.39    4.51   16.11
          48*    624   3.85    58    0.00   1.04    3.36   11.60

  * 10 seconds think time, 50 transactions per task (the same load as 24
    tasks with 5 seconds think and 100 transactions), but page faults per
    second increased from 78 to 223 due to insufficient memory capacity or
    poor memory management. In both cases, the file cache was 13 MB.

10.2 Different transaction profiles

The following are results from the Pentium Pro with 30 tasks as above,
plus two with 60 I/Os and a CPU factor of 3. Think time for the latter is
also reduced to 4 seconds to decrease the overall time for the
transactions, compensating for the heavier usage. The purpose of this test
is to show that, as in real benchmarks, users can be executing different
transactions. One problem is that the number of samples of the second type
is relatively small. This may lead to wider variations if the test is
repeated.

  randnt95 Read RandData.txt 99 10 Multi1.txt 1000 0
  start randnt95 YRead RandData.txt 101 100 2logs 20 5 10 1
    (repeated for users 102 to 130)
  start randnt95 YRead RandData.txt 201 100 2logs 60 4 10 3
  start randnt95 YRead RandData.txt 202 100 2logs 60 4 10 3

  Trans-    No of   Elap          CPU    Response Times
  actions   Tasks   secs   TPS    %Ut    Min    Avg    95%     Max

    3000      30     666   4.50    71    0.00   1.18   4.01   14.81
     200       2     666   0.29           0.00   2.97   9.07   21.43

10.3 Remote Access

For this test, the following script was run on the Pentium based PC,
accessing the disk on the Pentium Pro via a LAN with 6 concurrent tasks.

  Pentium

  randnt95 Read f:\test\RandData.txt 199 5 Multi1.txt 1000 0
  pause
  start randnt95 YRead f:\test\RandData.txt 201 100 2logs 20 5 10 1
    (repeated for users 202 to 206)

The above was repeated at the same time as the following normal 30 task
PPro test. The pause is included in order to synchronise starting the main
reading phases from the two PCs. Pause suspends processing of the batch
program and prompts the user to press any key to continue.

  PPro

  randnt95 Read RandData.txt 99 10 Multi1.txt 1000 0
  pause
  start randnt95 YRead RandData.txt 101 100 2logs 20 5 10 1
    (repeated for users 102 to 130)

The results from the first run, on the P100 only, can be compared with
those in test 10.1 and show similar TPS throughput and response times. CPU
utilisation on the P100 is lower but there is an additional 7% load on the
PPro for reading disk and transmitting data over the network.
P100 CPU utilisation is higher for the second part of the test, again
suggesting that the OS is using CPU cycles looping around waiting for I/O.
Throughput is also lower and response times are worse as the PPro is more
heavily loaded. PPro TPS and response times are similar to example 10.2.

  PC    No of   Elap          CPU    Response Times
        Tasks   secs   TPS    %Ut    Min    Avg    95%     Max

  P100     6     622   0.96    45    0.00   0.83   2.70    6.09
                             PPro Ut 7
  P100     6     698   0.86    70    0.00   1.59   4.92   16.44
  PPro    30     668   4.49    68    0.00   1.17   3.95   16.00

11. Performance Monitoring

Performance monitoring via Windows 95 is carried out by System Monitor
(SYSMON.EXE). Viewing Line Charts, Options, Chart was set to 30 seconds
sampling and CPU utilisation was noted as the tests were running. Other
useful charts include:

  File System        Disk reads per second
                     Bytes read per second
  Memory Manager     Disk cache size
                     Free memory
                     Number of page faults per second
  IPX Protocol       Network packets per second
  MS Network Client  Bytes per second
  MS Network Server  Bytes per second

All the above and more are available for Windows NT 4.0 Workstation
monitoring via the standard PERFMON.EXE. CPU utilisation and memory
utilisation can also be viewed via Windows NT Task Manager. This also
provides details of CPU and memory use for each task. Logging and report
generation over selected periods is also available via PERFMON.EXE.

For these tests, PerfLog from the Resource Kit was used. This is
accessible for set up and running via PDLCNFIG.EXE. Results are produced
that can be read directly into a spreadsheet and include a time stamp
(with the format 07/26/1997 15:30:17.193). Sample results for test 10.3
are:

  CPU    Disk                                    File Cache             Page
  %CPU   Xfer   Bytes    Qlen   Sec/    %Disk    Bytes      Fault Read  Fault
         /sec   /sec            xfer                        /sec  /sec  /sec

  67.9   64.6   287517   4.50   0.063   408.3    13467462   74.2  64.1  105.3

The disk statistics appear to be about as expected when compared to
measured response time and throughput. However, the %Disk utilisation
figure is impossible and appears to reflect queuing time. Based on the
queue length, utilisation should be in the 80% to 90% range. With 20 I/Os
per transaction and 0.063 seconds per transfer, average disk time per
transaction is 1.26 seconds (overall average 1.24 seconds measured by the
test). At 5.35 TPS and 20 I/Os, the test suggests 107 I/Os per second.
With a 13 MB cache for a 40 MB file, the cache miss rate could be 27/44 =
0.61. Multiplying this by 107 gives 65 physical I/Os per second (measured
64.6).

12. Result Analysis

When running multiple tasks, the program appears to work correctly using a
common log file (if there are problems, use different ones). The best way
to analyse the results is to read the log file into a spreadsheet. This is
particularly easy using the 2logs method, with just transaction results in
the log. Using Excel, the log text file can be opened and converted to
columns (Data, Text to Columns). The data can then be sorted by date/time
(Data, Sort) to determine overall running times, or first by user number
when considering different types of transactions. Average, Minimum,
Maximum and Percentile figures can be calculated using standard functions.
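For anyone without a spreadsheet to hand, the same summary figures can be
extracted with a small program. The following C sketch is only an
illustration, not a supplied utility; it assumes the transaction log line
format shown in section 3 and simply scans each line for the response time
field, using a crude percentile approximation.

  /* Sketch: summarise response times from a RANDxxxx transaction log.
     Assumes lines of the form "23 12:54:29 think 2 response 0.390 secs." */

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static int cmp(const void *a, const void *b)
  {
      double x = *(const double *)a, y = *(const double *)b;
      return (x > y) - (x < y);
  }

  int main(int argc, char *argv[])
  {
      static double r[100000];
      char   line[256];
      long   n = 0;
      double sum = 0.0, t;
      FILE  *f;

      if (argc < 2) { printf("usage: analyse logfile\n"); return 1; }
      f = fopen(argv[1], "r");
      if (f == NULL) { printf("cannot open %s\n", argv[1]); return 1; }

      while (fgets(line, sizeof line, f) != NULL && n < 100000)
      {
          char *p = strstr(line, "response");        /* find the field */
          if (p != NULL && sscanf(p, "response %lf", &t) == 1)
          {
              r[n++] = t;
              sum += t;
          }
      }
      fclose(f);
      if (n == 0) { printf("no transactions found\n"); return 1; }

      qsort(r, (size_t)n, sizeof r[0], cmp);          /* sort for percentile */
      printf("%ld transactions  min %.3f  avg %.3f  95%% %.3f  max %.3f secs\n",
             n, r[0], sum / n, r[(long)(0.95 * (n - 1))], r[n - 1]);
      return 0;
  }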