---------------------------------------------------------------------
QLife 1.3                                           November 11, 1992
David Stafford
---------------------------------------------------------------------

QLife is my entry in the PC-Technique magazine optimization challenge.
QLife is written for Borland C 3.1 but it shouldn't be too difficult to
convert it to work with Microsoft C.  To build QLife run the BUILD.BAT
batch file with the size of the life grid on the command line (see
below).  The command-line options are:

WIDTH 32   - Sets the width of the life grid to 96 cells (divided by 3).
HEIGHT 96  - Sets the height of the life grid to 96 cells.
NOCOUNTER  - Turns off the generation counter.  (optional)
NODRAW     - Turns off drawing of the cell map. (optional)
GEN 1000   - Calculates 1000 generations.       (optional)

These MUST be in UPPER CASE.

For example, the minimum you really need is "WIDTH 40 HEIGHT 120".
I used "WIDTH 46 HEIGHT 138 NOCOUNTER NODRAW GEN 7000" during testing.

If you have selected the GEN option you will have to press a key to
exit QLife when it is finished.  This is so I could visually compare the
result of N generations under QLife with N generations under Abrash's
original life program.  You should be aware that the program from the
magazine contains a small bug which may make it appear that they do not
generate identical results.  The original program does not display a
cell until it changes so if a cell is alive on the first generation and
never dies then it will never be displayed.  This bug is not present in
QLife.

You should have no touble running QLife with cell grids up to 210x200.

You MUST have a VGA and AT LEAST a 386 to run QLife.  The 386 features
which it uses are not integral to the algorithm (they're a convenience for
the code) so feel free to modify QLife to run on lesser CPUs if you wish.
QLife works best if you have a large CPU cache (256k is recommended).

The subdirectories contain compiled versions of both the original life
program from PC-Techniques and QLife.  The programs have been modified
slightly to display the generation count (and check for a keypress) only
once per 100 generations.  This significantly reduces time spent in I/O.
Each directory contains a program for a different grid size.  There is
only one program in the 201x200 directory because the original program
cannot work with a grid of that size.

---------------------------------------------------------------------
How it works
---------------------------------------------------------------------

Each three cells in the life grid are packed into two bytes:

OABCabcX  XXYYYZZZ

O   : 0 if an internal cell, 1 if on an edge (must handle wrapping)
ABC : the life/death cell states for the next generation
abc : the life/death cell states for the current generation
XXX : neighbor count for cell 'A'
YYY : neighbor count for cell 'B'
ZZZ : neighbor count for cell 'C'

So, it is convenient if the width of the cell array is an even multiple
of three.  There's nothing in the algorithm which prevents it from
supporting any arbitrary size but the code is a bit simpler this way.
So if you want a 200x200 grid I recommend just using a 201x200 grid and
be happy with the extra free column.  Otherwise the edge wrapping code
gets more complex.

Since every cell has from zero to eight neighbors you may be wondering
how I can manage to keep track of them with only three bits.  Each cell
really has only a maximum of seven neighbors since we only need to keep
track of neighbors OUTSIDE of the current cell word.  That is- if cell
'B' changes state then we don't need to reflect this in the neighbor
counts of cells 'A' and 'C'.  Updating is made a little faster.

The basic idea is to maintain a "change list".  This is an array of
pointers into the cell array.  Each element points to a word which
changes in the next generation.  This way we don't have to waste time
scanning every cell since most of them do not change.  For a 201x200
cell array the change list will be, at maximum, 26,800 bytes (plus one
more word to terminate the list).  That's only if EVERY cell changes
in one generation.  In practice it is much less.

Two passes are made through the change list.  The first pass updates the
cell display on the screen, corrects the life/death status of each cell
and updates the neighbor counts for the adjacent cells.  There are some
efficiencies gained by using cell triples rather than individual cells
since we usually don't need to examine all eight neighbors.

The second pass checks (and possibly resets) the life/death status for
the cells and their neighbors to build the change list for the next
generation.

Processing each word is a little complex but very fast.  A 64k block of
code exists with routines on each 256-byte boundary.  Generally speaking,
the entry point corresponds to the high byte of the cell word.  This byte
contains the life/death values and a bit to indicate if this is an edge
condition.  During the first pass we take the high byte of the cell, AND
it with 0xFE and jump to that address.  During the second pass we take
the high byte, OR it with 0x01 and jump to that address.

Determining which changes must be made to a cell triple is easy and
surprisingly quick.  There's no counting!  Instead, I use a 64k lookup
table indexed by the cell triple itself.  The value is equal to what the
high byte should be in the next generation.  If this value is equal to
the current high byte then no changes are necessary to the cell.
Otherwise it is placed in the change list.  Look at the code in the
Test() and Fix() functions to see how this is done.

How each segment is used
------------------------
CS : 64k code (table).
DS : DGROUP (1st pass).  64k cell life/death classification table (2nd pass).
ES : Both change lists (so we can use STOSW).
SS : DGROUP.  The life cell grid and row/column table.  Indexed via BP.
FS : Video segment.
GS : Unused.

This code can be taken a little further but I've reached the point of
diminishing returns (and time).  Plus, I've already bumped up against
the 400 line limit and have had to squeeze things into an uncomfortable
state.  One obvious improvement would be to replace the calculation to
determine the screen offset with a table lookup.  The contest isn't
timing the I/O so I've left this optimization out.