::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: Starfields, by S. Judd If you've ever played Elite or Star Raiders you've experienced a starfield. The idea is to use stars to give a sense of motion while navigating through the universe. In this article we'll derive a simple algorithm for making a starfield -- the algorithm I used in Cool World. As usual, a good way to start is with a little thinking -- in this case, we first need to figure out what the starfield problem even is! Consider the motion of the stars. As we move forwards, the stars should somehow move past us -- over us, to the side of us, etc. (A star coming straight at us wouldn't appear to move at all). And something that moves needs a starting place and an ending place. Certainly they end at the screen boundaries, and since anything really far away looks like it's right in front of us they must start at (or near) the center of the screen. So far so good: stars start near the center of the screen and move outwards to the edges. What kind of path do they follow? Well it can't be a curve, right? If it were a curve, then all stars would have to curve in exactly the same way, since there is a rotation symmetry (just tilting your head won't alter the shape of the path of the stars). So it has to be a straight line. Since stars start at the center of the screen and move outwards, we know that, wherever a star might be on the screen right now, it started in the center. So, to get the path of the star's motion, we simply draw a line from the center of the screen through the star's current location, and move the star along that line. Drawing lines is easy enough. If the center of the screen is located at (xc, yc) then the equation of a line from the center to some point (x0, y0) is just (x,y) = (xc, yc) + t*(x0-xc, y0-yc) You should read a vector equation like the above as for t=blah to wherever step 0.0001 x = xc + t*(x0-xc) y = yc + t*(y0-yc) plot(x,y) next and you can see that when t=0 (x,y) = (xc, yc) i.e. the center of the screen and when t=1 (x,y) = (x0, y0) i.e. the current location. Thus, when t goes from 0 to 1 we hit all points in-between, and when t is a very small number (like 0.0001) we basically move from (xc,yc) to the point on the line right next to (xc,yc). Great, now we know the path the stars take. But _how_ do they move along that path? We know from experience that things which are far away appear to move slowly, but when we're up close they move really fast (think of a jet plane, or the moon). And we know that in this problem that stars which are far away are near the center of the screen, and stars which are nearby are out near the edges. So, stars near the center of the screen ought to move slowly, and as they move outwards they ought to increase in speed. Well, that's easy enough to do. We already have an easy measure of how far away we are from the center: dx = x0-xc dy = y0-yc But this is the quantity we need to compute to draw the line. All we have to do is move some fraction of (dx,dy) from our current point (x0,y0): x = x0 + dx/8 ;Note that we started at x0, not xc y = y0 + dy/8 where I just divided by 8 for the heck of it. Dividing by a larger number will result in slower motion, and dividing by a smaller number will result in faster motion along the line. But the important thing to notive in the above equations is that when a star is far away from the center of the screen, dx and dy are large and the stars move faster. Well alrighty then. Now we have an algorithm: draw some stars on the screen for each star, located at (x,y): erase old star compute dx = x-xc and dy = y-yc x = x + dx*velocity y = y + dy*velocity plot (x,y) keep on going! where velocity is some number like 1/8 or 1/2 or 0.14159 or whatever. This is enough to move us forwards (and backwards, if a negative value of velocity is used). What about moving up and down, or sideways? What about rotating? First and foremost, remember that, as we move forwards, stars always move in the same way: draw a straight line from the origin through the star, and move along that path. And that's the case no matter where the star is located. This means that to rotate or move sideways, we simply move the position of the stars. Then using the above algorithm they will just move outwards from the center through their new position. In other words, rotations and translations are pretty easy. Sideways translations are easy: just change the x-coordinates (for side to side translations) or y-coordinates (for up and down translations). And rotations are done in the usual way: x = x*cos(t) - y*sin(t) y = x*sin(t) + y*cos(t) where t is some rotation angle. Note that you can think of sideways motion as moving the center of the screen (like from 160,100 to 180,94). Finally, what happens when stars move off the screen? Why, just plop a new star down at a random location, and propagate it along just like all the others. If we're moving forwards, stars move off the screen when they hit the edges. If we're moving backwards, they are gone once they get near enough to the center of the screen. Now it's time for a simple program which implements these ideas. It is in BASIC, and uses BLARG (available in the fridge) to do the graphics. Yep, a SuperCPU comes in really handy for this one. Rotations are also left out, and I put little effort into fixing up some of the limitations of the simple algorithm (it helps to see them!). For a complete ML implementation see the Cool World source code. 10 rem starfield 15 mode16:gron16 20 dim x(100),y(100):a=rnd(ti) 25 xc=160:yc=100:n=12:v=1/8 30 for i=1 to n:gosub200:next 40 : 50 for i=1 to n 55 color0:plot x(i),y(i):color 1 60 dx=x(i)-xc:dy=y(i)-yc 65 x1=x(i)+v*dx+x0:y1=y(i)+v*dy+y0 67 x2=abs(x1-x(i)):y2=abs(y1-y(i)):if (x2+y2<3*abs(v)) then gosub 200:goto 60 70 if (x1<0) or (y1<0) or (x1>319) or (y1>199) then gosub 200:dx=0:dy=0:goto65 75 plot x1,y1:x(i)=x1:y(i)=y1 80 next 90 get a$ 100 if a$=";" then x0=x0-3:goto 40 110 if a$=":" then x0=x0+3:goto 40 120 if a$="@" then y0=y0-3:goto 40 130 if a$="/" then y0=y0+3:goto 40 133 if a$="a" then v=v+1/32 135 if a$="z" then v=v-1/32 140 if a$<>"q" then 40 150 groff:stop 200 x(i)=280*rnd(1)+20:y(i)=170*rnd(1)+15:return Line 200 just plops a new star down at a random position between x=20..300 and y=15..185. Lines 100-140 just let you "navigate" and change the velocity. The main loop begins at line 50: 55 erase old star 60 compute dx and dy 65 advance the star outward from the center of the screen 67 check if the star is too close to the center of the screen (in case moving backwards) 70 if star has moved off the screen, then make a new star And that's the basic idea. Easy! It's also easy to make further modifications -- perhaps some stars could move faster than others, some stars could be larger that others, or different colors, and so on. Starfields are fast, easy to implement, and make a nifty addition to many types of programs. :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 3D Graphics for the Masses: lib3d and Cool World -------------------------- by Stephen L. Judd sjudd@nwu.edu Well folks, it's time once again for some 3D graphics. This will, I think, be more or less the final word on the subject, at least from me! I'm pooped, and I think these routines pretty much push these algorithms as far as they're going to go. And there are so many other interesting things to move on to! These routines, not to mention this article, are the codification of all those years of algorithms and derivations and everything else. Those past efforts created working prototypes; this one is the production model. Nate Dannenberg suggested that this be done, and who am I to disobey? Besides, after a break of a few years it's going to be fun to rederive all the equations and algorithms from scratch, and do a "for-real" implementation, improving the good ideas and fixing the bad. Right? Right. So that's what I did, and that's what this article does. The first section of the article summarizes the basics of 3D graphics: projections and rotations. Since in my experience many people don't remember their high school math, I've also covered the basic mathematical tools needed, like trigonometry and linear algebra, and other related issues. The next section covers constructing a 3D world: representing objects, navigating through the world, things like that. The third section covers the main library routines and their implementation, including lots of code disassembly. The fourth section covers Cool World, a program which is a demonstration of the 3D library. The final section just summarizes the 3d library routines, calling conventions, parameters used, things like that. Since this article is enormous, I have left out the complete source code; the truly motivated can visit http://stratus.esam.nwu.edu/~judd/fridge/ to check out the full source for both Cool World and the 3d library. The binaries are also present there. By this time you may have asked, "What exactly _is_ this 3d library?" Glad you asked. It is a set of routines that are important for doing 3D graphics: rotations, projections, drawing polygons, that sort of thing. The routines are very flexible, compact, and extremely fast. They can draw to any VIC bank, and still leave enough room for eight sprite definitions. What the routines are not is a "3D world construction kit". A program has to be written which utilizes the routines. Which leads to Cool World. CW is, simply, a program which demonstrates using the 3D library. It is very large. It has some seventeen objects in it -- the initial tetrahedron is the 'center'; straight ahead of it is squaresville; to the left is a bunch of stars; straight down is the Cobra Mk III; straight back is a line of tetrahedrons. It also has some other fun stuff like the starfield and a tune. It doesn't have Kim Basinger, unfortunately. Oh yeah: the _whole point_ of the 3d library is for you to use it in your own programs! You can sell those programs if you like, too. Doesn't bother me; in fact I would be quite flattered to see it used in some cool and popular program. Why re-invent the wheel when the routines are already written? Use the 3d library. Live the 3d library. Do something interesting with it. If nobody uses it, it might as well have never been written, right? Right. You know you want to use it. So use it! But before you use it, you better understand it. So let's start at the very beginning (a very good place to start). --------- Section 1: 3D Basics and review --------- This is just going to be a quick summary. For more detail you ought to go back to some of the earlier articles. First we need to decide on a coordinate system. A convenient system for dealing with the computer is to have the x-axis point to the right and the y-axis to point down the screen -- the usual coordinate system with x=0 y=0 being in the upper-left corner of the screen. The z-axis can then point into the monitor (i.e. z=20 is behind the monitor screen somewhere, and z=-20 is somewhere behind where you're sitting); this keeps the coordinate system right-handed. The next thing is to figure out an equation for projecting from three dimensions into two dimensions. In other words: how to paint a picture. One of the great breakthroughs for art at the beginning of the Renaissance was the understanding of perspecitve. The first thing to notice is that far-away objects look smaller. The next thing to observe is that straight lines seem to meet in the middle -- like looking down a long road or sidewalk or railroad tracks. That's perspective: far-off objects get smaller _towards the center of vision_. Well that's an easy equation: just take the coordinate, and divide by the distance away from us. If we're looking down the z-axis, x' = x/z y' = y/z gives us a pair of projected points. Objects which are far away have a large value for z, which means x' and y' will be proportionally smaller. For really large z, they go to zero -- they go to the center of our vision. Now, how can we make objects appear larger? One way is to stand closer to them -- that changes the perspective. Another way is to magnify them, with a lens. This is how a telescope works, and also how the old Mark-I eyeball works as well. Whereas perspective makes far-off points behave differently than near ones, magnification just expands all points equally. And that's exactly how to put it in the equation: x' = d * x/z y' = d * y/z where d is some constant magnification factor (a number, like 12). And the above are the projection equations. A very important skill is the ability to 'read' an equation. The above equation takes a point, divides by the distance away from us, and then multiplies by the magnification constant. It says that far-off objects will be small, and that all objects are magnified equally. It also implies that the z-coordinates better all be positive or all negative. If an object has some positive z-coordinates and some negative, that means that some of its points are in front of us and some are behind us. In other words, that we are standing inside the object! A proper derivation is as follows: consider a pinhole camera. A light ray bounces off some object at a point (x,y,z) and passes through the pinhole, located at the origin (0,0,0). We then place a photographic plate parallel to the x-y plane, at z'=d, and figure out where the light ray hits the plate. The equation of the light ray is the line (x', y', z') = t*(x,y,z) and it hits the plate when z'=d, which happens when tz = d so when t = d/z. Thus the coordinates on the plate (the film) are x' = d/z * x y' = d/z * y which gives the earlier equations. Moving the film up and down will magnify the image, corresponding to changing the value of d. If d is negative, the image will be inverted. Positive d corresponds to having the film between the pinhole lens and the object, but mathematically it gives the same answer without inverting the object (since x' and y' won't have a minus sign in front of them). Representing 3D objects ----------------------- The important thing to recognize from the above is that straight lines in 3D will still be straight lines in 2D -- the projection doesn't turn straight lines into curves or anything like that. So, to project a straight line from 3D into 2D all that is needed are the _endpoints_ of the line. Project those two endpoints from 3D into 2D, and then all that is needed is to draw a line between the two _projected_ points. Most of the objects we will be dealing with will be made up of a series of flat plates, because these are easier to do calculations with (ever noticed the main difference between the stealth fighter and the stealth bomber?). In other words: polygons. These polygons are just flat objects with several straight sides. All we need are the vertices of each polygon (the endpoints of each of the lines); from those we can reconstruct all the in-between points. A very simple example is a cube. It has six faces. The eight corners of a cube might be located at (+/-1, +/-1, +/-1). Project those eight points and connect the projected points in the right way, and Viola! Violin! 3D graphics. The next thing we will want to do is rotate an object. For this we need a little trig, and it's very helpful to know a little about linear algebra and vectors. What's that? You don't remember this stuff? For shame! But we can fix that up in no time. Trig review in two paragraphs. ----------- Take a look at a triangle sometime. No matter how large or small you draw a particular triangle, it still looks the same because all of the sides and angles are in proportion to one another. Now fix one of the angles at 90 degrees, to form a right triangle. The minute you choose one of the other angles, the opposite angle is determined (since the angles have to add up to 180 degrees). And when you know all the angles you know the _ratios_ of the sides to each other -- you don't know the actual length of the sides, but you know how the triangle looks and hence its proportions. And that is pretty much all of trig. Draw a right triangle, and pick an angle (call that angle theta). The hypoteneuse is the longest side of a triangle, and hence the side opposite the 90 degree angle. The 'adjacent' side is the side which touches the angle theta; the 'opposite' side is the side opposite theta (imagine that!). We _define_ sine, cosine, and tangent as cos(theta) = adjacent/hypoteneuse sin(theta) = opposite/hypoteneuse tan(theta) = sin(theta)/cos(theta) = opposite/adjacent and that's all of trig, right there. Everything else comes from those three definitions. If you can't remember them, just remember the famous Indian Chief SOHCAHTOA, who fought so bravely for the cause of his people and mathematicians everywhere (SOH: sin-opposite-hyp; CAH: cos-adjacent-hyp; TOA: tan-opposite-adjacent). Polar coordinates ----------------- Now we've got some tools to do some useful calculations with, once we remember the Pythagorean theorem, a^2 + b^2 = c^2, where c = the hypoteneuse and a,b are the sides of a right triangle. Before doing rotations it is helpful to understand polar coordinates. Draw a set of normal coordinate axis on a piece of paper, and draw a point somewhere (at 3,2 say). Label this point x,y and draw a line from the origin (0,0) to that point, and call its length "r". Label the angle between that line and the positive x-axis as "theta", and draw a line straight down from x,y to the x-axis. Whaddaya know? It's a right triangle. The length of one side is x, the length of the other side is y, and the length of the hypoteneuse is r. And the trig definitions tell us that cos(theta) = x/r sin(theta) = y/r tan(theta) = y/x. So now we can define sin, cos, and tan for all angles theta just by drawing a little picture. Two coordinates, x and y, are used to locate the point. Another two coordinates in the picture are "r" and "theta", and they are just as good for locating where a point is. In fact, the above equations give the x and y coordinates of a point (r,theta): x = r * cos(theta) y = r * sin(theta) They also give us the r and theta coordinates of a point (x,y): tan(theta) = y/x r^2 = x^2 + y^2 You may also have noticed that the trig formula cos(theta)^2 + sin(theta)^2 = 1 follows from the above trig definitions, since x^2 + y^2 = r^2. In the Cartesian (or rectangular) system there are two coordinates, x and y. The equation x=constant is a vertical line and the equation y=constant is a horizontal line, so this is a system of straight lines which cross each other at right angles. In polar coordinates the equation theta=constant is a radial line, and r=constant gives a circle (constant radius, you know). So polar coordinates consists of circles which intersect radial lines (note that they also intersect at right angles). And the equations above tell how to convert between the two systems. There are many other coordinate systems, btw. There are elliptical coordinates and parabolic coordinates and other strange systems. Why bother? For one thing, if you're solving an equation on a round plate it's awfully convenient to use polar coordinates. For another, we can now do some rotations. (And for another, you can draw cool graphs like r=cos(3*theta)). Rotations --------- Start with a point (r,theta) and rotate it to a new point (r,theta+s) i.e. rotate by an amount s. The original coordinate was x = r*cos(theta) y = r*sin(theta) and the new coordinate is x' = r*cos(theta+s) y = r*sin(theta+s) so by using the trig identities for cos(a+b) and sin(a+b) we arrive at the simple expressions x' = x*cos(s) - y*sin(s) y' = x*sin(s) + y*cos(s) for the rotated coordinates x' and y' in terms of the original coordinates x and y. The above is a two-dimensional rotation. Three-dimensional rotation is done with a series of two-dimensional rotations. The above rotates x and y, but keeps z fixed (I don't see a z anywhere in there, at least). In other words, it rotates about the z-axis. A rotation about the x-axis looks like x' = x*cos(s) - z*sin(s) z' = x*sin(s) + z*cos(s) and a rotation about the y-axis looks like x' = x*cos(s) + z*sin(s) z' = -x*sin(s) + z*cos(s) (the opposite minus sign just comes from the orientation of the y-axis). Rotations in three dimensions do NOT commute. Just by trying with a book it is very easy to see that a rotation of 45 degrees about the x-axis and then 45-degrees about the y-axis gives a very different result from first rotating about the y-axis and then the x-axis. To really make rotations powerful we need a little linear algebra. But first... Radians ("God's units") ------- Just for completeness... what is an angle, really? It's not a length, it's not a direction... what is it? And why are there 360 "angles" in a circle? The answer to the second is "because the ancient Babylonians used base 60 for all their calculations." In other words, degrees are totally arbitrary. There could just as easily be 1000 degrees in a circle. So, what is an angle? Think about a circle. It has two lengths associated with it: the length "across" (the diameter D) and the length "around" (the circumference C). And, like triangles, no matter how large or small the circle is drawn, those lengths stay in proportion. That is, C/D = constant That constant is, of course, pi = 3.1415926... This then gives the equation for the circumference of a circle as C = pi*D = 2*pi*r where r=radius. But what is an angle? Draw two radii in the circle. We _define_ the angle between those two radii as the length of the subtended circumference divided by the radius: angle = length / radius. So again, no matter what the _size_ of the circle is the _angle_ stays the same. The length is just a fraction of the circumference, which is 2*pi*radius; this means that the angle is just a fraction of 2*pi. These are radians. There are 2*pi of them in a circle. A quarter of a circle (90 degrees) is just 1/4 (2*pi) = pi/2 radians. An eighth is pi/4 radians. And so on. These are of course very natural units for an angle. The above definition has angle equal to a length divided by a length; radians have no dimension (whereas degrees are a dimension, just like feet or pounds or seconds are). Degrees are useful for talking and writing, but radians are what is needed for calculations. Vectors and Linear Algebra -------------------------- A vector (in three dimensions) is simply a line drawn from the origin (0,0,0) to some point (x,y,z). Since they always emanate from the origin, it is correct to refer to "the vector (x,y,z)". The important thing is that it has both length _and_ direction. A physical example is the difference between velocity and speed. Speed is just a quantity: for example, "120 Miles per hour". Velocity, on the other hand, has _direction_ -- you might be going straight up, or straight down, or turning around a curve, etc. and the _length_ of the velocity vector gives the speed: 120 MPH. But all we need to worry about here is the geometric meaning, and the difference between the _point_ (Px,Py,Pz) and the _vector_ (Px, Py, Pz). The point P is just a point, but geometrically the vector P is a line extending *from* the origin *to* the point P -- it has a length, and a direction: it points in the direction of (Px, Py, Pz). We will be using vectors for rotations, and to figure out what direction something points, and all sorts of other stuff. The dimension of a vector is the number of elements in that vector. Let P be a vector. A 2D vector might be P=(Px,Py). A three dimensional vector example is P=(Px,Py,Pz). P=(p1,p2,p3,p4,p5,p6) would be a six-dimensional vector. So an n-dimensional vector is just a list of n independent quantities. We'll just be dealing with two and three dimensional vectors here, though, so when you see a sentence like "The vector v1" just think of (v1x, v1y, v1z). The length of a vector is again given by Pythagorus: r^2 = Px^2 + Py^2 + Pz^2 + ... i.e. the sum of the squares of all the elements. This is a good calculation to avoid in algorithms, since it is expensive, but it is useful to know. The simplest thing one can do with a vector is change its length, by multiplying by a constant: c*(Px,Py,Pz) = (c*Px, c*Py, c*Pz). Multiplying by a constant multiplies all elements in the vector by that constant. Just like with triangles all lengths increase proportionally, so it is a vector which points in the same direction but has a different length. Two vector operations that are very useful are the "dot product" and the "cross product". I'll write the dot product of two vectors R and P as either R.P or , and it is defined as R . P = |R| |P| cos(theta) that is, the length of R times the length of P times the cosine of the angle between the two vectors. From this it is easy to show that the dot product may also be written as R . P = Rx*Px + Ry*Py + ... that is, multiply the individual components together and add them up. Note that the dot product gives a _number_ (a scalar), NOT a vector. Note also that the dot product between two vectors separated by an angle greater than pi/2 will be negative, from the first equation, and that the dot product of two perpendicular vectors is zero. The dot product is sometimes referred to as the inner (or scalar) product. The cross-product (sometimes called the vector product or skew product) is denoted by RxP and is given by R x P = (Ry*Pz-Rz*Py, Rz*Px-Rx*Pz, Rx*Py-Ry*Px) As you can see, the result is a _vector_. In fact, this vector is perpendicular to both R and P -- it is perpendicular to the plane that R and P lie in. Its length is given by length = |R| |P| sin(theta) Note that P x R = - R x P; the direction of the resulting vector is usually determined by the "right hand rule". All that is important here is to remember that the cross-product generates perpendicular vectors. We won't be using any cross products in this article. There are lots of other things we can do to vectors. One of the most important is to multiply by a _matrix_. A matrix is like a bunch of vectors grouped together, so it has rows and columns. An example of a 2x2 matrix is [a b] [c d] an example of a 3x3 matrix is [a b c] [d e f] [g h i] and so on. The number of rows doesn't have to equal the number of columns. In fact, an n-dimensional vector is just an n x 1 (read "n by 1") matrix: n rows, but one column. We add matrices together by adding the individual elements: [a1 a2 a3] [b1 b2 b3] [a1+b1 a2+b2 a3+b3] [a4 a5 a6] + [b4 b5 b6] = [a4+b4 a5+b5 a6+b6] [a7 a8 a9] [b7 b8 b9] [a7+b7 a8+b8 a9+b9] We can multiply by a constant, which just multiplies all elements by that constant (just like the vector). Matrices can also be multiplied. The usual rule is "row times column". That is, given two matrices A and B, you take rows of A and dot them with columns of B: [A1 A2] [B1 B2] = [A1*B1+A2*B3 A1*B2+A2*B4] [A3 A4] [B3 B4] [A3*B1+A4*B3 A3*B2+A4*B4] In the above, the (1,1) element is the first row of A (A1 A2) times the first column of B (B1 B3) to get A1*B1+A2*B3. And so on. (With a little practice this becomes very easy). We will be multiplying rotation matrices together, and multiplying matrices times vectors. Although "row times column" is the usual way that this is taught, it can also be looked at as "columns times elements". The easiest example is to multiply a matrix A times a vector x. The first method gives: [a1 a2 a3] [ ] [a1*x1 + a2*x2 + a3*x3] let A = [a4 a5 a6] then Ax = [ ] = [a4*x1 + a5*x2 + a6*x3] [a7 a8 a9] [ ] [a7*x1 + a8*x2 + a9*x3] The right-hand part may be written as [a1] [a2] [a3] x1*[a4] + x2*[a5] + x3*[a6] [a7] [a8] [a9] or, in other words, Ax = x1*column1 + x2*column2 + x3*column3 that is, the components of x times the columns of A, added together. This is a very useful thing to be aware of, as we shall see. Note that normal multiplication commutes: 3*2 = 2*3. In matrix multiplication, this is NOT true. Multiplication in general does NOT commute, and AB is usually different from BA. We can also divide by matrices, but it isn't called division. It's called inversion. Let's say you have an equation like 5*x = b. To solve for x you would just multiply both sides by 1/5 i.e. by the "inverse" of 5. To solve a matrix equation like Ax = b we just multiply both sides by the inverse of A -- call it A'. And in just the same way that 1/a * a is one, a matrix times its inverse is the _identity matrix_ which is the matrix with "1" down the diagonal: [1 0 0] I = [0 1 0] [0 0 1] It is called the identity because IA = A; multiplying by the identity matrix is just like multiplying by one. Inverting a matrix is in general a very expensive operation, and we don't need to go into it here. We will be doing some special inversions later on though, so keep in mind that an inversion un-does a matrix multiplication. Transformations -- more than meets the eye! --------------- Now we have an _extremely_ powerful tool at our disposal. What happens when you multiply a matrix times a vector? You get a new vector, of the same dimension as the old one. That is, it takes the old vector and _transforms_ it to a new one. Take the two-dimensional case: [a b] [x] = [a*x + b*y] [c d] [y] [c*x + d*y] Look familiar? Well if a=cos(s), b=-sin(s), c=cos(s), and d=sin(s) we get the earlier rotation equations, which we can now rewrite as P' = RP where R is the rotation matrix R = [cos(s) -sin(s)] [sin(s) cos(s)]. The way to think about this is that R _operates_ on a vector P. When we apply R to P, it rotates P to a new vector P'. So far we haven't gained much. But in three dimensions, the rotation matrices look like [cos(s) -sin(s) 0] [cos(s) 0 sin(s)] Rz = [sin(s) cos(s) 0] Ry = [ 0 1 0 ] etc. [ 0 0 1] [-sin(s) 0 cos(s)] Try multiplying Rz times a vector (x,y,z) to see that it rotates x and y while leaving z unchanged -- it rotates about the z-axis. Now let's say that we're navigating through some 3D world and have turned, and looked up, and turned a few more times, etc. That is, we apply a bunch of rotations, all out of order: Rx Ry Ry Rz Rx Rz Ry Ry Rx P = P' All of the rotation matrices on the left hand side can be multiplied together into a _single_ matrix R R = Rx Ry Ry Rz Rx Rz Ry Ry Rx which is a _new_ transformation, which is the transformation you would get by applying all the little rotations in that specific order. This new matrix can now be applied to any number of vectors. And this is extremely useful and important. Another incredibly useful thing to realize is that the inverse of a rotation matrix is just its transpose (reflect through the diagonal, i.e. element i,j swaps with element j,i). It's very easy to see for the individual rotation matrices -- the inverse of rotating by an amount s is to rotate by an amount -s, which flips the minus sign on the sin(s) terms above. And if you take the transpose of the accumulated matrix R above, remembering that (AB)^T = B^T A^T, you'll see that R^T just applies all of the inverse rotations in the opposite order -- it undoes each small rotation one at a time (multiply R^T times R to see that you end up with the identity matrix). The important point, though, is that to invert any series of rotations, no matter how complicated, all we have to do is take a transpose. Hidden Faces (orientation) ------------ Finally there is the issue of hidden faces, and the related issue of polygon clipping. In previous articles a number of different methods have been tried. For hidden faces, the issue is to determine whether a polygon faces towards us (in which case it is visible) or away from us (in which case it isn't). One way is to compute a normal vector (for example, to rotate a normal vector along with the rest of the face). A light ray which bounces off of any point on this face will be visible -- will go through the origin -- only if the face faces towards us. The normal vector tells us which direction face is pointing in. If we draw a line from our eye (the origin) to a point on the face, the normal vector will make an angle of 90 degrees with the face when the face is edge-on. So by computing the angle between the normal vector and a vector from the origin to a point on the face we can test for visibility by checking if it is greater than or less than 90 degrees. We have a way of computing the angle between two vectors: the dot-product. Any point on the face will give a vector from the origin to the face, so choose some vertex V on the polygon, then dot it with the normal vector N: = Vx*Nx + Vy*Ny + Vz*Nz = |V| |N| cos(theta) Since cos(theta) is positive for theta<90 and negative for theta>90 all that needs to be done is to compute Vx*Nx + ... and check whether it is positive or negative. If you know additional information about either V or N (as earlier articles did) then this calculation can be simplified by quite a lot (as earlier articles did!). And if you want to be really cheap, you can just use the z-coordinate of the normal vector and skip the dot-product altogether (this only works for objects which are reasonably far away, though). Another method involves the cross-product -- take the cross-product of two vectors in the _projected_ polygon and see whether it points into or out of the monitor. Again, only the direction is of interest, so that usually means that only the z-component of the vector needs to be computed. On the downside, I seem to recall that in practice this method was cumbersome and tended to fail for certain polygons, making them never visible (because of doing integer calculations and losing remainders). The final method is a simple ordering argument: list the points of the polygon in a particular order, say clockwise. If, after rotating, that same point list goes around the polygon in a counter-clockwise order then the polygon is turned around the other way. This is the method that the 3d library uses. It is more or less a freebie -- it falls out automatically from setting up the polygon draw, so it takes no extra computation time for visible polygons. It is best-suited to solid polygon routines, though. Polygon clipping refers to determining when one polygon overlaps another. For convex objects (all internal angles are less than 180 degrees) this isn't an issue, but for concave objects it's a big deal. There are two convex objects in Cool World: the pointy stars and the Cobra Mk III. The pointy stars are clipped correctly, so that tines on the stars are drawn behind each other properly. The Cobra doesn't have any clipping, so you can see glitches when it draws one polyon on top of another when it should be behind it. A general clipping algorithm is pretty tough and time-consuming. It is almost always best to split a concave object up into a group of smaller convex objects -- a little work on your part can save huge amounts of time on the computer's part. And THAT, I think, covers the fundamentals of 3d graphics. Now it's on to constructing a 3D world, and implementing all that we've deduced and computed as an actual computer program. --------- Section 2: Constructing a 3D world --------- Now that we have the tools for making 3D objects, we need to put them all together to make a 3D world. This world might have many different objects in it, all independent and movable. They might see each other, run into each other, and so on. And they of course will have to be displayed on the computer. As we shall see, we can do all this in a very elegant and simple manner. Representing objects -------------------- The object problem boils down to a few object attributes: each object has a _position_ in the world, and each object has an _orientation_. Each object might also have a _velocity_, but velocity is very easy to understand and deal with once the position and orientation problem has been solved. The position tells where an object is -- at least, where the center of the object is. The orientation vector tells in what direction the object is pointing. Velocity tells what direction the object is moving. Different programs will handle velocity in different ways. Although we are supposed to be tolerant of different positions and orientations, in these programs they are all handled the same way and are what will be focused on. If you can't visualize this, just think of some kind of spaceship or fighter plane. It is located somewhere, and it points in a certain direction. The orientation vector is a little line with an arrow at the end -- it points in a certain direction and is anchored at some position. As the object turns, so does the orientation vector. And as it moves, so does the position. Once positions and orientations are known we can calculate where everything is relative to everything else. "Relative" is an important word here. If we want to know how far we are from some object, our actual locations don't matter; just the relative locations. And if we are walking around and suddenly turn, then everything better turn around us and not some other point. Rotations are always about the origin, which means we are the origin -- the center of the universe. Recall also that the way projections work is by assuming we are looking straight down the z-axis, so our orientation better be straight. So to find out what the world looks like from some object we need to do two things. First, all other objects need to be translated to put the specific object at the origin -- shift the coordinate system to put the object at (0,0,0). Second, the orientation vector of the object needs to be on the z-axis for projections to work right -- rotate the coordinate system and all of the objects around us to make the orientation vector be in the right direction. Every time an object moves, its position changes. And every time it turns or tilts it changes its orientation. Let's say that after a while we are located at position P and have an orientation vector V, and want to look at an object located at position Q. Translating to to the origin is easy enough: just subtract P from all coordinates. So the original object will be located at P-P = 0, and the other object will be located at Q-P. What about rotating? The orientation vector comes about by a series of rotations applied to the initial orientation. As was explained earlier, all of those rotations can be summed up as a single rotation matrix R, so if the intial vector was, say, z=(0,0,1) then V = Rz Given a position vector, what we need to do is un-rotate that position vector back onto the z-axis. Which means we just multiply by the _inverse_ of R, R'. For example, if we turn to the left, then the world turns to the right. If we turn left and then look up, the way to un-do that is to look back down and turn right. That's an inverse rotation, and we know from the earlier sections that rotation matrices are very easy to invert. So, after translating and rotating, we are located at the origin and looking straight down the z-axis, whereas the other object Q is located at the rotated translated point Q2 = R' (Q-P) i.e. Q-P translates the object, and R' rotates it relative to us. Readers who are on the ball may have noticed that so far we have only deduced what happens to the _center_ of the object. Quite right. We need to figure out what happens to the points of an object. First it should be clear that we want to define the points of an object relative to the center of the object. It is first necessary so that the object can rotate on its own. And it has all kinds of computational advantages, as we shall see in a later section; among them are that the numbers stay small (and the rotations fast), they are always rotated as a whole (so they don't lose accuracy or get out of proportion), and the points of objects which aren't visible don't have to be rotated at all. So we need to amend the previous statement: we need to figure out what happens to the points of an object _only when that object is visible_. Visibility ---------- When is an object visible? When it is within your field of vision of course. It has to be in front of you, and not too far out to the sides. So, after translating and rotating an object center, Q2 = R' (Q-P), we need to check if Q2 is in front of us (if the z-coordinate is positive, and that it isn't too far away), and that it isn't too far out to either side. The field of vision might be a cone; the easiest one is a little pyramid made up of planes at 45 degrees to one another. That is, that you can see 45 degrees to either side or up and down. This is the easiest because the equation of a line (a plane actually) at 45 degrees is either x=z or x=-z, y=z or y=-z So it is very easy to check the x- and y-coordinates of the new center to see if -z < x < z and -z < y < z. If the object is visible, then the rest of the object needs to be computed and displayed. Computing displayed objects --------------------------- Now let's say this object, which was located at Q, is visible. It has a bunch of points that define the object, relative to the center -- call those points X1 X2 X3 etc. where X1=(x1,y1,z1) etc. -- and it has an orientation W. Again, the orientation vector is computed by performing some series of rotations M on a vector which lies along the z-axis like (0,0,1). First the points need to be rotated along with the orientation vector. This isn't an inverse rotation like before -- the points rotate in the same way as the orientation vector. So apply M to all the points; consider a single point X1: rotated point = M X1 Once the points have been rotated we have to find their actual location. Since they are defined relative to the center of the object, this means just adding them to the center of the object: rotated point + Q = M X1 + Q This is the _actual_ location of the points in the world. Now as before we need to translate and rotate them to get their relative location: X1' = R' (M X1 + Q - P) As before, subtract P and apply the inverse rotation R'. The above equation is the _entire_ equation of the 3D world! We can rewrite this equation as X1' = R' M X1 + R'(Q-P) and recognize that R'(Q-P) was calculated earlier, to see if the object was visible. The above equation can be read as "Rotate the rotated object points backwards to the way we are facing, and add to the rotated and translated center." That is, if we turn one way, the center rotates in the opposite direction, and the entire object rotates in the opposite direction. Physically, if you turn your head sideways you still see the monitor facing you. If instead of turning your head you were to move the monitor, you would also have to rotate the monitor to keep it facing you (if you didn't rotate the monitor, and only changed its position, you would be able to see the sides, etc.). That rotation is in the opposite direction that your head would rotate (head turns to the right, monitor turns to the left). Displaying objects ------------------ Now that everyone is rotated and translated and otherwise happy, they need to be projected and displayed. The projection is easy -- as before, divide by the z-coordinate and multiply by a magnification factor. Then connect the points in the right way to get the polygons and faces that make up the objects, using the appropriate method to remove faces that aren't visible, and draw to the screen. One thing remains, though: depth-sorting the objects, to make sure that close-up objects overlap far-away objects. So, _before projecting_, all of the z-coordinates need to be looked at and the objects ordered so that they are drawn in the right way. This is really a form of polygon clipping. You can do the same thing with various complex objects to get an easy way to clip. Summary ------- All objects have a position and an orientation, and might have things like velocity. The position of the object is where the object is located. The points that make up any object are defined relative to this center; the center is thus the center of rotation of the object in addition to being its location. The orientation determines what direction the object is pointing in, and the velocity determines what direction it is moving in. Navigating through the world amounts to changing the position and orientation (and velocity) of the object. To figure out how the world looks from any single object with position P, P=(Px,Py,Pz), and orientation matrix R (where R=cumulative rotation matrix), a very simple and elegant equation is used. First the equation R'(Q-P) where R' = inverse of R, determines where the center of an object Q is located (by translating and then undoing the orientation of P), and tells whether the object at Q is visible or not. If it is, then the equation R' (M X + Q-P) gives the new location of any point X on the object, where M is the orientation matrix for Q, and X is some point defined relative to Q. After depth-sorting the individual objects, these points are projected and drawn to the screen. Doing things in this way gives a method that is very fast and efficient, and always retains accuracy -- everything is calculated relative to everything else. This is extremely powerful. It makes the programming of objects easy, it makes no roundoff errors from e.g. rotating rotated points, and no cycles are wasted calculating rotations of non-visible objects. Some PC algorithms you might come across do all sorts of God-awful things to overcome these problems, like rotate a single point and then use cross-products to preserve all the angles, and all sorts of other horrid mathematical butchery. That way leads to the Dark Side. By the way, there are some subtleties (for example, computing orientation vectors) in this calculation -- see the discussion of Cool World in Section 4 for more detail on implementation issues. --------- Section 3: Implementing the 3D Library --------- Now that we've dined on a tasty and nutritous meal of some theory we need to figure out how to implement everything on the 64, and to do so efficiently! The idea of the 3D library is to collect all of the important and difficult routines into a single place. Before doing so it is probably a good idea to figure out just what the important routines are. Obviously rotations and projections are important. In fact, there just isn't much else to do -- a few additions here and there, maybe a few manipulations, but everything else is really straightforward. Of course, once that data is rotated and projected it would probably be much more enjoyable for the viewer to display it on the screen. Drawing lines is pretty easy, but a good solid-polygon routine is pretty tough so it makes a good addition to the library. So three things, and the first is rotations. There are two kinds of rotations. When we turn left or right we rotate the world, which means rotating the positions of objects. We also need to rotate the object itself, which means rotating the individual points that define the object. Calculating rotation matrices ----------------------------- In both cases, we need a rotation matrix that we can use to transform coordinates. Previous programs just calculated a rotation matrix like R = Ry Rz Rx so that for some point P: RP = Ry Rz Rx P that is, given three angles sx, sy, and sz, calculate the rotation matrix you get by first rotating around the x-axis by an amount sx (Rx times P), then the z-axis by an amount sz (Rz times Rx P), and finally the y-axis by the amount sy (Ry times (Rz times Rx P)). Well, that kind of routine just doesn't cut it anymore. Why? Well, what happens when you turn left, turn left, look up, and roll? You get a matrix that looks like R = Rz Rx Ry Ry (think Ry=turn left, Rx=look up, Rz=roll) What three angles sx sy and sz would give us that exact rotation matrix? -We don't know-! As has been said many times, **matrix multiplications do not commute**, so rotations do not commute, so we can't just add up the rotation amounts or something. We have to keep a running rotation matrix, that just accumulates rotations each time we turn left or roll or whatever. In terms of a routine, that means we need to be able to compute Rx R and Ry R and Rz R where R is the accumulated rotation matrix, and Rx etc. are the little rotation matrices about the x/y/z axis that correspond to us turning left and right such. Although this seems like a lot of work, as we shall see it is actually quite a bit faster than calculating an entire rotation matrix like the old routine. Still, it is sometimes handy to be able to calculate a rotation matrix using the old method, like when rotating by large amounts, or if we just need an object to spin around in some arbitrary way, or if all we need is a direction and an elevation (think of a gun turret or a telescope). So an improved version of the old routine is also included. Rotating points --------------- That ought to take care of calculating the rotation matrix. Now we need to apply that matrix to some points. The first type of points are the object centers (i.e. positions in the world). Polygonamy used a single byte to represent positions, and that was really too restrictive. A full 16-bit number is needed, and it should be signed. So we need to be able to rotate 16-bit signed numbers. The second type of points are the object points, which are defined relative to the object centers. Since they are relative to the centers these points can be 8-bits, but they should definitely be signed integers. There are many more object points than there are objects, so these rotations need to be really fast. In both cases, it clearly makes much more sense to pass in a whole list of points to be rotated, instead of having to call the routine for each individual point. Remember that the object points are only rotated if the object is actually visible. And, in point of fact, once they've been rotated they will need to be projected. So it makes sense to tack the projection routine onto the object-rotation routine (as opposed to the world-rotation routine, which rotates the object centers). To summarize, then, in addition to the matrix calculation routines we need two rotation routines: one for rotating 16-bit signed object centers, and one for rotating (and possibly projecting) the actual object points. Drawing Polygons ---------------- Once everything has been rotated and projected, it will need to be drawn. So a good solid polygon routine makes a great addition to the library. The algorithm will be built upon the polygonamy idea: start at the bottom of the polygon, and draw the left and right sides of the polygon _simultaneously_, filling in-between. That is, move up in the y-direction, calculate the left-endpoint, calculate the right-endpoint, and fill. To calculate the endpoints we just calculate the slopes of the lines, and draw. So all that is needed is to pass in a list of points, and perhaps tell the routine where to draw those points. Recall also that this gives a very easy way of doing hidden faces: if the points are given in counter-clockwise order for the normal polygon, then they will be in clockwise order if the polygon is facing away from us. So basically the polygon isn't visible if we can't draw it, and the hidden-face calculation is more or less a freebie. (That is, the hidden-face calculation takes no extra time if the face is visible; it will use up some time though if the face is not visible.) The polygonamy routine had two large disadvantages: it was huge, because of the way the fill-routines and tables were set up, and it was inflexible in the sense that it could only draw to a specific bitmap. It also couldn't handle things like negative coordinates, or drawing polygons which were partially off-screen. The code was also a bit convoluted, and in need of re-thinking (imagine a big prototype machine with exposed wires sticking out and large cables snaking around, all held together with duct tape and bailing wire, and you'll get an idea of what the polygonamy code looks like). Of course, since we already know the ideas and equations, it's not too tough to just rework everything from scratch. Basic Routines -------------- Clearly, before writing all of the library routines we are going to have to figure out how to do some very general operations. We are going to need some 8- and 16-bit signed multiply routines. The projection routine is going to involve a divide of a 16-bit z-coordinate, and the polygon routine is going to need a division routine for calculating line slopes. Then we have to figure out how to represent numbers and on and on. Hokey smokes! This is all new stuff, too. But, first and foremost, we need a multiply. We will of course be using the fast multiply, which uses the fact that a*b = f(a+b) - f(a-b), f(x) = x^2/4. Understanding this routine will be crucial to understanding what will follow. The general routine to multiply .Y and .A looks like STA ZP1 ;ZP1 points to f(x) low byte STA ZP2 ;high byte EOR #$FF ADC #01 STA ZP3 ;table of f(x-256) -- see below STA ZP4 LDA (ZP1),Y ;f(Y+A), low byte SEC SBC (ZP3),Y ;f(Y-A), low byte TAX LDA (ZP2),Y ;f(Y+A), high byte SBC (ZP4),Y ;f(Y-A), high byte ;result is now in .X .A = low, high By using the indexed addressing mode, we let the CPU add A and Y together. If Y and A can both range from 0..255, then Y+A can range from 0..510. So we need a 510-byte table of f(x). What about Y-A? The complementing of A means that what we are really calculating on the computer is Y + 256-A, or 256+Y-A. So we need another table of f(x-256) to correctly get f(Y-A). Consider Y=1, A=2. -A = $FE in 2's complement, so added together we get $FF, or -1. Next consider Y=2, A=1. In this case, Y-A gives 2+$FF = $101, or 257. Computationally, if Y-A is 0..255 it is a negative number, and subtracting 256 gives the correct negative number. If Y-A is 256..510, then it really represents 0..255, so again subtracting 256 gives the correct number. If you think about it, you'll see that the upper 256 bytes of this table is the same as the lower 256 bytes of the first table: f(x-256) for x=256..511 is the same as f(x) for x=0..255. This means that the first table can be piggy-backed on top of the second table in memory. For example, if the table were at $C000 $C000 f(x-256) $C100 f(x) (512 bytes) would do the trick. WITH ONE EXCEPTION. And a rather silly one at that. That exception is A=0. Multiplying by zero should give a zero. But the complementing of A gives -256, which on the computer is... zero! That is, think of A=0 and Y=0. A-Y=0, and when we look it up in the table of f(x-256) we will get f(-256) instead of f(0), and get the wrong answer. So we have to check for A=0. And oh yes: don't forget to round the tables correctly! Extremely Cool Stuff -------------------- As usual, a little bit of thinking and a little bit of planning ahead will lead to enormous benefits. After staring at the fast multiplication code for a bit, we can make a very useful observation: Once the zero page locations are set up, they stay set up. This makes all the difference in the world! Let's say we've just calculated a*b. If we now want to calculate c*b, _we don't have to set up b again_. If we have a succession of calculations, x*a1, x*a2, x*a3, etc., we only have to set up the ZP location once (with x and -x) and keep reusing it. This makes the fast multiply become very fast! With a lot of multiplications it reduces to just 24 cycles or so for a full 16-bit result, as all of that zero-page overhead disappears. Now think about rotations. We have to multiply a matrix times a vector (i.e. the three coordinates). It was pointed out in the discussion of linear algebra that we can write the multiplication as [x1] R [x2] = x1*column1 of R + x2*column2 of R + x3*column3 of R [x3] And there is the payoff: we get three multiplies for every one set up of zero page. That is, we set x1 in the zero page variables, and then multiply by the three elements of the first column. Then do the same for x2 and x3. So we still do nine multiplications, but we only have to set up the zero-page locations three times. Just changing the way in which we do these multiplications has instantly given us a great big time savings! Now we need to figure out how to do signed multiplies. The fast multiply runs into problems adjusting to signed numbers. Consider the simple example of x=-100 and y=1. We get that f(x+y) = f(256-100 + 1) = f(157). But this is just what we'd get if we multiplied x=100 and y=57 together. You don't know whether 157 is -99 or if it really is 157. So we can't just modify the tables... unless... well, what about if we are able to remove this ambiguity from the tables? We can do just that if we restrict the range of allowed numbers. For example, if we restrict to x=-96..95 = 160..255,0..95 (-96 is 160 in 2's complement) y=-64..64 = 192..255,0..64 (-64 is 192, -63 is 193, etc.) then if x+y = 0..159 the actual number is 0..159 160..255 -96..-1 256..351 0..95 352..510 -160..-2 That is, the _only_ way you can get the number 159 from x+y is when x=95 and y=64. The four ranges above are derived by considering the four cases x>0 and y>0, x<0 and y>0, x<0 and y<0, x>0 and y<0. All values of x+y and x-y, from 0 to 510, correspond to _unique_ values of x and y, if we restrict x to the range -96..95 and restrict y to the range -64..64. With those restrictions, we can construct a *single* 512-byte fast-multiply table to do a signed fast-multiply. How is this useful? One word: object rotations. If the rotation matrix values range from -64..64, and the object points range from -96..95, then we can do an _extremely_ fast multiply to perform the object rotations, as we shall see in the discussion of the rotation routines. Since there are quite a lot of object points to rotate, this makes an enormous contribution to the overall speed of the routine. Although the above helps out greatly for the point rotations, we still need a general signed multiply routine for the centers and the projections and such. So, let's figure it out! Say we multiply two numbers x and y together, and x is negative. If we plug it in to a multiplication routine (_any_ multiplication routine), we will really be calculating (2^N + x)*y = 2^N*y + x*y assuming that x is represented using 2's complement (N would be 8 or 16 or whatever). There are two observations: - If the result is _less_ than 2^N, we are done -- 2^N*y is all in the higher bytes which we don't care about. - Otherwise, subtract 2^N*y from the result, i.e. subtract y from the high bytes. Now let's say that both x and y are negative. Then on the computer the number we will get is (2^N + x)*(2^N + y) = 2^(2N) + 2^N*x + 2^N*y - x*y Now it is too large by a factor of 2^2N, 2^N*x and 2^N*y. BUT the basic observations haven't changed a bit! We still need to _subtract_ x and y from the high bytes. And the 2^2N is totally irrelevant -- we can't get numbers that large by multiplying numbers together which are no larger than 2^N. This leads to the following algorithm for doing signed multiplications: multiply x and y as normal with some routine if x<0 then subtract y from the high bytes of the result if y<0 then subtract x from the high bytes And that's all there is to it! Note that x and y are "backwards", i.e. subtract y, and not x, when x<0. Some examples: x=-1, y=16 Computer: x=$FF y=$10 (N=8) x*y = $0FF0 Result is less than 256, so ignore high byte Answer = $F0 = -16 OR: subtract y from high byte, Answer = $FFF0 = -16 x=2112 y=-365 Computer: x=$0840 y=$FE93 (N=16) x*y = $08343CC0 y<0 so subtract x from high bytes (x*2^16), Answer = $F43CC0 = -770880 x=-31 y=-41 Computer: x=$E1 y=$D7 x*y = $BCF7 x<0 so subtract $D700 -> $E5F7 y<0 so subtract $E100 -> $04F7 = 1271 = correct! So, in summary, signed multiplies can be done with the same fast multiply routine along with some _very simple_ post-processing. And if we know something about the result ahead of time (like if it's less than 256 or whatever) then it takes _no_ additional processing! How cool is that? Projections ----------- After rotating, and adding object points to the rotated centers, the points need to be projected. Polygonamy had a really crappy projection algorithm, which just didn't give very good results. I don't remember how it worked, but I do remember that it sacrificed accuracy for speed, and as a result polygons on the screen looked like they were made of rubber as their points shifted around. So we need to re-solve this problem. Recall that projection amounts to dividing by the z-coordinate and multiplying by a magnification factor. Since the object points will all be 16-bits (after adding to the 16-bit centers), this means dividing a 16-bit signed number by another 16-bit number, and then doing a multiplication. Bleah. The first thing to do is to get rid of the divide. The projected coordinates are given by x' = d*x/z = x * d/z, y' = d*y/z = y * d/z where d is the magnification factor and z is the z-coordinate of the object. The obvious way to avoid the divide is to convert it into a multiply by somehow calculating d/z. The problem is that z is a 16-bit number; if it were 8 bits we could just do a table lookup. If only it were 8 bits... Hmmmmm... Physically the z-coordinate represents how far away the object is. For projections, we need - accuracy for small values of z (when objects are close to us and large) - speed for large values of z If z is large then accuracy isn't such a big deal, since the objects are far away and indistinct. So if z is larger than 8 bits, we can just shift it right until it is eight bits and ignore the tiny loss of accuracy. Moreover, note that the equations compute x/z and y/z; that is, a 16-bit number on top of another. Therefore if we shift _both_ x and z right we don't change the value at all! So now we have a nifty algorithm: if z>255 then shift z, x, and y right until z is eight bits compute d/z multiply by x, multiply by y There are still two more steps in this algorithm, the first of which is to compute d/z. This number can be less than one and as large as d, and will almost always have a remainder. So we are going to need not only the integer part but the fractional part as well. The second step is to multiply this number times x. Let d/z = N + R/256, so N=integer part and R=fractional part. For example, 3/2 = 1 + 128/256 = 1.5. Also let x = 256*xh + xl. Then the multiplication is x * d/z = (256*xh + xl) * (N + R/256) = 256*xh*N + xh*R + xl*N + xl*R/256 There are four terms in this expression. We know ahead of time that the result is going to be less than 16-bits (remember that the screen is only 320x200; if we are projecting stuff and getting coordinates like 40,000 then there is a serious problem! And nothing in the library is set up to handle 24-bit coordinates anyways). The first term, xh*N, must therefore be an eight-bit number, since if it were 16-bits multiplying by 256 would give a 24-bit number, which we've just said won't happen. So we only really care about the lower eight bits of xh*N. The next two terms, xh*R and xl*N, will be 16-bit numbers. The last term, xl*R. will also be 16-bits, but since we divide by 256 we only care about the high 8-bits of the result (we don't need any fractional parts for anything). And that, friends, takes care of projection: accurate when it needs to be, and still very fast. Cumulative Rotations -------------------- There is still this problem with the accumulating rotation matrix. The problem to be solved is: we want to accumulate some general rotation operator [A B C] M = [D E F] [G H I] by applying a rotation matrix like [ 1 0 0 ] Rx = [ 0 cos(t) sin(t)] [ 0 -sin(t) cos(t)] to it. So, just do it: [ A B C ] Rx M = [ D*cos(t)+G*sin(t) E*cos(t)+H*sin(t) F*cos(t)+I*sin(t) ] [-D*sin(t)+G*cos(t) -E*sin(t)+H*cos(t) -F*sin(t)+I*cos(t) ] You might want to do the matrix multiplication yourself, to double-check. Notice that it only affects rows 2 and 3. Also notice that only one column is affected at a time: that is, the first column of Rx M contains only D and G (the first column of M); the second column only E and H; etc. Similar expressions result when multiplying by Ry or Rz -- they just affect different rows. The point is that we don't need a whole bunch of routines -- we just need one routine, and to tell it which rows to operate on. In general, the full multiplication is a pretty hairy problem. But if the angle t is some fixed amount then it is quite simple. For example, if t is some small angle like 3 degrees then we can turn left and right in 3 degree increments, and the quantities cos(t) and sin(t) are just constants. Notice also that rotating in the opposite direction corresponds to letting t go to -t (i.e. rotating by -angle). Since sine is an odd function, this just flips the sign of the sines (i.e. sin(-t) = -sin(t)). If sin(t) and cos(t) are constants, then all that is needed is a table of g(x) = x*sin(t) and x*cos(t); the above calculation then reduces to just a few table lookups and additions/subtractions. There's just one caveat: if t is a small number (like three degrees) then cos(t) is very close to 1 and sin(t) is very close to 0. That is to say, the fractional parts of x*sin(t) and x*cos(t) are very important! So we will need two tables, one containing the integer part of x*cos(t) and the other containing the fractional. This in turn means that we need to keep track of the fractions in the accumulation matrix. The accumulation matrix will therefore have eighteen elements; nine integer parts and nine fractional parts. (The fractional part is just a decimal number times 256). The routines to rotate and project only use the integer parts; the only routines that really need the fractional parts are the accumulation routines. How important is the fractional part? Very important. There is a class of transformations called area-preserving transformations, of which rotations are a member. That is, if you rotate an object, the area of that object does not change. The condition for a matrix to be area-preserving is that its determinant is equal to one; if the cumulative rotation matrix starts to get inaccurate then its determinant will gradually drift away from one, and it will no longer preserve areas. This in turn means that it will start to distort an object when it is applied to the object. So, accuracy is very important where rotation matrices are concerned! There is one other issue that needs to be taken care of. Rotation matrices have this little feature that the largest value that any element of the matrix can ever take is: one! And usually the elements are all less than one. To retain accuracy we need to multiply the entire matrix by a constant; a natural value is 64, so that instead of ranging from -1..1 the rotation matrix elements will range from -64..64. And as was pointed out earlier, we can do some very fast signed arithmetic if one of the numbers is between -64 and 64. The downside is that we have to remember to divide out that factor of 64 once the calculation is done -- it's just a temporary place-holder. For object point rotations we can incorporate this into the multiplication table, but for the 16-bit center rotations we will have to divide it out manually. Polygon Routine Calculations ---------------------------- Finally, there's the polygon routine. Starting at the lowest point on the polygon, we need to draw the left and right sides simultaneously. Specifically we want to draw the lines to the left and right, and we only care about the x-coordinates of the endpoints at each value of y: start at the bottom Decrement y (move upwards one step) Compute x-coord of left line Compute y-coord of right line (Fill in-between) Drawing a line is easy. The equation of a line is dy = m*dx, m=slope, dy=change in y, dx=change in x The question posed by the above algorithm is "If I take a single step in y, how far do I step in x?" Mathematically: if dy=1, what is dx? Obviously, dx = 1/m i.e. the inverse slope. If the endpoints of the line are at (X1,Y1) and (X2,Y2), then the inverse slope is just 1/m = DX/DY, where DX=X2-X1 and DY=Y2-Y1. Once this is known for the left and right sides, updating the x-coordinates is a simple matter of just calculating x+dx (dx=DX/DY). Note that if DY=0, the line segment is a straight horizontal line. Therefore we can just skip to the next point in the polygon, and let the fill routine -- which is very good at drawing straight lines -- take care of it. Now we can start drawing the left and right sides. As we all know, lines on a computer are really a series of horizontal line segments: ***** ** ***** * (lines with positive and **** ** negative slope) ***** * Note that the polygon routine doesn't calculate all points on the line, the way a normal line routine does. It only calculates the left or right endpoints of each of those horizontal line segments. For example, if dx=5 then the routine takes big steps of 5 units each; the fill routine takes care of the rest of the line segment. There are two issues that need to be dealt with in the polygon routine. The first trick in drawing lines is to split one of those line segments in half, for the first and last points. That is, consider a line with DY=1 and DX=10. From the above equations, dx = 10, and if this line was just drawn naively it would look like * ********** i.e. a single horizontal line segment of length 10, followed by a dot at the last point. What we really want is to divide that one line segment in half, to get ***** ***** for a nice looking line. In the polygon routine, this means that the first step should be of size dx/2. All subsequent steps will be of size dx, except for the last point. The other issue to be dealt with again deals with the endpoints. How to deal with them depends on two things: the slope of the line, and whether it is the left or right side of the polygon. Consider the below line, with endpoints marked with F and L (for first and last) *L** **** F* and dx=4. You can see that we have overshot the last point. In point of fact, we may also have overshot the first point. Let's say the above was the left side of a polygon. Then we would want to start filling at the LEFT edge of each line segment -- we want that * to the left of L to be drawn, but want to start filling at F. On the other hand, if this were the right side of the polygon then we would want to fill to the right edge of each line segment. That means filling to point L, again grabbing the * to the left of L, and filling to the * to the right of F. The correct way to handle this is actually very simple: place the x-coordinate update (x=x+dx) in just the right spot, either before or after the plot/fill routine. In the above example, the left line should do the update after the plot/fill: fill in-between and plot endpoints x=x+dx The line will then always use the "previous" endpoint: it will start with F, and then with the last point of the previous line segment. The right line should do the updating before the plotting: it will then use the rightmost point of each segment, and when the end of the line is reached the next line will be computed, resetting the starting point. Polygonamy put some wacky restrictions on DX and DY that just won't cut it here. The lib3d routine can handle negative coordinates, and polygons which are partially off-screen. The only restriction it puts on the numbers is that DX is 9-bits and DY is 8-bits; the coordinates X2 and Y2 etc. can be a full 16-bits, but the sides of the polygons can't exceed DX=9-bits and DY=8-bits. In principle this allows you to draw polygons which are as large as the entire screen. In practice it was perhaps not a great decision: if these polygons are being rotated, then a polygon with DX=320 will have DY=320 after a 90-degree rotation. So in a sense polygons _sides_ are actually limited to be no larger than 256 units. The important word here though is "side". A polygon can be much larger than 256x256 if it has more than one side! For example, a stopsign has eight sides, but each side is smaller than the distance from one end of the polygon to the other. Also, sides can always be split in half. This restriction is just one of those design decisions... Anyways, DX=9-bits and DY=8-bits. To calculate this we need a divide routine. Polygonamy used a routine involving logs and some fudged tables and other complexities. Well most of that was pretty silly, because the divide is such a tiny portion of the whole polygon routine -- saving a few tens of cycles is awfully small potatoes in a routine that's using many thousands of cycles. So the new strategy is to do the general routine: calculate log(DX) and log(DY), subtract, and take EXP() of the result to get an initial guess. Then use a fast multiply to see how accurate the guess was, and if it is too large or too small then a few subtractions or additions can fix things up. The remainder is also computed as a decimal number (remainder/DY times 256); the remainder is crucial, as accuracy is important -- if the lines are not drawn accurately, the polygon sides won't join together, and the polygon will look very weird. Why not do a divide like in the projection routine? Well, first of all, I wrote the polygon renderer before deriving that routine. :) Second, that routine combines a multiply _and_ a divide into a single multiply; there is no multiply here, and if you count up the cycles I think you'll find that the logarithmic routine is faster, and with the remainder calculation it is simply better-suited for drawing polygons. Although the remaining parts of the algorithm require a lot of blood and sweat to get the cycle count down, they are relatively straightforward. I leave their explanation, along with a few other subtleties, for the detailed code disassemblies, below. Detailed Code Disassembly ------------------------- After all of those preliminary calculations and ideas and concepts we are finally in a position to write some decent code. The full lib3d source code is included in this issue, but it is worth examining the routines in more detail. Five routines will be discussed below: CALCMAT - Calculate a rotation matrix ACCROTX - Accumulate the rotation matrix by a fixed amount GLOBROT - Global rotate (rotate centers) ROTPROJ - Local rotation (object points), and project if necessary POLYFILL- Polygon routine Only the major portions of the routines will be highlighted. And the first routine is... CALCMAT ------- * * Calculate the local matrix * * Pass in: A,X,Y = angles around z,x,y axis * * Strategy: M = Ry Rx Rz where Rz=roll, Rx=pitch, Ry=yaw * CALCMAT calculates a rotation matrix using the old method: pass in the rotation amounts about the x y and z axis and calculate a matrix. There is a fairly complete explanation of this routine in the source code, but it has a very important feature: the order of the rotations. As the above comment indicates, it first takes care of roll (tilting your head sideways), then pitch (looking up and down), and finally yaw (looking left or right). This routine can't be used for a general 3D world, but by doing rotations in this order it _can_ be used for certain kinds of motion, like motion in a plane (driving a car around), or an angle/elevation motion (like a gun turret). Note also the major difference from the accumulation routines: they rotate by a fixed amount. Since this routine just uses the angles as parameters, the program that uses this routine can of course update the angles by any amount it likes. That is: the accumulation routines can only turn left or right in 3 degree increments; this routine can instantly point in any direction. Next up are the accumulation routines. These routines apply a fixed rotation to the rotation matrix; i.e. they rotate the world by a fixed amount about an axis. The carry flag is used to indicate whether rotations are positive or negative (clockwise or counter- clockwise if you like). As was pointed out earlier, only one routine is needed to do the actual rotation calculation. All it needs to know is which rows to operate on, so that is the job of ACCROTX ACCROTY and ACCROTZ. ACCROTX ------- * * The next three procedures rotate the accumulation * matrix (multiply by Rx, Ry, or Rz). * * Carry clear means rotate by positive amount, clear * means rotate by negative amount. * ACCROTX ENT The x-rotation just operates on rows 2 and 3; these routines just loop through the row elements, passing them to the main routine ROTXY. The new, rotated elements are then stored in the rotation matrix, and on exit the rotation matrix has accumulated a rotation about the x-axis. ROR ROTFLAG LDX #2 :LOOP STX COUNT LDA D21REM,X ;rows 2 and 3 STA AUXP LDA G31REM,X STA AUXP+1 LDY G31,X ;.Y = row 3 LDA D21,X ;.X = row 2 TAX JSR ROTXY LDX COUNT LDA TEMPX STA D21REM,X LDA TEMPX+1 STA D21,X LDA TEMPY STA G31REM,X LDA TEMPY+1 STA G31,X DEX BPL :LOOP RTS * * Rotate .X,AUXP .Y,AUXP+1 -> TEMPX,+1 TEMPY,+1 * * If flag is set for negative rotations, swap X and Y * and swap destinations (TEMPX TEMPY) * ROTXY This is the main accumulation routine. It simply calculates x*cos(delta) + y*sin(delta), and y*cos(delta) - x*sin(delta). Since delta is so small, cos(delta) is very nearly 1 and sin(delta) is very nearly zero: an integer and a remainder part of x and y are necessary to properly calculate x*cos(delta) etc. Let x = xint + xrem (integer and remainder). Then x*cos(delta) = xint*cos(delta) + xrem*cos(delta). Since cos(delta) is nearly equal to 1, xint*cos(delta) is equal to xint plus a small correction; that is, xint*cos(delta) gives an integer part and a remainder part. What about xrem*cos(delta)? It also gives a small correction to xrem. But xrem is already small! Which means the correction to xrem is really really small, i.e outside the range of our numbers. So we can just discard the correction. This can generate a little bit of numerical error, but it is miniscule, and tends to get lost in the noise. The point of all this is that x*cos(delta) + y*sin(delta) is calculated as xrem + xint*cos(delta) + yint*sin(delta) where xrem*cos(delta) is taken to be xrem, and yrem*sin(delta) is taken to be zero. This introduces a very small error into the computation, but the errors tend to get lost in the wash. :CONT LDA CDELREM,X CLC ADC SDELREM,Y STA TEMPX LDA CDEL,X ;x*cos(delta) ADC SDEL,Y ;+y*sin(delta) STA TEMPX+1 LDA TEMPX CLC ADC AUXP ;+xrem STA TEMPX BCC :NEXT INC TEMPX+1 :NEXT a similar piece of code to calculate y*cos - x*sin And that is the entire routine -- quite zippy, and at 14 bits per number it is remarkably accurate. GLOBROT ------- Next up is GLOBROT. The centers are 16-bit signed numbers, and the rotation matrix of course consists of 8-bit signed numbers. * * Perform a global rotation, i.e. rotate the centers * (16-bit signed value) by the rotation matrix. * * The multiplication multiplies to get a 24-bit result * and then divides the result by 64 (mult by 4). To * perform the signed multiplication: * - multiply C*y as normal * - if y<0 then subtract 256*C * - if C<0 then subtract 2^16*y * * Parameters: .Y = number of points to rotate * Recall that the object centers are 16-bit signed coordinates; therefore a full 16-bit signed multiply is needed. Two macros are used for the multiplies: MULTAY MAC ;Multiply A*Y, store in var1, A which does the usual fast multiply, and QMULTAY MAC ;Assumes pointers already set up LDA (MULTLO1),Y SEC SBC (MULTLO2),Y STA ]1 LDA (MULTHI1),Y SBC (MULTHI2),Y <<< which does a "quick multiply", that is, it is the fast multiply without any of the zero-page setup muckety-muck. As was pointed out in the theory section, once the zero page variables are set up, they stay set up! ROTPROJ can take better advantage of this, but GLOBROT can also benefit. Remember also that if A=0 this routine will fail; the routines below must check for A=0 ahead of time. Since the matrix elements are all "6-bit offset", that is, since they are the actual value times 64, the final result after the rotations needs to be divided by 64. The easiest way to divide the 24-bit result by 64 is to shift *left* twice, and keep the upper bits. In the macro below, "C" is the "C"enter, and Mij is some element in the rotation matrix. * Fix sign and divide by 64 DIVFIX MAC ;If Mij<0 subtract 256*C LDY MULTLO1 ;If C<0 subtract 2^16*Mij BPL POSM STA TEMP3 LDA TEMP2 SEC SBC ]1 ;Center STA TEMP2 LDA TEMP3 SBC ]1+1 ;high byte POSM LDY ]1+1 BPL DIV SEC SBC MULTLO1 ;Subtract Mij DIV ASL TEMP1 ;Divide by 64 ROL TEMP2 ROL ASL TEMP1 ROL TEMP2 ROL LDY TEMP2 <<< ;.A, .Y = final result * Main routine GLOBROT ENT Recall that there are two ways to look at multiplying a matrix times a vector. The usual way is "row times column"... and that is exactly how this routine works. Why not do the "column times vector element" method, which we said offers this great computational advantage? Doing things that way means calculating [A11] [B12] [C13] [D21]*CX + [E22]*CY + [F23]*CZ [G31] [H32] [I33] where the A11 etc. are the columns of the rotation matrix. The argument was that CX could be set up in zero page once, and then multiplications could be done three at a time. Why not just do that? Well, the most basic answer is that I had already written this routine before I noticed the multiplication trick. So I could have rewritten it using the above method, or modify what was already in place. Since CX CY and CZ are all 16-bit signed numbers, I chose to do the multiplication savings using A11 etc. as the zero page variable (A11*CX = A11*CXLO + 256*A11*CXHI), so the multiplications are done two at a time. I think I also decided that the total cycle savings in rewriting the routine would have been miniscule compared with the total routine time, and that the whole global rotation time was small compared with the local rotations and especially the polygon plotting. In sum, doing a total rewrite didn't seem to be worth the effort. Nevertheless, this routine can certainly be made faster. Another item for improvement is to do the division by 64 only at the very end of the calculation (the routines below multiply, then divide by 64, then add; it would be better to multiply, add, then after all additions divide by 64). Who knows what I was thinking at the time?! Oh well, that's why God invented version numbers :). Here is the main routine loop: GLOOP DEY STY COUNT [copy object centers to zero page for fast access] This part multiplies row 1 of the rotation matrix times the position vector, and stores the new result in CX, the x-coordinate of the object center. LDA A11 ;Row1 LDX B12 LDY C13 JSR MULTROW ;Returns result in .X .A = lo hi LDY COUNT STA (CXHI),Y TXA STA (CXLO),Y [Do the same thing for row 2 and the y-coordinate] [Do the same thing for row 3 and the z-coordinate] The routine then loops around back to GLOOP, until all points have been rotated. The important work of this routine is done in MULTROW: * * Multiply a row by CX CY CZ * (A Procedure to save at least a LITTLE memory...) * M1 EQU CXSGN ;CXSGN etc. no longer used. M2 EQU CYSGN M3 EQU CZSGN MULTROW STA M1 STX M2 STY M3 This next part checks to make sure A is not zero; as was pointed out in the discussion of the fast multiply, A=0 will fail (ask me how I know). TAY BEQ :SKIP1 It is all pretty straightforward from here: compute M1*CX, adjust for signed values, and divide by 64 (this is the divide by 64 that it would have been better to leave for later). Since zero-page is already set up, the second multiplication is done with a quick multiply. LDY CX+1 >>> MULTAY,TEMP2 STA TEMP3 LDY CX >>> QMULTAY,TEMP1 CLC ADC TEMP2 STA TEMP2 LDA TEMP3 ADC #00 >>> DIVFIX,CX ;Adjust result and /64 :SKIP1 STY TM1 STA TM1+1 The next two parts just calculate M2*CY and M3*CZ, adding to TM1 and TM1+1 as they go, and the routine exits. ROTPROJ ------- Now we move on to perhaps THE core routine. In general, *a whole lot* of object points need to be rotated and projected, so this routine needs to blaze. Remember that there are two rotations in the "world" equation: local rotations to get an object pointing in the right direction, and another rotation to figure out what it looks like relative to us. This routine can therefore just rotate (for the first kind of rotation), or rotate and project (for the second kind). The idea is to pass in a list of points, tell it how many points to rotate (and project), and tell it where to store them all! * * ROTPROJ -- Perform local rotation and project points. * * Setup needs: * Rotation matrix * Pointer to math table (MATMULT = $AF-$B0) * Pointers to point list (P0X-P0Z = $69-$6E) * Pointers to final destinations (PLISTYLO ... = $BD...) * (Same as used by POLYFILL) * Pointers to lists of centers (CXLO CXHI... = $A3-$AE) * .Y = Number of points to rotate (0..N-1) * .X = Object center index (index to center of object) * * New addition: * C set means rotate and project, but C clear means just * rotate. If C=0 then need pointers to rotation * destinations ROTPX,ROTPY,ROTPZ=$59-$5E. * Recall that the rotation matrix is offset by 64 -- instead of ranging from -1..1, all elements range from -64..64. As in the case of GLOBROT above, the final result needs to be divided by 64. Also recall the result from the discussion of signed multiplication: if one set of numbers is between -64..64, and the other between -96..95, then all combinations a+b (and a-b) generate a unique 8-bit number. In other words, only one table is needed. Next, think about rotations for a moment: rotations do not change lengths (if you rotate, say, a pencil, it doesn't magically get longer). This has two important implications. The first is that we know a priori that the result of this rotation will be an *8-bit* result, if the starting length was an 8-bit number. Thus we need only one table, and just two table lookups -- one for f(a+b) and the other for f(a-b). Since the upper 8-bits are irrelevant, the divide by 64 can be incorporated directly into the table of f(x). The second implication is that the _length_ of the point-vectors cannot be more than 128. This is because if we ignore the upper 8-bits, we don't capture the sign of the result. Consider: each x, y, and z coordinate of a point can be between -96..95. A point like (80,80,80) has length sqrt(3*80^2) = 138.56. So if some rotation were to rotate that vector onto the x-axis it would have a new coordinate of (138,0,0). Although this is an 8-bit number, it is not an eight-bit signed number -- the rotation could just as easily put the coordinate at (-138,0,0), which requires extra bits. So some care must be taken to make sure that object points are not more than 128 units away from the object center. As was pointed out earlier, we can save a lot of time with the matrix multiply by doing the multiplication as column-times-element, instead of the usual row-times-column: column 1 of the matrix times Px plus column 2 times Py plus column 3 times Pz, where (Px,Py,Pz) is the vector being rotated. Here is the entire signed multiplication routine (with divide by 64): SMULT MAC ;Signed multiplication STA MATMULT EOR #$FF CLC ADC #01 STA AUXP LDA (MATMULT),Y SEC SBC (AUXP),Y <<< And this is the Quick Multiply, used after the zero-page variables have been set up: QMULT MAC ;Multiplication that assumes LDA (MATMULT),Y ;pointers are already initialized SEC SBC (AUXP),Y <<< Pretty cool, eh? A 12-cycle signed multiply and divide. :) On to the routine: ROTLOOP is the main loop. It first rotates the points. If the carry was set then it adds the points to the object center and projects them. ROTLOOP LDY COUNT BEQ UPRTS DEY STY COUNT LDA (P0X),Y BNE :C1 STA TEMPX STA TEMPY STA TEMPZ BEQ :C2 :C1 LDY A11 ;Column 1 >>> SMULT STA TEMPX LDY D21 >>> QMULT STA TEMPY LDY G31 >>> QMULT STA TEMPZ The above multiplies the x-coordinate (P0X) times the first column of the rotation matrix. Note the check for P0X=0, which would otherwise cause the multiplication to fail. Note that there is one SMULT, which sets up the zero-page pointers, followed by two QMULTS. The result in TEMPX, TEMPY, TEMPZ will be added to subsequent column multiplies. And that's the whole routine! After the third column has been multiplied, we have the rotated point. If projections are taking place, then the point needs to be added to the center and projected. The centers are 16-bit signed coordinates, but so far the points are just 8-bit signed coordinates. To make them into 16-bit signed coordinates we need to add a high byte of either 00, for positive numbers, or $FF, for negative numbers (e.g. $FFFE = -2). In the code below, .Y is used to hold the high byte. ADDC LDY #00 CLC ADC TEMPZ BPL :ROTCHK DEY ;Sign :ROTCHK LDX ROTFLAG BMI :POS1 LDY COUNT ;If just rotating, then just STA (ROTP0Z),Y ;store! LDA TEMPY STA (ROTP0Y),Y LDA TEMPX STA (ROTP0X),Y JMP ROTLOOP :POS1 CLC ;Add in centers ADC CZ STA TEMPZ TYA ADC CZ+1 STA TEMPZ+1 ;Assume this is positive! Remember that only objects which are in front of us -- which have positive z-coordinates -- are visible. The projection won't work right if any z-coordinates are negative. Thus the above piece of code doesn't care about the sign of the z-coordinate. Two similar pieces of code add the x/y coordinate to the x/y center coordinate. The signs of the result are stored in CXSGN and CYSGN, for use below. Recall how projection works: accuracy for small values of Z, but speed for high values of Z. If the z-coordinate is larger than 8-bits, we just shift all the coordinates right until z is 8-bits. Since the X and Y coordinates might be negative, we need the sign bits CXSGN and CYSGN; they contain either 00 or $FF, so that the rotations will come out correctly (negative numbers will stay negative!). LDA TEMPZ+1 BEQ PROJ :BLAH LSR ;Shift everything until ROR TEMPZ ;Z=8-bits LSR CXSGN ROR TEMPX+1 ;Projection thus loses accuracy ROR TEMPX ;for far-away objects. LSR CYSGN ROR TEMPY+1 ;(Big whoop) ROR TEMPY TAX BNE :BLAH Finally comes the projection itself. It just follows the equations set out earlier, in Section 3. PROJTAB is the table of d/z. It turns out that there are a lot of zeros and ones in the table, so those two possibilities are treated as special cases. The PROJTAB value is used in the zero-page multiplication pointers, to again make multiplication faster. By calculating both the x- and y-coordinates simultaneously, the pointers may be reused even further. After multiplying, the coordinates are added to the screen offsets (XOFFSET and YOFFSET), so that 0,0 is at the center of the screen (i.e. XOFFSET=160 and YOFFSET=100). These values are changeable, so the screen center may be relocated almost anywhere. The final coordinates are 16-bit signed values. Finally, there is the other core routine: POLYFILL, the polygon renderer. POLYFILL -------- POLYFILL works much like the old Polygonamy routine. Each object consists of a list of points. Each face of the object is some subset of those points. POLYFILL polygons are defined by a set of indices into the object point list -- better to copy an 8-bit index than a 16-bit point. The index list must go counter-clockwise around the polygon, so the hidden-face routine will work correctly. To draw the polygon it simply starts at the bottom, calculating the left and right edges and filling in-between. POLYFILL also greatly improves upon the old routine. It can draw to bitmaps in any bank. It is very compact. It deals with 16-bit signed coordinates, and can draw polygons which are partially (or even wholly) off-screen. It also correctly draws polygons which overlap (in a very sneaky way). Keep in mind that this routine needs to be just balls-busting fast. A typical object might have anywhere from 3-10 visible polygons, and a typical polygon might be 100-200 lines high. Just saving a few tens of cycles in the main loop can translate to tens of thousands of cycles saved overall. By the same token, there's no need to knock ourselves out on routines which aren't part of the main loop; saving 100 cycles overall is chump change. Finally, since this is a library, there are some memory constraints, and some of the routines are subsequently less efficient than they would otherwise be. Again, though, those few added cycles are all lost in the noise when compared with the whole routine. There is a whole a lot of code, so it's time to just dive in. First, * * Some tables * XCOLUMN ;xcolumn(x)=x/8 ]BLAH = 0 LUP 32 DFB ]BLAH,]BLAH,]BLAH,]BLAH DFB ]BLAH,]BLAH,]BLAH,]BLAH ]BLAH = ]BLAH+1 --^ * Below are EOR #$FF for merging into the bitmap, instead * of just ORAing into the bitmap. LBITP ;Left bit patterns LUP 32 * DFB $FF,$7F,$3F,$1F,$0F,$07,$03,$01 DFB 00,$80,$C0,$E0,$F0,$F8,$FC,$FE --^ RBITP ;right bit patterns LUP 32 * DFB $80,$C0,$E0,$F0,$F8,$FC,$FE,$FF DFB $7F,$3F,$1F,$0F,$07,$03,$01,$00 --^ I will put off discussion of the above tables until they are actually used. It is just helpful to keep them in mind, for later. Next up is the fill routine. The polygonamy fill routine looked like STA BITMAP,Y STA BITMAP+8,Y STA BITMAP+16,Y ... There were a total of 25 of these -- one for each row. Then there were two bitmaps, so 50 total. Each bitmap-blaster was page-aligned. Fills take place between two columns on the screen. By entering the above routine at the appropriate STA xxxx,Y, the routine could begin filling from any column. By plopping an RTS onto the appropriate STA the routine could exit on any column. After filling between the left and right columns, the RTS was restored to an STA xxxx,Y. Although the filling is very fast, the routine is very large, locked to a specific bitmap, and a bit cumbersome in terms of overhead. So a new routine is needed, and here it is: * * The fill routine. It just blasts into the bitmap, * with self-modifying code determining the entry and * exit points. * FILLMAIN ]BLAH = 0 ;This assembles to LUP 32 ;LDY #00 STA (BPOINT),Y LDY #]BLAH ;LDY #08 STA (BPOINT),Y STA (BPOINT),Y ;... ]BLAH = ]BLAH+8 ;LDY #248 STA (BPOINT),Y --^ INC BPOINT+1 ;Advance pointer to column 32 COL32 LDY #00 STA (BPOINT),Y LDY #08 ;33 STA (BPOINT),Y LDY #16 STA (BPOINT),Y LDY #24 STA (BPOINT),Y LDY #32 STA (BPOINT),Y LDY #40 ;37 STA (BPOINT),Y LDY #48 STA (BPOINT),Y LDY #56 ;Column 39 STA (BPOINT),Y FILLEND RTS ;64+256=320 FILL JMP FILLMAIN ;166 bytes As before, self-modifying code is used to determine the entry and exit points. As you can see, compared with the polygonamy routine this method takes 3 extra cycles per column filled. But now there is just one routine, and it can draw to any bitmap. As an added bonus, if we do the RTS just right, then (BPOINT),Y will point to the ending-column (also note the INC BPOINT+1 in the fill code, for when column 32 is crossed). There are two parts to doing the filling. The above fill routine fills 8 bits at a time. But the left and right endpoints need to be handled specially. The old routine had to recalculate those endpoints; this method gets it automatically, and so saves some extra cycles in overhead. In the very worst case, the old routine takes 200 cycles to fill 40 columns; the new routine takes 325 cycles. The main routine below takes around 224 cycles per loop iteration. Ignoring the savings in overhead (a few tens of cycles), this means that in the absolute worst case the old method will be around 20% faster. For typical polygons, they become comparable (5% or so). With all the advantages the new routine offers, this is quite acceptable! A few extra tables follow: LOBROW DFB 00,64,128,192,00,64,128,192 (and so on)... HIBROW DFB 00,01,02,03,05,06,07,08 etc. COLTAB ;coltab(x)=x*8 DFB 0,8,16,24,32,40,48,56,64,72,80 etc. LOBROW and HIBROW are used to calculate the address of a row in the bitmap, by adding to the bitmap base address. COLTAB then gives the offest for a given column. Thus the address of any row,column is quickly calculated with these tables. Finally, the next two tables are used to calculate the entry and exit points into the fill routine. * * Table of entry point offsets into fill routine. * The idea is to enter on an LDY * FILLENT DFB 0,4,8,12,16,20,24,28,32,36,40 ;0-10 DFB 44,48,52,56,60,64,68,72,76,80 ;11-20 DFB 84,88,92,96,100,104,108,112 ;21-28 DFB 116,120,124,128 ;29-32 DFB 134 ;Skip over INC BPOINT+1 DFB 138,142,146,150,154,158 ;34-39 DFB 162 ;Remember that we use FILLENT+1 * * Table of RTS points in fill routine. * The idea is to rts after the NEXT LDY #xx so as to * get the correct pointer to the rightmost column. * * Thus, these entries are just the above +2 * FILLRTS DFB 2,6,10,14,18,22,26,30,34,38,42 ;0-10 DFB 46,50,54,58,62,66,70,74,78,82 ;11-20 DFB 86,90,94,98,102,106,110,114,118,122,126 DFB 132 ;Skip over INC DFB 136,140,144,148,152,156,160 ;33-39 All that's left now is the main code routines. Before drawing the polygon there is some setup stuff that needs to be done. The lowest point needs to be found, and the hidden face check needs to be done. The hidden face check is simply: if we can't draw this polygon, then it is hidden! As was said before, the list of point indices moves around the polygon counter-clockwise when the polygon is visible. When the polygon gets turned around, these points will go around the polygon clockwise. What does this mean computationally? It means that the right side of the polygon will be to the LEFT of the left side, on the screen. A simple example: ____ \ / L \/ R When the above polygon faces the other way (into the monitor), R will be to the left of L. The polygon routine always computes R as the "right side" and L as the "left side" of the polygon; if, after taking a few steps, the right side is to the left of the left side, then we know that the polygon is turned around. The hidden face calculation just computes the slopes of the left and right lines -- which we need to do to draw the polygon anyways -- and takes a single step along each line. All that is needed is to compare the left and right points (or the slopes), and either draw the polygon or punt. There are two sides to be dealt with: the left and right. Each of those sides can have either a positive or negative slope, therefore a total of four routines are needed. A second set of four routines is used for parts of the polygon that have y-coordinates larger than 200 or less than zero, i.e. off of the visible screen; these routines calculate the left and right sides as normal, but skip the plotting calculations. All four routines are similar. The differences in slope change some operations from addition to subtraction, and as was explained earlier affects whether the x=x+dx update comes before or after the plotting. It also affects the very last point plotted, similar to the way the x=x+dx update is affected. The truly motivated can figure out these differences, so I'll just go through one routine in detail: * Left positive, right negative * * End: left and right advance normally DYL3 >>> DYLINEL,UL3 LEFTPM >>> LINELP,LEFTMM JMP LCOLPM LOOP3 DEC YLEFT BEQ DYL3 UL3 >>> PUPDATE,LEFTXLO;LEFTXHI;LEFTREM;LDY;LDXINT;LDXREM LCOLPM >>> LCOLUMN DEC YRIGHT BNE UR3 >>> DYLINER,UR3 RIGHTPM >>> LINERM,RIGHTPP JMP RCOLPM UR3 >>> MUPDATE,RIGHTXLO;RIGHTXHI;RIGHTREM;RDY;RDXINT;RDXREM RCOLPM >>> RCOLUMN >>> UPYRS,LOOP3 That's the whole routine, for a polygon which ends in a little peak, like /\ i.e. left line positive slope and right line negative. Let's read the main loop: LOOP3 Decrease the number of remaining points to draw in the left line; if the line is finished, then calculate the next line. PUPDATE: The "P" is for "Plus": compute x=x+dx for the left side. LCOLUMN: Calculate the left endpoint on the screen, and set up the screen pointer and the fill entry point. Decrease the number of remaining points in the right line; if finished, calculate the next line segment (note BNE). MUPDATE: "M"inus update, since slope is negative: calculate x=x-dx RCOLUMN: Calculate the right endpoint and fill column, do the fill, and plot the left/right endpoints. UPYRS: Update the bitmap pointer, and go back to LOOP3 So it's just what's been said all along: calculate the left and right endpoints, and fill. When a line is all done, the next side of the polygon is computed. DYL3 performs the computations for the left line segment. DYLINEL computes the value of DY. Note that DY always has the same sign, since the polygon is always drawn from the bottom-up. LINELP computes DX and DX/DY, the inverse slope. If DX is negative it will jump to LEFTMM. Since we are in the "plus-minus" routine, i.e. left side = plus slope, right side = minus slope, then it needs to jump to the "minus-minus" routine if DX has the wrong sign. LEFTMM stands for "left side, minus-minus", in just the same way that LEFTPM in the above code means "left side, plus-minus". The right line segment is similar. Note what happens next: the routine JMPs to LCOLPM. It skips over the PUPDATE at UL3. Similarly, the right line calculation JMPs over UR3 into RCOLPM. What this does is delay the calculation of x=x+dx or x=x-dx until *after* the plotting is done, to fill between the correct left and right endpoints. See the discussion of polygon fills in Section 3 for a more detailed explanation of what is going on here. Now let's go through the actual macros which make up the above routine. First, the routines to calculate the slope of the left line segment: * * Part 1: Compute dy * * Since the very last point must be plotted in a special * way (else not be plotted at all), ]1=address of routine * to handle last point. * DYLINEL MAC ;Line with left points BEGIN LDX LINDEX L1 DEC REMPTS ;Remaining lines to plot BPL L2 ;Zero is still plotted INC YLEFT LDA REMPTS CMP #$FF ;Last point BCS ]1 EXIT RTS The above code first checks to see if any points are remaining. If this is the last point, we still need to fill in the very last points. If the right-hand-side routine has already dealt with the last endpoint, REMPTS will equal $FF and the routine will really exit; in this case, the last part has already been plotted. L3 LDX NUMPTS L2 LDY PQ,X STY TEMP1 LDA (PLISTYLO),Y ;Remember, y -decreases- DEX BMI L3 ;Loop around if needed LDY PQ,X STY TEMP2 SEC SBC (PLISTYLO),Y BEQ L1 ;If DY=0 then skip to next point Next, it reads points out of the point queue (PQ) -- the list of point indices going around the polygon. X contains LINDEX, the index of the current point on the left-hand side. We then decrease this index to get the next coordinate on the left-hand side; this index is increased for right-hand side coordinates. That is, left lines go clockwise through the point list, and right lines go counter- clockwise. If DY=0, then we can just skip to the next point in the point list -- the fill routine is very good at drawing horizontal lines. Sharp-eyed readers may have noticed that only the low-bytes of the y-coordinates are used. We know ahead of time that DY is limited to an eight-bit number, and since we are always moving up the polygon we know that DY always has the same sign. In principle, then, we don't need the high bytes at all. In practice, though, the points can get screwed up (like if the polygon is very very tiny, and one of the y-coordinates gets rounded down). The JSR FLOWCHK below checks the high byte. STA LDY STA YLEFT STA DIVY STX LINDEX JSR FLOWCHK ;Just in case, check for dy < 0 BMI EXIT ;(sometimes points get screwed up) <<< The code for FLOWCHK is simply FLOWCHK LDY TEMP1 LDA (PLISTYHI),Y LDY TEMP2 SBC (PLISTYHI),Y RTS Why put such a tiny thing in a subroutine? Because I was out of memory. It was very annoying. Remember the rule though: a few occasional wasted cycles are a drop in the bucket. And, of course, this is why God invented version numbers. (It can be made more efficient, anyways -- as you might have guessed, this was kludged in at the last moment, long after the routine had been written). Okee dokee, the next part of computing the line slope is to compute DX. For this routine, the left line has positive slope. * * Part 2: Compute dx. If dx has negative the expected * sign then jump to the complementary routine. * * dx>0 means forwards point is to the right of the * current point, and vice-versa. * LINELP MAC ;Left line, dx>0 LDY TEMP2 LDA (PLISTXLO),Y ;Next point LDY TEMP1 SEC ;Carry can be clear if jumped to SBC (PLISTXLO),Y ;Current point STA DIVXLO LDY TEMP2 LDA (PLISTXHI),Y LDY TEMP1 SBC (PLISTXHI),Y BPL CONT1 ;If dx<0 then jump out JMP ]1 ;Entry address CONT1 STA DIVXHI LDA (PLISTXLO),Y ;Current point STA LEFTXLO LDA (PLISTXHI),Y STA LEFTXHI JSR DIVXY ;Returns int,rem = .X,.A JSR LINELP2 DONE2 <<< ;Now .X=low byte, .A=high byte ;of current point Pretty short. It first computes DX, and jumps to the complementary routine if DX has the wrong sign. Recall that DX can be 9-bits, so both the low and high bytes of PLISTX are used. The current line coordinate is stored in LEFTXLO/LEFTXHI, and DIVXY thencomputes DX/DY, both the integer part and remainder part in fixed 16-bit format. That is, the number returned is xxxxxxxx.xxxxxxxx i.e. 8 bit integer, 8 bit remainder (256*rem/DY). A few brief words should be said about DIVXY. It uses a table of logs to compute an estimate for DX/DY. It then multiplies that estimate by DY, and fixes up the estimate if it is too large or too small. At this point, it has the integer part N and the remainder R, i.e. DX/DY = N + R/DY. To compute R/DY it just uses the log tables again but with a different exponential table, since LOG(R) - LOG(DY) will always be *negative*. It then uses this estimate as the actual remainder -- it doesn't try to correct with a multiplication. By my calculations this can indeed cause a tiny little error, but only for very very special cases (extremely large lines/polygons), and it is something that is difficult to even notice. I am always happy to trade tiny errors for extra cycles. After DIVXY is all done the subroutine LINELP2 is used -- again, it was moved out to conserve memory. Recall that the first step needs to be of size dx/2. LINELP2 and siblings perform this computation: LINELP2 STX LDXINT CMP #$FF ;if dy=1 BNE :NOT1 ;then A=dx/2 LDA #00 ROR ;Get carry bit from dx/2 STA LDXREM LDA #$80 STA LEFTREM LDX LEFTXLO LDA LEFTXHI BCC :RTS ;And start from current point DIVXY sets .A to #$FF if DY=1 -- although you'd expect that $FF is a perfectly valid remainder, it turns out that it doesn't happen because of the way DIVXY computes remainders. If DY=1, then DX/DY is exactly DX. More importantly, this is the only case in which DX/DY might be 9-bits; any other DY will return an 8-bit value of DX/DY. For this reason it is handled specially. :NOT1 STA LDXREM LDA #$80 STA LEFTREM ;Initialize remainder to 1/2 TXA ;x=x-dxdy/2 LSR STA TEMP2 LDA LEFTXLO TAX ;.X = current point SEC SBC TEMP2 STA LEFTXLO LDA LEFTXHI BCS :DONE DEC LEFTXHI TAY ;Set/clear Z flag :DONE CLC :RTS RTS The above probably looks a little strange. The slope is positive, so we want to calculate x=x+dx/2, but the above calculates x=x-dx/2. The reason is that the main loop will add dx to it, so that the next iteration will take it out to +dx/2. Why do it on the next iteration? For the same reason the x=x+dx update is delayed until after the plotting: for left lines with positive slope, we always need to start at the left endpoint of each horizontal line segment; in this case, that left endpoint is exactly the starting point. Notice that before computing x-dx/2 the above routine puts LEFTXLO/LEFTXI -- the starting point -- in .X/.A. The plot/fill calculations are done using the point in .X/.A and so the plot will be done from the starting point. The above are the routines to calculate the slope of the line. They then jump over the next part: the part of the main loop which calculates x=x+dx. * * Next, the parts which update the X coordinates * for positive and negative dx (in real space, of * course -- the stored value of dx is always positive) * * Parameters passed in are: * ]1 = lo byte x coord * ]2 = high byte x coord * ]3 = remainder * ]4 = dy * ]5 = dx/dy, integer * ]6 = dx/dy, remainder * PUPDATE MAC ;dx>0 LDA ]3 CLC ADC ]6 STA ]3 LDA ]1 ;x=x+dx/dy ADC ]5 TAX STA ]1 BCC CONT INC ]2 ;High byte, x CLC CONT LDA ]2 <<< ;Carry is CLEAR on exit ;.X = lo byte xcoord ;.A = hi byte As you can see, it is a very simple 16-bit addition. As it says, it leaves carry clear and .X/.A = lo/hi for the column calculation: * * Compute the columns and plot the endpoints * * .X contains the low byte of the x-coord * .A was JUST loaded with the high byte * * Carry is assumed to be CLEAR * LCOLUMN MAC * LDA LEFTXHI BEQ CONT CMP #2 BCC CONT1 ASL BCS NEG POS LDA #$FF ;Column flag STA LCOL BNE DONE NEG LDA #00 ;If negative, column=0 STA LCOL STA LBITS ;(will be EOR #$FF'ed) LDA #4 ;Start filling at column 1 STA FILL+1 BNE DONE The routine first checks if the coordinate lies in one of the 40 columns on the screen, by checking the high byte. If the high byte is larger than one then the coordinate is way off the right side of the screen. If the left coordinate is past the right edge of the screen then there is nothing on-screen to fill, so LCOL is set to $FF as a flag. If the coordinate is negative then LCOL, the left column number, is set to 0, which will flag the plotting routine later on. LBITS will be explained later, and FILL is the entry point for the fill routine, consisting of a JMP xxxx instruction. Setting FILL+1 to 4 starts the fill routine at column one. Why not start at column zero, and let the fill routine take care of it? Because that requires a flag, and hence a check of that flag in the plotting routine. That flag check means a few extra cycles on each loop iteration, and actually screws up the plotting logic. For the relatively rare event of negative columns, this is totally not worth it. There is still another branch above -- if the high byte is equal to one. If it is, then we need to check if the x-coordinate is less than 320. If it is, then the bitmap pointer and column entry point need to be set up correctly. CONT1 CPX #64 ;Check X<320 BCS POS LDA #32 ;Add 32 columns! INC BPOINT+1 ;and advance pointer CONT ADC XCOLUMN,X STA LCOL TAY LDA LBITP,X STA LBITS ;For later use in RCOLUMN LDA FILLENT+1,Y ;Get fill entry address STA FILL+1 DONE <<< By using the ADC XCOLUMN,X instead of the simple LDA XCOLUMN,X we can just add the extra 32 columns in very easily. Remember that the routine is entered with C clear, and A=0 if X<256. This gives the column number, 0-39 (the XCOLUMN table is way up above). A table lookup is cheaper than dividing by eight. LBITP is a table which gives the bit patterns for use in plotting the endpoints -- the part of the line not drawn by the fill routine. Plotting this endpoint is put off until later, because, among other things, the left and right endpoints might share a column (at the tips of the polygon, or for a very skinny polygon); plotting now would hose other things on the screen. Finally, FILLENT is used to set the correct entry point into the fill routine, FILL. The right side of the polygon uses a similar set of routines to calculate slopes and do the line updating. The column calculation is similar at first, but does the actual plotting and filling: * * Note that RCOLUMN does the actual filling * * On entry, .X = lo byte x-coord, and .A was JUST loaded * with the high byte. * * Carry must be CLEAR. * RCOLUMN MAC * LDA RIGHTXHI BEQ CONT CMP #2 BCC CONT1 ASL BCS JDONE ;Skip plot+fill if x<0 POS LDX LCOL BMI JDONE ;Skip if lcol>40! LDY YCOUNT ;Plot the left edge LDA (FILLPAT),Y STA TEMP1 LDY COLTAB,X ;Get column index EOR (BPOINT),Y ;Merge into the bitmap AND LBITS ;bits EOR #$FF EOR TEMP1 ;voila! STA (BPOINT),Y LDA TEMP1 JSR FILL ;Just fill to last column JDONE JMP DONE If the right column is negative, then the polygon is totally off the left side of the screen. If the right coordinate is off the right side of the screen, and the left column is still on the screen, then we just fill all the way to the edge. The left endpoints are then merged into the bitmap -- merging will be discussed, below. UNDER LDY LCOL ;It is possible that, due to BMI DONE ;rounding, etc. the left column LDX #00 ;got ahead of the right column ;x=0 will force load of $FF SAME LDA RBITP,X ;If zero, they share column EOR LBITS STA LBITS LDY YCOUNT LDA (FILLPAT),Y STA TEMP1 LDX LCOL LDY COLTAB,X EOR (BPOINT),Y ;Merge AND LBITS EOR TEMP1 STA (BPOINT),Y ;and plot JMP DONE CONT1 CPX #64 ;Check for X>319 BCS POS LDA #32 ;Add 32 columns! CONT ADC XCOLUMN,X CMP LCOL ;Make sure we want to fill BCC UNDER BEQ SAME After the right column is computed it is compared to the left column. If they share a column (or if things got screwed up and the right column is to the left of the left column), then the left and right sides need to be combined together and merged into the bitmap. Again, merging is described below. Otherwise, it progresses normally: LDY RBITP,X STY TEMP1 LDX LCOL STA LCOL ;Right column LDY YCOUNT ;Plot the left edge LDA (FILLPAT),Y STA TEMP2 LDY COLTAB,X ;Get column index EOR (BPOINT),Y ;Merge into bitmap AND LBITS EOR TEMP2 STA (BPOINT),Y it uses LCOL as temporary storage for the right column, sets the fill pattern, and then plots the left edge. Recall that LCOLUMN set up the bitmap pointer, BPOINT, if the x-coord was larger than 255. COLTAB gives the offset of a column within the bitmap, i.e. column*8. The pattern is loaded and the left endpoint is then merged into the bitmap. What, exactly, is a 'merge'? Consider a line with LEFTX=5, i.e. that starts in the middle of a column. The left line calculation assumes that the left edge will be 00000111, and the fill takes care of everything to the right of that column. We can't just STA into the bitmap, because that would be bad news for overlapping or adjacent polygons. We also don't want to ORA -- remember that this is a pattern fill, and if that pattern has holes in it then this polygon will combine with whatever is behind it, and can again lead to poor results. What we really want to do is to have the pattern stored to the screen, without hosing nearby parts of the screen. A very crude and cumbersome way of doing this would be to do something like LDA BITP ;00000111 EOR #$FF AND (BITMAP) ;mask off the screen STA blah LDA BITP AND pattern ORA blah STA (BITMAP) That might be OK for some lame-o PC coder, but we're a little smarter than that I think. Surely this can be made more efficient? Of course it can, and stop calling me Shirley. It just requires a little bit of thinking, and the result is pretty nifty. First we need to specify what the problem is. There are three elements: the BITP endpoint bits, the FILLPAT fill pattern bits, and the BPOINT bitmap screen bits. What we need is a function which does bitmask pattern bitmap output ------- ------- ------ ------ 0 y x x 1 y x y In the above, 'y' represents a bit in the fill pattern and 'x' represents a bit on the screen. They might have values 0 or 1. If the bitmask (i.e. 00000111) has a 0 in it, then the bitmap bit should pass through. If the bitmask has a 1, then the *pattern* bit should pass through. A function which does this is (pattern EOR bitmap) AND (NOT bitmask) EOR pattern This first tells us that the bitmask should be 11111000 instead of 00000111. This method only involves one bitmap access, and doesn't involve any temporary variables. In short, it merges one bit pattern into another, using a mask to tell which bits to change. So let's look at that bitmap merge code again: LDY YCOUNT LDA (FILLPAT),Y STA TEMP2 LDY COLTAB,X ;Get column index EOR (BPOINT),Y ;Merge into bitmap AND LBITS EOR TEMP2 STA (BPOINT),Y And there it is. Pattern EOR bitmap AND not bitmask EOR pattern. Note that because the pattern is loaded first, .Y can be reused as an index into the bitmap, and X can stay right where it is (to be again used shortly). With these considerations, you can see that trying to mask off the bitmap would be very cumbersome indeed! Note that if the left and right endpoints share a column, their two bitmasks need to be combined before performing the merge. This just means EORing the masks together -- remember that they are inverted, so AND won't work, and EOR is used instead of ORA in case bits overlap (try a few examples to get a feel for this). Okay, let's remember where we are: the columns have all been computed, the fill entry points are computed, and the left endpoint has been plotted. We still have to do the fill and plot the right endpoint. LCOLUMN set the fill entry point, but RCOLUMN needs to set the exit point, by sticking an RTS at the right spot. LDY LCOL ;right column LDX FILLRTS,Y ;fill terminator LDA #$60 ;RTS STA FILLMAIN,X LDA TEMP2 ;Fill pattern JSR FILL EOR (BPOINT),Y AND TEMP1 ;Right bits EOR TEMP2 ;Merge STA (BPOINT),Y LDA #$91 ;STA (),Y STA FILLMAIN,X ;Fill leaves .X unchanged ;And leaves .Y with correct offset DONE <<< The fill routine doesn't change X, leaves (BPOINT),Y pointing to the last column, and leaves the fill pattern in A. The right side of the line can then be immediately plotted after filling, and the STA (),Y instruction can be immediately restored in the fill routine. And that's the whole routine! After plotting, the loop just takes a step up the screen and loops around. Since BPOINT, the bitmap pointer, may have been changed (by LCOLUMN, or by the fill routine), the high byte needs to be restored. * * Finally, we need to decrease the Y coordinate and * update the pointer if necessary. * * Also, since BPOINT may have been altered, we need * to restore the high byte. * * ]1 = loop address * UPYRS MAC LDA ROWHIGH STA BPOINT+1 DEC BPOINT DEC YCOUNT BMI FIXIT JMP ]1 FIXIT LDA #7 STA YCOUNT LDY YROW BEQ EXIT ;If y<0 then exit DEY STY YROW ORA LOBROW,Y STA BPOINT LDA HIBROW,Y CLC ADC BITMAP STA BPOINT+1 STA ROWHIGH JMP ]1 EXIT RTS <<< Of course, the 64 bitmap is made up of 'rows of eight', i.e. after every eight rows we have to reset the bitmap pointer. If we move past the top of the screen (i.e. Y<0) then there is no more polygon left to draw. Otherwise the new bitmap location is computed, by adding the offset to BITMAP, the variable which lets the routine draw to any bitmap (BITMAP contains the location in memory of the bitmap). Note also that the high byte is stored to ROWHIGH -- ROWHIGH is what the routine immediately above uses to restore the high byte of BPOINT! And that, friends, is the essence of the polygon routine, and concludes the disassembly of the 3d library routines. --------- Section 4: Cool World --------- Now that we have a nice set of routines for doing 3D graphics, it's time to write a program which actually uses those routines to construct a 3D world. That program is Cool World. Not only does it demonstrate the 3d library, but it makes sure that all of the library routines actually work! Although the algorithms for constructing a world were discussed way back in Section 2, it's probably a good idea to very quickly summarize those algorithms, just as a reminder. After that, this section will go through the major portions of the Cool World code, and explain how (and why) things work. In the 3d world, each object has certain properties. It has a position in the world. It also has an orientation; it points in some direction. It might also have some velocity, and other stuff, but all we'll worry about in Cool World is position and orientation. Each object is defined by a series of points which are defined relative to the object center (the object's position). The orientation tells how to rotate those points about the center -- to get the object rotating in the right direction, we just apply a rotation matrix to the object. In Cool World, we will be the only thing moving around in the world -- all the other objects are fixed in position, although some of them are rotating. To figure out how objects look from our perspective, we first figure out if the object is visible by using the equation R'(Q-P). P is our position, and is subtracted from the object position Q to get its relative position. R' then rotates the world around us, to get it pointing in the direction of our orientation vector. If the object is visible, then we compute the full object by using the equation R' (M X + Q-P). X represents the points of an object. M is the rotation matrix to get the object pointing in the right direction. The rotated points are then added to the relative center (Q-P) and rotated according to our orientation. The points are then projected. The job of Cool World and the 3D library is to implement that tiny little equation up there. Once all the visible objects are rotated and projected, they need to be depth-sorted to figure out what order to draw them in. Each object is then drawn, in order, and around and around it goes. With this in mind: * * Cool World (Polygonamy II: Monogonamy) * * Just yer basic realtime 3d virtual world program for * the C64. This program is intended to demonstrate the * 3d library. * * SLJ 8/15/97 * The code starts with a bunch of boring setup stuff, but soon gets to the * Main loop MAINLOOP which resets a few things, swaps the buffers, and clears the new draw buffer. The screen clearing is done in a very brain-dead way; as you may recall, polygonamy only cleared the parts of the screen that needed clearing. I decided this time around that this was more trouble than it was worth, especially with the starfields hanging around. Still, a smart screen clear might keep track of the highest and lowest points drawn, and only clear that part of the screen. Once the preliminaries are over with, the main part gets going: :C1 JSR READKEY JSR UPD8POS ;Update position if necessary JSR UPD8ROT ;Update second rotation matrix JSR PLOTSTAR JSR SORT ;Construct visible object lst First the keyboard is scanned. If a movement key is pressed, then the stars are updated; if we turn, then our orientation vector is updated via calls to ACCROTX, ACCROTY, and ACCROTZ. The orientation vector is very easy to keep track of. We enter the world pointing straight down the z-axis, so initially the orientation vector is V=(0,0,1). Now it gets a little trickier. Let's say we turn to the left. We can think of this two ways: our orientation vector rotates left, or the world rotates around us to the right. The problem here is that these aren't the same thing! "What?!" you may say. The problem here is not the single rotation, but the rotations that come after it. Imagine a little x-y-z coordinate axis coming out of your shirt, with the z-axis pointing straight ahead of you. When you turn in some direction, that coordinate system needs to turn with you, because rotations are always about those axis. In other words, the world needs to rotate around those axis; the *inverse* of those rotations gives the orientation vector. What would happen if we instead rotated the orientation vector? Imagine a pencil coming out of your chest -- that's the orientation vector. Now turn it to the left, with you still looking at the monitor. Once it's out at 45 degrees or so, let's say we started spinning around the z-axis. Well that z-axis is still facing towards the monitor, and the pen rotates about THAT axis, drawing a circle around the monitor as it goes around (or, if you like, making a cone-shape with the tip of the cone at your chest). If, on the other hand, YOU turn left, and then start spinning, you can see that something very different happens -- the monitor now circles around the pencil. The point is that we need to rotate the world around us. Specifically, by using ACCROTX and friends we maintain a rotation matrix which will rotate the world around us. But we still need an orientation vector, to tell us what direction to move in when we move forwards or backwards. That rotation matrix tells us how to rotate the whole world so that it "points in our direction"; therefore, if we un-do all of those rotations, we can figure out what direction we were originally pointing in. In other words, the inverse rotation matrix, when applied to the original orientation vector (0,0,1), gives the current orientation vector. Recall that inverting a rotation matrix is really easy: just take the transpose. And what happens when apply this new rotation matrix to V=(0,0,1)? With a simple calculation, you can see that M V just gives the third row of M, for any matrix M. So, let's say that we've calculated the accumulated rotation matrix R'. To invert it just take the transpose, giving R. The orientation vector is thus RV: the third row of R. But the third row of R is just the third *column* of R', since R' is the transpose of R. Therefore, the orientation vector -- the direction we are currently pointing in -- is simply the third column of the accumulated rotation matrix. We don't have to do any extra work at all to compute that orientation vector; it just falls right out of the calculation we've already done! So, to summarize: Keys are read in. Any rotations will cause an update of the rotation matrix. Our orientation vector is simply the third column of that matrix. (Note that the same logic holds for other objects, if they happen to move). If we move forwards or backwards, then the stars are updated and a flag is set for UPD8POS. JSR UPD8POS ----------- Truth in advertising: UPD8POS updates positions. Specifically, our position, since we are the only ones moving. The world equation is R' (MX + Q-P), which we can write as R'MX + R'(Q-P). Q is the position of an object; changing our position P just changes Q-P. In Cool World there's no need to keep track of our position in the world; we might as well just keep track of the relative positions of objects. Instead of updating a position P when we move, Cool World just changes the locations Q of all the objects. Since objects never move, the quantity R'(Q-P) won't change as long as WE don't move. UPD8POS calculates that quantity, and does so only if we move or rotate -- if an object were to move, this quantity would have to be recalculated for that object. Note that the calculation is done using GLOBROT, the lib3d routine to rotate the 16-bit centers. Later on, those rotated 16-bit centers will be added to R'MX, the rotated object points, if an object is visible. Cool World lets you travel at different speeds by pressing the keys 0-4. Different speeds are easy to implement. Recall that the rotation matrix is multiplied by 64. Grabbing the third column of that matrix gives a vector of length 64 (rotations don't change lengths, so multiplying a rotation matrix times (0,0,1) must always return a vector of the same length). By changing the length of that orientation vector -- for example, by multiplying or dividing by two -- we can change the amount which we subtract from the object positions (or add to our position, if you like), and hence the speed at which we move through the world. JSR UPD8ROT ----------- Since some of the objects need to rotate, a second set of rotation matrices is maintained. As far as the lib3d routines are concerned there is only one matrix, in zero page. Therefore the global rotation matrix needs to be stored in memory before the new matrices are calculated. The new rotation matrices are calculated using CALCMAT, the routine which only needs the x, y, and z angles. In fact, only one matrix is calculated using this routine. The other matrices are just permutations of that one rotation matrix; for example, the transpose (the inverse, remember) is a new rotation matrix. We can also get new matrices by, say, a cyclic permutation of the rows -- this is the same as permuting the coordinates (i.e. x->y y->z z->x), which is the same as rotating the whole coordinate system. As long as the determinant is one, the new matrix will preserve areas. Once the new matrix is calculated, it is stored in memory for later use by the local rotation routines (ROTPROJ). To get a different matrix, all we have to do is copy the saved matrix to zero page in a different way. That is, we don't have to store ALL of the new rotation matrices in memory; we can do any row swapping and such on the fly, when copying from storage into zero page. Note that by using CALCMAT we can use large angle increments, to make the objects appear to spin faster. ACCROTX and friends can only accumulate by a fixed angle amount. JSR PLOTSTAR ------------ PLOTSTAR is another imaginatively named subroutine which, get this, plots stars. The starfield routine is discussed elsewhere in this issue. JSR SORT -------- Finally, this routine -- you won't believe this -- sorts the objects according to depth. The sort is just a simple insertion sort -- the object z-coordinates are traversed, and each object is inserted into an object display list. The list is searched from the top down, and elements are bumped up the list until the correct spot for the new object is found. The object list is thus always sorted. This kind of sort is super-easy to implement in assembly, and "easy" is a magic word in my book. SORT also does the checking for visibility. As always, the easiest and stupidest approach is taken: if the object lies in a 45-degree wedge in front of us, then it is visible. This wedge is easiest because the boundaries are so simple: the lines z=x, z=-x, z=y, and z=-y. If the object center lies within those boundaries -- if -x < z < x and -y < z < y -- then the object is considered to lie in our field of vision. Since we don't want to view objects which are a million miles away, a distance restriction is put on the z-coordinate as well: if z>8192 then the object is too far away. We also don't want to look at objects which are too close -- in particular, we don't want some object points in front of us and some behind us -- so we need a minimum distance restriction as well. Since the largest objects have points 95 units away from the center, checking for z>100 is pretty reasonable. By the way -- that "distance" check only checks the z-coordinate. The actual distance^2 is x^2 + y^2 + z^2. This leads to the "peripheral vision" problem you may have noticed -- that an object may become visible towards the edge of the screen, but when you turn towards it it disappears. So, once the centers have been rotated, the extra rotation matrix calculated, and the object list constructed, all that is left to do is to plot any visible objects: :LDRAW LDY NUMVIS ;Draw objects from object list BEQ :DONE LDA OBLIST-1,Y ;Get object number ASL TAX LDA VISTAB,X STA :JSR+1 LDA VISTAB+1,X STA :JSR+2 :JSR JSR DRAWOB1 DEC NUMVIS BNE :LDRAW :DONE JMP MAINLOOP VISTAB DA DRAWOB1 DA DRAWOB2 ... The object number from the object list is used as an index into VISTAB, the table of objects, and each routine is called in order. It turns out that in Cool World there are two kinds of objects: those that rotate, and those that don't. What's the difference? Recall the half of the world equation that we haven't yet calculated: R'MX. Objects that don't rotate have no rotation matrix M. And for objects that do rotate, I just skipped applying the global rotation R' (these objects are just spinning more or less randomly, so why bother). This becomes very apparent when viewing the Cobra Mk III -- no matter what angle you look at it from, it looks the same! A full implementation would of course have to apply both R' and M. Anyways, with this in mind, let's take a closer look at two representative objects: the center tetrahedron, which doesn't rotate, and one of the stars, which does. * * Object 1: A simple tetrahedron. * * (1,1,1) (1,-1,-1) (-1,1,-1) (-1,-1,1) TETX DFB 65,65,-65,-65 TETY DFB 65,-65,65,-65 TETZ DFB 65,-65,-65,65 These are the points that define the object, relative to its center. As you can see, they have length 65*sqrt(3) = 112. This is within the limitation discussed earlier, that all vectors must have length less than 128. Other objects -- for example, the cubes -- have coordinates like (55,55,55), giving length 55*sqrt(3) = 95. * Point list FACE1 DFB 0,1,2,0 FACE2 DFB 3,2,1,3 FACE3 DFB 3,0,2,3 FACE4 DFB 3,1,0,3 A tetrahedron has four faces, and the above will tell POLYFILL how to connect the dots. They go around the faces in counter-clockwise order. Note that the first and last point are the same; this is necessary because of the way POLYFILL works. When drawing from the point list, POLYFILL only considers points to the left or to the right. When it gets to one end of the list, it jumps to the other end of the list, and so always has a point to the left or to the right of the current spot in the point list. (Bottom line: make sure the first point in the list is also the last point). TETCX DA 0 ;Center at center of screen TETCY DA 00 TETCZ DA INITOFF ;and back a little ways. OB1POS DFB 00 ;Position in the point list OB1CEN DFB 00 ;Position of center TETCX etc. are where the object starts out in the world. OB1POS is a leftover that isn't used. OB1CEN is the important variable here. The whole 3d library is structured around lists. All of the object centers are stored in a single list, and are first put there by an initialization routine. This list of object centers is then used for everything -- for rotations, for visibility checks, etc. OB1CEN tells where this object's center is located in the object list. This next part is the initialization routine, called when the program is first run: * * ]1, ]2, ]3 = object cx,cy,cz * INITOB MAC LDA ]1 STA C0X,X LDA ]1+1 STA C0X+256,X LDA ]2 STA C0Y,X LDA ]2+1 STA C0Y+256,X LDA ]3 STA C0Z,X LDA ]3+1 STA C0Z+256,X <<< INITOB1 ;Install center into center list LDX NUMOBJS STX OB1CEN >>> INITOB,TETCX;TETCY;TETCZ INC NUMOBJS RTS As you can see, all it does is copy the initial starting point, TETCX, TETCY, TETCZ, into the object list, and transfer NUMOBJS, the number of objects in the center list, into OB1CEN. It then increments the number of objects. In this way objects can be inserted into the list in any order, and a more or less arbitrary number of objects may be inserted. (Of course, REMOVING an object from the list can be a little trickier). Now we move on to the main routine: * * Rotate and draw the first object. * * PLISTXLO etc. already point to correct offset. * ]1,]2,]3 = object X,Y,Z point lists SETPOINT MAC LDA #<]1 ;Set point pointers STA P0X LDA #>]1 STA P0X+1 LDA #<]2 STA P0Y LDA #>]2 STA P0Y+1 LDA #<]3 STA P0Z LDA #>]3 STA P0Z+1 <<< SETPOINT simply sets up the point list pointers P0X P0Y and P0Z for the lib3d routines. It is a macro because all objects need to tell the routines where their actual x, y, and z point coordinates are located, so this little chunk of code is used quite a lot. The routine called by the main loop now begins: DRAWOB1 LDX OB1CEN ;Center index ROTTET ;Alternative entry point >>> SETPOINT,TETX;TETY;TETZ LDY #4 ;Four points to rotate SEC ;rotate and project JSR ROTPROJ The second entry point, ROTTET, is used by other objects which are tetrahedrons but not object #1 (in Cool World there is a line of tetrahedrons, behind our starting position). This of course means there won't be a lot of duplicated code. Next the points need to be rotated and projected. SETPOINT sets up the list pointers for ROTPROJ, .Y is set to the number of object points in those lists, and C is set to tell ROTPROJ to rotate and project. If we were just rotating (C=clear, i.e. to calculate MX), we would have to set up some extra pointers to tell ROTPROJ where to store the rotated points; those rotated points can then be rotated and projected, as above. Note that the rotation matrix is assumed to already be set up in zero page. Once the points have been rotated and projected, all that remains is to... * Draw the object SETFILL MAC ;]1 = number of points ;]2 = Face point list ;]3 = pattern LDY #]1 L1 LDA ]2,Y STA PQ,Y DEY BPL L1 LDA #<]3 STA FILLPAT LDA #>]3 STA FILLPAT+1 LDX #]1 <<< Again we have another chunk of code which is used over and over and over. SETFILL just sets stuff up for POLYFILL, the polygon drawing routine. It first copies points into the point queue; specifically, it copies from lists like FACE1, above, into PQ, the point queue used by POLYFILL. It then sets FILLPAT, the fill-pattern pointer, to the 8 bytes of data describing the fill pattern. Finally it loads .X with the number of points in the point queue, for use by POLYFILL. All that remains is to call POLYFILL -- all the other pointers and such are set up, and POLYFILL does the hidden face check for us. DRAWTET ;Yet another entry point >>> SETFILL,3;FACE1;DITHER1 JSR POLYFILL >>> SETFILL,3;FACE2;ZIGZAG JSR POLYFILL >>> SETFILL,3;FACE3;CROSSSM JSR POLYFILL >>> SETFILL,3;FACE4;SOLID JMP POLYFILL And that's the whole routine! DITHER1, ZIGZAG etc. are just little tables of data that I deleted out, containing fill patterns. For example, SOLID looks like SOLID HEX FFFFFFFFFFFFFFFF i.e. eight bytes of solid color. The next object, a star, is more involved. Not only is it rotating, but it is also a concave object, and requires clipping. That is, parts of this object can cover up other parts of the object. We will therefore have to draw the object in just the right way, so that parts which are behind other parts will be drawn first. Compare with the way in which we depth-sorted the entire object list, to make sure far-off objects are drawn before nearby objects. When applied to polygons, it is called polygon clipping. Incidentally, I *didn't* do any clipping on the Cobra Mk III. Sometimes you will see what appear to be glitches as the ship rotates. This is actually the lack of clipping -- polygons which should be behind other polygons are being drawn in front of those polygons. There are two types of stars in Cool World, but each is constructed in a similar way. The larger stars are done by starting with a cube, and then adding an extra point above the center of each face. That is, imagine that you grab the center of each face and pull outwards -- those little extrusions are form the tines of the star. The smaller stars are similarly constructed, starting from a tetrahedron. The large stars therefore have six tines, and the smaller stars have four. Clipping these objects is really pretty easy, because each of those tines is a convex object. All we have to do is depth-sort the *tips* of each of the tines, and then draw the tines in the appropiate order. Recall that in the discussion of objects, in Section 2, it was pointed out that any concave object may be split up into a group of convex objects. That is basically what we are doing here. * * All-Stars: A bunch of the cool star things. * STOFF EQU 530 ;Offset unit STCX EQU -8000 ;Center STCY EQU 1200 STCZ EQU -400+INITOFF STAR1CX DA STCX STAR1CY DA STCY STAR1CZ DA STCZ OB7CEN DFB 00 STAR1X DFB 25,-25,25,-25,50,50,-50,-50 STAR1Y DFB -25,-25,25,25,-50,50,50,-50 STAR1Z DFB 25,-25,-25,25,-50,50,-50,50 S1T1F1 DFB 4,2,0,4 ;Star 1, Tine 1, face 1 S1T1F2 DFB 4,1,2,4 S1T1F3 DFB 4,0,1,4 S1T2F1 DFB 5,0,2,5 S1T2F2 DFB 5,3,0,5 S1T2F3 DFB 5,2,3,5 S1T3F1 DFB 6,2,1,6 S1T3F2 DFB 6,3,2,6 S1T3F3 DFB 6,1,3,6 S1T4F1 DFB 7,1,0,7 S1T4F2 DFB 7,0,3,7 S1T4F3 DFB 7,3,1,7 INITOB7 LDX NUMOBJS STX OB7CEN >>> INITOB,STAR1CX;STAR1CY;STAR1CZ INC NUMOBJS RTS As before, the above stuff defines the position, points, and faces of the object, and INITOB7 inserts it into the center list. The main routine then proceeds similarly: DRAWOB7 LDX OB7CEN ;Center index ROTSTAR1 JSR RFETCH ;Use secondary rotation matrix SETSTAR1 >>> SETPOINT,STAR1X;STAR1Y;STAR1Z LDY #8 ;Eight points to rotate SEC ;rotate and project JSR ROTPROJ Note the JSR RFETCH above. RFETCH retrieves one of the rotation matrices from memory, and copies it into zero page for use by ROTPROJ. Again, different rotation matrices may be constructed by performing this copy in slightly different ways. The tines must then be sorted, (and the accumulation matrix is restored to zero page), and once sorted the tines are drawn in order. The code below just goes through the sorted tine list, and calls :TINE1 through :TINE4 based on the value in the list. The sorting routine will be discussed shortly. JSR ST1SORT ;Sort the tines JSR IFETCH ;Restore accumulation matrix * Draw the object. In order to handle overlaps correctly, * the tines must first be depth-sorted, above. DRAWST1 LDX #03 ST1LOOP STX TEMP LDY T1LIST,X ;Sorted tine list BEQ :TINE1 DEY BNE :C1 JMP :TINE2 :C1 DEY BNE :TINE4 JMP :TINE3 :TINE4 >>> SETFILL,3;S1T4F1;SOLID JSR POLYFILL >>> SETFILL,3;S1T4F2;DITHER1 JSR POLYFILL >>> SETFILL,3;S1T4F3;DITHER2 JSR POLYFILL :NEXT LDX TEMP DEX BPL ST1LOOP RTS :TINE1 ...draw tine 1 in a similar way :TINE2 >>> SETFILL,3;S1T2F1;ZIGS ... :TINE3 >>> SETFILL,3;S1T3F1;BRICK ... T1LIST DS 6 ;Tine sort list T1Z DS 6 Now we get to the sorting routine. It depth-sorts the z-coordinates, so it actually needs the z-coordinates. Too bad we rotated and projected, and no longer have the rotated z-coordinates! So we have to actually calculate those guys. Now, if you remember how rotations work, you know that the z-coordinate is given by the third row of the rotation matrix times the point being rotated -- in this case, those points are the tips of the individual tines. Well, we know those points are at (1,-1,-1), (1,1,1), (-1,1,-1), and (-1,-1,1). Actually, they are in the same *direction* as those points/vectors, but have a different length (i.e. STAR1X etc. defines a point like 50,-50,-50). The lengths don't matter though -- whether a star is big or small, the tines are still ordered in the same way. And order is all we care about here -- which tines are behind which. So, calculating the third row times points like (1,-1,-1) is really easy. If that row has elements (M6 M7 M8), then the row times the vector just gives M6 - M7 - M8. So all it takes is an LDA and two SBC's to calculate the effective z-coordinate. Just for convenience I added 128 to the result, to make everything positive. After calculating these z-coordinates, they are inserted into a little list. That list is then sorted. Yeah, an insertion sort would have been best here, I think. Instead, I used an ugly, unrolled bubble-like sort. By the way, the cubic-stars are even easier to sort, since their tines have coordinates like (1,0,0), (0,1,0) etc. Calculating those z-coordinates requires an LDA, and nothing else! * * Sort the tines. All that matters is the z-coord, * thus we simply dot the third row of the matrix * with the tine endpoint, add 128 for convenience, * and figure out where stuff goes in the list. * ST1SORT ;Sort the tines LDX #00 LDA MATRIX+6 ;Tine 1: 1,-1,-1 SEC SBC MATRIX+7 SEC SBC MATRIX+8 EOR #$80 ;Add 128 STA T1Z ;z-coord STX T1LIST LDA MATRIX+6 ;Tine 2: 1,1,1 CLC ADC MATRIX+7 CLC ADC MATRIX+8 EOR #$80 STA T1Z+1 INX STX T1LIST+1 LDA MATRIX+7 ;Tine 3: -1,1,-1 SEC SBC MATRIX+6 SEC SBC MATRIX+8 EOR #$80 STA T1Z+2 INX STX T1LIST+2 LDA MATRIX+8 ;Tine 4: -1,-1,1 SEC SBC MATRIX+7 SEC SBC MATRIX+6 EOR #$80 STA T1Z+3 INX STX T1LIST+3 CMP T1Z+2 ;Now bubble-sort the list BCS :C1 ;largest values on top DEX ;So find the largest value! ...urolled code removed for sensitive viewers. And that's all it takes! And that, friends, covers the main details of Cool World. Can it possibly be true that this article is finally coming to an end? --------- Section 5: 3d library routines and memory map --------- There are seven routines: $8A00 CALCMAT - Calculate a rotation matrix $8A03 ACCROTX - Add a rotation to rotation matrix around x-axis $8A06 ACCROTY - ... y-axis $8A09 ACCROTZ - ... z-axis $8A0C GLOBROT - 16-bit rotation for centers $8A0F ROTPROJ - Rotate/Rotate and project objects $8A12 POLYFILL - Draw a pattern-filled polygon Following a discussion of the routines there is a memory map and discussion of the various locations. Library Routines ---------------- $8A00 CALCMAT Calculate a rotation matrix Stuff to set up: Nothing. On entry: .X .Y .A = theta_x theta_y theta_z On exit: Rotation matrix is contained in $B1-$B9 Cycle count: 390 cycles, worst case. $8A03 ACCROTX This will "accumulate" a rotation matrix by one rotation around the x-axis by the angle delta=2*pi/128. Because this is such a small angle the fractional parts must also be remembered, in $4A-$52. These routines are necessary to do full 3d rotations. On entry: carry clear/set to indicate positive/negative rotations. On exit: Updated matrix in $B1-$B9, $4A-$52 Cycle count: Somewhere around 150, I think. $8A06 ACCROTY Similar around y-axis $8A09 ACCROTZ Similar around z-axis $8A0C GLOBROT Perform a global rotation of 16-bit signed coordinates (Rotate centers) Stuff to set up: MULTLO1 MULTLO2 MULTHI1 MULTHI2 = $F7-$FE Multiplication tables C0XLO, C0XHI, C0YLO, C0YHI, C0ZLO, C0ZHI = $63-$6E Pointers to points to be rotated. Note also that $63-$6E has a habit of getting hosed by the other routines. CXLO, CXHI, CYLO, CYHI, CZLO, CZHI = $A3-$AE Pointers to where rotated points will be stored. On entry: .Y = number of points to rotate (0..Y-1). On exit: Rotated points are stored in CXLO CXHI etc. $8A0F ROTPROJ Rotate and project object points (vertices). Points must be in range -96..95, and must be with 128 units of the object center. Upon rotation, they are added to the object center, projected if C is set, and stored in the point list. The _length_ of a vertex vector (sqrt(x^2 + y^2 + z^2)) must be less than 128. Stuff to set up: Rotation matrix = $B1-$B9 (Call CALCMAT) ROTMATH = $B0 P0X P0Y P0Z = $69-$6E Pointers to points to be rotated. As with GLOBROT, these pointers will get clobbered by other routines. Points are 8-bit signed numbers in range -96..95, and must have length less than 128. PLISTXLO PLISTXHI ... = $BD-$C4 Where to store the rotated+projected points. CXLO CXHI ... = $A3-$AE List of object centers. Center will be added to rotated point before projection. XOFFSET, YOFFSET = $53, $54 Location of origin on the screen. Added to projected points before storing in PLIST. 160,100 gives center of screen. On entry: .X = Object center index (center is at CXLO,X CXHI,X etc.) .Y = Number of points to rotate (0..Y-1). .C = set for rotate and project, clear for just rotate On exit: Rotated, possibly projected points in PLIST. $8A12 POLYFILL Draw pattern-filled polygon into bitmap. Stuff to set up: MULTLO1, MULTLO2, MULTHI1, MULTHI2 = $F7-$FE BITMAP = $FF FILLPAT = $BB-$BC PLISTXLO, PLISTXHI, PLISTYLO, PLISTYHI = $BD-$C4 Point queue = $0200. Entries are _indices_ into the PLISTs, must be ordered _counter clockwise_, and must _close_ on itself (example: 4 1 2 4). On entry: .X = Number of points in point queue (LDX #3 in above example) On exit: Gorgeous looking polygon in bitmap at BITMAP. Memory map ---------- Zero page: $4A-$52 ACCREM Fractional part of accumulating matrix $53 XOFFSET Location of origin on screen (e.g. 160,100) $54 YOFFSET $55-$72 Intermediate variables. These locations are regularly hosed by the routines. Feel free to use them, but keep in mind that you will lose them! $A3-$AE C0XLO, C0XHI, C0YLO, ... Pointers to rotated object centers, i.e. where the objects are relative to you. Centers are 16-bit signed (2's complement). $AF-$B0 ROTMATH Pointer to the multiplication table at $C200-$C3FF. $B1-$B9 Rotation matrix. $BB-$BC FILLPAT Pointer to fill pattern (eight bytes, 8x8 character) $BD-$C0 PLISTXLO, PLISTXHI $C1-$C4 PLISTYLO, PLISTYHI Pointers to rotated object points, e.g. used by polygon renderer. Numbers are 16-bit 2's complement. $F7-$FE MULTLO1, MULTLO2, MULTHI1, MULTHI2 Pointers to math tables at $C400-$C9FF MULTLO1 -> $C500 MULTLO2 -> $C400 MULTHI1 -> $C800 MULTHI2 -> $C700 (Only the high bytes of the pointers are actually relevant). $FF BITMAP High byte of bitmap base address. Library: $0200-$027F Point queue. $8600-$9FFF Routines, tables, etc. $8A00 Jump table into routines Tables: $8400-$85FF ROTMATH, pointed to by $AF-$B0. $C000-$C2FF MULTLO, pointed to by $F7-$F8 and $F9-$FA $C300-$C5FF MULTHI, pointed to by $FB-$FC and $FD-$FE $C600-$C6FF CDEL, table of f(x)=x*cos(delta) $C700-$C7FF CDELREM remainder $C800-$C8FF SDEL x*sin(delta) $C900-$C9FF SDELREM $CA00-$CAFF PROJTAB, projection table, f(x)=d/x, integer part $CB00-$CBFF PROJREM, projection table, 256*remainder $CC00-$CCFF LOG, logarithm table $CD00-$CDFF EXP, exponential table $CE00-$CEFF NEGEXP, exponential table $CF00-$CFFF TRIG, table of 32*sin(2*pi*x/128) -- 160 bytes. ROTMATH is the Special 8-bit signed multiplication table for ROTPROJ. MULTLO and MULTHI are the usual fast-multiply tables. CDEL and SDEL are used by ACCROTX, to accumulate matrices. PROJTAB is used by ROTPROJ, to project the 16-bit coordinates. LOG EXP and NEGEXP are used by the POLYFILL divide routine, to calculate line slopes. Finally, TRIG is used by CALCMAT to calculate rotation matrices. The tables from $C600-$CFFF are fixed and must not be moved. The tables ROTMATH, MULTLO, and MULTHI on the other hand are only accessed indirectly, and thus may be relocated. Thus there is room for a color map and eight sprite definitions in any VIC bank. ---------- Section 6: Conclusions ---------- Wow. You really made it this far. I am totally impressed. What stamina. Well I hope you have been able to gain something from this article. It is my sincere hope that this will stimulate the development of some nice 3d programs. After a long, multi-year break from 3d graphics I really enjoyed figuring out all this stuff again, and finally doing the job right. I also enjoyed re-re-figuring it out, to write this article, since over half a year has passed since I wrote the 3d library! I hope you have enjoyed it too, and can use this knowledge (or improve on it!) to do some really nifty stuff. SLJ 4/3/98