Author: MooingLemur, based on documentation written by JeffreyH
This is preliminary documentation and the specification can still change at any point.
# Introduction
This is a reference for the VERA FX features. It is meant to complement the tutorial, currently found [here](https://docs.google.com/document/d/1q34uWOiM3Be2pnaHRVgSdHySI-qsiQWPTo_gfE54PTg).
The FX Update mainly adds “helpers” inside of VERA that can be used by the CPU. There is no “magic button” that allows you to do 3D graphics, for example. It mainly helps with certain CPU-intensive tasks, most notably the ones found in the (deep) inner loop of a game or graphics engine. The FX Update therefore does not fundamentally change the architecture or nature of VERA; it extends and improves it.
In other words: the CPU is still the orchestrator of all that is done, but it is relieved of certain operations that it is not (very) good at or that need data it cannot access directly.
The FX Update extends addressing modes; it does not add or extend renderers.
# Usage
## DCSEL
VERA is mapped as 32 8-bit registers in the memory space of the Commander X16, starting at address $9F20 and ending at $9F3F. Many of these are (fully) used, but some bits remain unused. The DCSEL field in register $9F25 (also called CTRL) has been extended to 6 bits to allow the 4 registers $9F29-$9F2C to have additional meanings.
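<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F25</td><td>CTRL</td><td>Reset</td><td colspan="6">DCSEL</td><td>ADDRSEL</td></tr>
</table>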
The FX features use DCSEL values 2, 3, 4, 5, and 6. This effectively gives FX 20 8-bit registers. Note that 15 of these registers are write-only, 2 of them are read-only, and 3 are both readable and writable.
Important: unless DCSEL values of 2-6 are used, the behavior of VERA is exactly the same as it was before the FX update. This ensures that the FX update is backwards compatible with traditional non-FX uses of VERA.
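As a small illustration of the register layout above, the following Python sketch computes the byte to write to CTRL ($9F25) for a given DCSEL and ADDRSEL. The helper name `ctrl_value` is hypothetical and exists only for this example; it leaves the Reset bit (bit 7) clear.

```python
def ctrl_value(dcsel, addrsel=0):
    """Pack DCSEL (bits 6:1) and ADDRSEL (bit 0) into a CTRL ($9F25) byte.

    Hypothetical helper for illustration; the Reset bit (bit 7) stays 0.
    """
    if not 0 <= dcsel <= 63:
        raise ValueError("DCSEL is a 6-bit field")
    if addrsel not in (0, 1):
        raise ValueError("ADDRSEL is a single bit")
    return (dcsel << 1) | addrsel


# The five DCSEL values used by the FX features.
for dcsel in range(2, 7):
    print(f"DCSEL={dcsel}: write ${ctrl_value(dcsel):02X} to $9F25")
```

This mirrors the `lda #(2 << 1)` idiom used in the assembly examples of this reference.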
## Addr1 Mode
When DCSEL=2, the main FX configuration register becomes available (FX_CTRL/$9F29), which is both readable and writable. The 2 lower bits are the addr1 mode bits, which will change the behavior of how and when ADDR1 is updated. This puts the FX helpers in a certain “role”.
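<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>

|Addr1 Mode|Description|
|-|-|
|0|Traditional VERA behavior|
|1|Line draw helper|
|2|Polygon filler helper|
|3|Affine helper|
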
By default, Addr1 Mode is set to 0 (=00b), which is the normal and already-known behavior of ADDR1.
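To make the bit positions in FX_CTRL concrete, here is a minimal Python sketch that assembles the register value from named flags. The function and parameter names are hypothetical; only the bit positions come from the table above.

```python
ADDR1_MODES = {"traditional": 0, "line": 1, "polygon": 2, "affine": 3}


def fx_ctrl_value(addr1_mode, *, four_bit=False, hop16=False,
                  one_byte_cycling=False, cache_fill=False,
                  cache_write=False, transparent=False):
    """Build the byte for FX_CTRL ($9F29, DCSEL=2). Hypothetical helper."""
    value = ADDR1_MODES[addr1_mode]        # bits 1:0
    value |= int(four_bit) << 2            # bit 2: 4-bit Mode
    value |= int(hop16) << 3               # bit 3: 16-bit Hop
    value |= int(one_byte_cycling) << 4    # bit 4: One-byte Cache Cycling
    value |= int(cache_fill) << 5          # bit 5: Cache Fill Enable
    value |= int(cache_write) << 6         # bit 6: Cache Write Enable
    value |= int(transparent) << 7         # bit 7: Transp. Writes
    return value


print(f"${fx_ctrl_value('line', four_bit=True):02X}")
```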
## Line draw helper
When Addr1 Mode is set to 1 (=01b) the line draw helper is enabled.
### Setting up the line draw helper
* Set ADDR1 to the address of the starting pixel.
* Determine the octant (see below) you are going to draw in, which will inform your ADDR0 and ADDR1 increments.
* Set the ADDR1 increment in the direction you will always increment each step.
  * For 8-bit mode: +1, -1, -320, or +320
  * For 4-bit mode: -0.5, +0.5, -160, or +160
* Set the ADDR0 increment in the direction you will sometimes increment. Even though this is the increment for ADDR0, we are using it in line draw mode as an incrementer for ADDR1.
  * For 8-bit mode: +1, -1, -320, or +320
  * For 4-bit mode: -0.5, +0.5, -160, or +160

For 4-bit mode, the half increments are set via the Nibble Increment bit and optionally the DECR bit in ADDRx_H. For the Nibble Increment bit to have effect, the main Address Increment must be set to 0, and the 4-bit Mode bit must be set in FX_CTRL ($9F29, DCSEL=2).
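<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F22</td><td>ADDRx_H (x=ADDRSEL)</td><td colspan="4">Address Increment</td><td>DECR</td><td>Nibble Increment</td><td>Nibble Address</td><td>VRAM Address (16)</td></tr>
</table>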
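The following Python sketch shows one way to derive an ADDRx_H byte for a whole-byte or half-byte (nibble) step. The helper name is hypothetical. The mapping from the 4-bit Address Increment field to a step size is not part of this reference; it is taken from the base VERA documentation and should be checked there.

```python
# Step size selected by each Address Increment value (0..15).
# Assumption: copied from the base VERA documentation, not this reference.
INCREMENTS = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 40, 80, 160, 320, 640]


def addr_h_value(step, *, addr_bit16=0, nibble_addr=0):
    """Build an ADDRx_H ($9F22) byte for a signed step.

    `step` is a byte step such as +1 or -320, or +0.5 / -0.5 for a nibble
    step (which only takes effect when 4-bit Mode is set in FX_CTRL).
    """
    decr = 1 if step < 0 else 0
    magnitude = abs(step)
    if magnitude == 0.5:
        # Nibble Increment requires the main Address Increment to be 0.
        incr_index, nibble_incr = 0, 1
    else:
        incr_index, nibble_incr = INCREMENTS.index(magnitude), 0
    return ((incr_index << 4) | (decr << 3) | (nibble_incr << 2)
            | (nibble_addr << 1) | addr_bit16)


for step in (+1, -320, +0.5, -0.5, -160):
    print(f"{step:+}: ${addr_h_value(step):02X}")
```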
|Octant|8-bit ADDR1 increment|8-bit ADDR0 increment|4-bit ADDR1 increment|4-bit ADDR0 increment|
|-|-|-|-|-|
|0|+1|-320|+0.5|-160|
|1|-320|+1|-160|+0.5|
|2|-320|-1|-160|-0.5|
|3|-1|-320|-0.5|-160|
|4|-1|+320|-0.5|+160|
|5|+320|-1|+160|-0.5|
|6|+320|+1|+160|+0.5|
|7|+1|+320|+0.5|+160|

Set your slope into the two “X Increment” registers (DCSEL=3, see below). Note that increment registers are 15-bit signed fixed-point numbers, and for this mode, the range should be 0.0 to 1.0 inclusive, so you’ll either want to store the value of 1, or you’ll want to set only the fractional part.
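<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_X_INCR_L (DCSEL=3) (Write only)</td><td colspan="8">X Increment (-2:-9) (signed)</td></tr>
<tr><td>$9F2A</td><td>FX_X_INCR_H (DCSEL=3) (Write only)</td><td>X Incr. 32x</td><td colspan="5">X Increment (5:1) (signed)</td><td>X Incr. (0)</td><td>X Incr. (-1)</td></tr>
</table>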
Note: Of the two incrementers, the line draw helper uses only the X incrementer. However, depending on the octant you are drawing in, this incrementer is used for either x or y pixel increments. The “X” should therefore not be taken literally here; it just means the first of the two incrementers.
As a side effect in line draw mode, writing FX_X_INCR_H ($9F2A, DCSEL=3) automatically sets the fractional part (the lower 9 bits) of the X Position to half a pixel. Furthermore, the lowest bit of the pixel position (which acts as an overflow bit) is set to 0 as well. This effectively sets the starting X position to 0.5 (the center) of a pixel.
Note: There is no need to set the higher bits of the X position, since the FX X position (accumulator) is only used to track the fractional (subpixel) part of the line draw.
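The setup steps above can be sketched in Python as follows, assuming 8-bit mode on a 320-pixel-wide bitmap. The function names are hypothetical. Since the octant illustration is not reproduced here, the octant numbering is inferred from the increment table (screen coordinates, y growing downward), and the treatment of exact horizontals, verticals, and diagonals is an arbitrary choice made for this example.

```python
# (ADDR1 step, ADDR0 step) per octant, from the octant table (8-bit mode).
OCTANT_STEPS = {
    0: (+1, -320), 1: (-320, +1), 2: (-320, -1), 3: (-1, -320),
    4: (-1, +320), 5: (+320, -1), 6: (+320, +1), 7: (+1, +320),
}


def encode_increment(value, times32=False):
    """Return (low, high) bytes for an FX increment register pair."""
    raw = round(value * 512)  # 9 fractional bits
    if not -(1 << 14) <= raw < (1 << 14):
        raise ValueError("increment does not fit in 15 signed bits")
    raw &= 0x7FFF  # 15-bit two's complement
    return raw & 0xFF, (raw >> 8) | (0x80 if times32 else 0)


def line_setup(x0, y0, x1, y1):
    """Return (octant, addr1_step, addr0_step, incr_low, incr_high)."""
    dx, dy = x1 - x0, y1 - y0  # screen coordinates: y grows downward
    x_major = abs(dx) >= abs(dy)
    if dy <= 0:  # heading up the screen, or horizontal
        octant = (0 if x_major else 1) if dx >= 0 else (3 if x_major else 2)
    else:  # heading down the screen
        octant = (7 if x_major else 6) if dx >= 0 else (4 if x_major else 5)
    major = max(abs(dx), abs(dy))
    minor = min(abs(dx), abs(dy))
    slope = minor / major if major else 0.0  # always within 0.0 to 1.0
    addr1_step, addr0_step = OCTANT_STEPS[octant]
    return (octant, addr1_step, addr0_step, *encode_increment(slope))


print(line_setup(0, 0, 10, -5))
```

The two increment bytes would be written to FX_X_INCR_L and FX_X_INCR_H (DCSEL=3); a slope of exactly 1.0 encodes as low byte $00 and high byte $02.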
## Polygon filler helper
When Addr1 Mode is set to 2 (=10b) the polygon filler helper is enabled.
### Setting up the polygon filler helper
Assuming a 320-pixel-wide screen:

* Set ADDR0 to the address of the y-position of the top point of the triangle and x=0 (so on the left of the screen). Set its increment to +320 (for 8-bit mode) or +160 (for 4-bit mode).
  * Note: ADDR0 is used as the “base address” for calculating ADDR1 for each horizontal line of the triangle. ADDR0 should therefore start at the top of the triangle and increment exactly one line each time.
  * There is no need to set ADDR1. This is done by VERA.
* Calculate your slopes (dx/dy) for both the left and right point. Unlike the line draw helper, these slopes can be negative and can exceed 1.0. They are not dependent on octant, but cover the whole 180 degrees downwards. Below is an illustration of some (not-to-scale) examples of increments:
* Set the ADDR1 increment to +1 (for 8-bit mode) or +0.5 (for 4-bit mode).
  * (The ADDR1 increment can also be +4 if you use 32-bit cache writes, explained later.)
* Set your left slope into the two “X increment” registers and your right slope into the two “Y increment” registers (DCSEL=3, see below).
  * Important: They should be set to half the increment (or decrement) per horizontal line! This is because the polygon filler increments in two steps per line.
  * Note that increment registers are 15-bit signed fixed-point numbers:
    * 6 bits for the integer pixel increment
    * 9 bits for the fractional (subpixel) increment
    * 1 additional bit that indicates the actual value should be multiplied by 32

<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_X_INCR_L (DCSEL=3) (Write only)</td><td colspan="8">X Increment (-2:-9) (signed)</td></tr>
<tr><td>$9F2A</td><td>FX_X_INCR_H (DCSEL=3) (Write only)</td><td>X Incr. 32x</td><td colspan="6">X Increment (5:0) (signed)</td><td>X Incr. (-1)</td></tr>
<tr><td>$9F2B</td><td>FX_Y_INCR_L (DCSEL=3) (Write only)</td><td colspan="8">Y/X2 Increment (-2:-9) (signed)</td></tr>
<tr><td>$9F2C</td><td>FX_Y_INCR_H (DCSEL=3) (Write only)</td><td>Y/X2 Incr. 32x</td><td colspan="6">Y/X2 Increment (5:0) (signed)</td><td>Y/X2 Incr. (-1)</td></tr>
</table>

* Because we are in polygon fill mode, setting the high bits of the X increment ($9F2A, DCSEL=3) automatically sets the fractional part of the X position (the lower 9 bits of the position in DCSEL=4 and DCSEL=5) to half a pixel. The same goes for the high bits of the Y/X2 increment ($9F2C, DCSEL=3) and the Y/X2 position.
* Set the “X position” and “Y/X2 position” to the x-pixel-position of the top triangle point.

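<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_X_POS_L (DCSEL=4) (Write only)</td><td colspan="8">X Position (7:0)</td></tr>
<tr><td>$9F2A</td><td>FX_X_POS_H (DCSEL=4) (Write only)</td><td>X Pos. (-9)</td><td colspan="4">-</td><td colspan="3">X Position (10:8)</td></tr>
<tr><td>$9F2B</td><td>FX_Y_POS_L (DCSEL=4) (Write only)</td><td colspan="8">Y/X2 Position (7:0)</td></tr>
<tr><td>$9F2C</td><td>FX_Y_POS_H (DCSEL=4) (Write only)</td><td>Y/X2 Pos. (-9)</td><td colspan="4">-</td><td colspan="3">Y/X2 Position (10:8)</td></tr>
</table>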
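As an illustration of the register values involved in the setup above, this Python sketch encodes two per-line slopes (halved, as required) and the starting x position. The function names are hypothetical, and the sketch assumes a non-negative integer starting position with the 32x bit and the (-9) position bit left clear.

```python
def encode_increment(value):
    """Return (low, high) bytes for an FX increment pair (32x bit clear)."""
    raw = round(value * 512)  # 9 fractional bits
    if not -(1 << 14) <= raw < (1 << 14):
        raise ValueError("increment does not fit in 15 signed bits")
    raw &= 0x7FFF  # 15-bit two's complement
    return raw & 0xFF, raw >> 8


def encode_position(x):
    """Return (low, high) bytes for an FX position pair, integer pixels."""
    if not 0 <= x < 2048:
        raise ValueError("position is an 11-bit value in this sketch")
    return x & 0xFF, (x >> 8) & 0x07


def polygon_setup(top_x, left_slope, right_slope):
    """Slopes are dx/dy per scanline; the registers take half of that."""
    return {
        "FX_X_INCR": encode_increment(left_slope / 2),
        "FX_Y_INCR": encode_increment(right_slope / 2),
        "FX_X_POS": encode_position(top_x),
        "FX_Y_POS": encode_position(top_x),
    }


for name, (low, high) in polygon_setup(160, -0.5, 1.5).items():
    print(f"{name}_L=${low:02X} {name}_H=${high:02X}")
```

Writing the increments before the positions follows the order of the steps above, since writing the high increment byte resets the fractional position to half a pixel.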
Steps that are needed for filling a triangle part with lines:

* Read from `DATA1`
  * This will not return any useful data but will do two things in the background:
    * Increment/decrement the X1 and X2 positions by their corresponding increment values.
    * Set `ADDR1` to `ADDR0` + X1
* Then read the “Fill length (low)” register. Its output depends on whether you're in 4-bit or 8-bit mode.
* If fill_len >= 16 (or >= 8 in 4-bit mode) then also read the “Fill length (high)” register:

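<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F2C</td><td>FX_POLY_FILL_H (DCSEL=5) (Read only)</td><td colspan="7">Fill Len (9:3)</td><td>0</td></tr>
</table>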
**Important**: when the two highest bits of Fill Len (bits 8 and 9) are both 1, it means there is a negative fill length. The line should not be drawn!

* Together they give you 10 bits of fill length (ignore the other bits for now). Since `ADDR1` is already set properly you can immediately start drawing this number of pixels (given by Fill Len).
  * `sta DATA1` ; as many times as Fill Len states
* Then read from `DATA0`: this will (also) increment X1 and X2.
* Check if all lines of this triangle part have been drawn; if not, go to the first step.

There is also a 2-bit polygon mode, which is better explained in the [tutorial](https://docs.google.com/document/d/1q34uWOiM3Be2pnaHRVgSdHySI-qsiQWPTo_gfE54PTg).

## Affine helper

When Addr1 Mode is set to 3 (=11b) the affine (transformation) helper is enabled.

When reading from ADDR1 in this mode, the affine helper reads tile data from a special tile area defined by two new FX registers:

* FX_TILEBASE points to a set of 8x8 tiles in either 4-bit or 8-bit depth. FX can support up to 256 tile definitions, and they can overlap the traditional layer tile bases.
* FX_MAPBASE points to a square-shaped tile map, one byte per tile. This tile map has no attribute bytes, unlike the traditional layer 0/1 tile maps.

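<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F2A</td><td>FX_TILEBASE (DCSEL=2) (Write only)</td><td colspan="6">FX Tile Base Address (16:11)</td><td>Affine Clip Enable</td><td>2-bit Polygon</td></tr>
<tr><td>$9F2B</td><td>FX_MAPBASE (DCSEL=2) (Write only)</td><td colspan="6">FX Map Base Address (16:11)</td><td colspan="2">Map Size</td></tr>
</table>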
* **Affine Clip Enable** changes the behavior when the X/Y positions are outside of the tile map such that it always reads data from tile 0. The default behavior is to wrap the X/Y position to the opposite side of the map.
* **Map Size** is a 2-bit value that affects both the width and height of the tile map.

|Map Size|Dimensions|
|-|-|
|0|2×2|
|1|8×8|
|2|32×32|
|3|128×128|

* The **Transparent Writes** toggle in FX_CTRL is especially useful in Affine helper mode. Setting this toggle causes a write of zero to leave the byte (or the nibble) at the target address intact. This toggle is not limited to affine helper mode, and it affects writes to both DATA0 and DATA1.

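The register layout and Map Size table above can be summarized in a short Python sketch. The helper names are hypothetical. Because only address bits 16:11 are stored, the sketch assumes both base addresses must be aligned to 2048 bytes.

```python
MAP_DIMENSIONS = {0: 2, 1: 8, 2: 32, 3: 128}  # tiles per side, per Map Size


def base_field(vram_addr):
    """Place VRAM address bits 16:11 into register bits 7:2."""
    if not 0 <= vram_addr < 0x20000 or vram_addr % 2048:
        raise ValueError("address must be a 2048-byte-aligned 17-bit value")
    return (vram_addr >> 11) << 2


def fx_tilebase_value(tile_addr, *, affine_clip=False, two_bit_polygon=False):
    """Byte for FX_TILEBASE ($9F2A, DCSEL=2)."""
    return base_field(tile_addr) | (int(affine_clip) << 1) | int(two_bit_polygon)


def fx_mapbase_value(map_addr, map_size):
    """Byte for FX_MAPBASE ($9F2B, DCSEL=2)."""
    if map_size not in MAP_DIMENSIONS:
        raise ValueError("Map Size is a 2-bit value")
    return base_field(map_addr) | map_size


def map_bytes(map_size):
    """Size of the tile map in bytes (one byte per tile, square map)."""
    return MAP_DIMENSIONS[map_size] ** 2


print(f"${fx_tilebase_value(0x10000, affine_clip=True):02X}", map_bytes(2))
```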
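<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>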
When using the affine helper, the X and Y position registers (DCSEL=4) are used to set ADDR1 to the source pixel indirectly in the aforementioned tile map, while the X and Y increments determine the step after each read of ADDR1.
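<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_X_POS_L (DCSEL=4) (Write only)</td><td colspan="8">X Position (7:0)</td></tr>
<tr><td>$9F2A</td><td>FX_X_POS_H (DCSEL=4) (Write only)</td><td>X Pos. (-9)</td><td colspan="4">-</td><td colspan="3">X Position (10:8)</td></tr>
<tr><td>$9F2B</td><td>FX_Y_POS_L (DCSEL=4) (Write only)</td><td colspan="8">Y/X2 Position (7:0)</td></tr>
<tr><td>$9F2C</td><td>FX_Y_POS_H (DCSEL=4) (Write only)</td><td>Y/X2 Pos. (-9)</td><td colspan="4">-</td><td colspan="3">Y/X2 Position (10:8)</td></tr>
</table>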
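As a rough software model of the lookup described above, the following Python sketch maps an integer pixel position to a tile number and a pixel within that tile, with either wrapping or clipping. This is not taken from the hardware description: the row-major map layout, the choice of pixel inside tile 0 when clipping, and all names are assumptions made for illustration.

```python
MAP_DIMENSIONS = {0: 2, 1: 8, 2: 32, 3: 128}  # tiles per side, per Map Size


def affine_source(x, y, map_size, tile_map, clip=False):
    """Return (tile, column, row) for integer pixel position (x, y).

    `tile_map` is a flat sequence with one tile number per entry.
    Assumption: the map is stored row by row.
    """
    dim = MAP_DIMENSIONS[map_size]
    span = dim * 8  # map width and height in pixels (8x8 tiles)
    if clip and not (0 <= x < span and 0 <= y < span):
        # Affine Clip Enable: positions outside the map read from tile 0.
        return 0, x % 8, y % 8
    x %= span  # default behavior: wrap to the opposite side of the map
    y %= span
    tile = tile_map[(y // 8) * dim + (x // 8)]
    return tile, x % 8, y % 8


tiny_map = [0, 1, 2, 3]  # a 2x2 map (Map Size 0)
print(affine_source(9, 3, 0, tiny_map))
print(affine_source(17, 3, 0, tiny_map, clip=True))
```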
The affine helper supports the full range of X and Y increment values, including negative values.
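<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_X_INCR_L (DCSEL=3) (Write only)</td><td colspan="8">X Increment (-2:-9) (signed)</td></tr>
<tr><td>$9F2A</td><td>FX_X_INCR_H (DCSEL=3) (Write only)</td><td>X Incr. 32x</td><td colspan="6">X Increment (5:0) (signed)</td><td>X Incr. (-1)</td></tr>
<tr><td>$9F2B</td><td>FX_Y_INCR_L (DCSEL=3) (Write only)</td><td colspan="8">Y/X2 Increment (-2:-9) (signed)</td></tr>
<tr><td>$9F2C</td><td>FX_Y_INCR_H (DCSEL=3) (Write only)</td><td>Y/X2 Incr. 32x</td><td colspan="6">Y/X2 Increment (5:0) (signed)</td><td>Y/X2 Incr. (-1)</td></tr>
</table>

## 32-bit cache

When the CPU reads a byte via DATA0 or DATA1, and "Cache Fill Enable" is set, the value read will be copied into an indexed location inside the 32-bit cache.
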
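<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>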
In 8-bit mode, a byte is cached, but in 4-bit mode, a nibble is cached instead. Afterwards, by default, the index into the cache is incremented, and loops back around to 0 after the last index. The index can be set explicitly via the FX_MULT register. 8-bit mode uses bits 3:2 and ranges from 0-3. 4-bit mode uses bits 3:1 and ranges from 0-7.
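<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F2C</td><td>FX_MULT (DCSEL=2) (Write only)</td><td>Reset Accum.</td><td>Accumulate</td><td>Subtract Enable</td><td>Multiplier Enable</td><td colspan="2">Cache Byte Index</td><td>Cache Nibble Index</td><td>Two-byte Cache Incr. Mode</td></tr>
</table>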
Alternatively, the cache index can cycle between two adjacent bytes: 0, 1, and back to 0; or 2, 3, and back to 2. This option only has effect in 8-bit mode.
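<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F2C</td><td>FX_MULT (DCSEL=2) (Write only)</td><td>Reset Accum.</td><td>Accumulate</td><td>Subtract Enable</td><td>Multiplier Enable</td><td colspan="2">Cache Byte Index</td><td>Cache Nibble Index</td><td>Two-byte Cache Incr. Mode</td></tr>
</table>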
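The cache index behavior described above can be modeled with a few lines of Python. This is a sketch with hypothetical names: it shows where an explicit index lands in FX_MULT and how the index advances after each cache fill.

```python
def fx_mult_index_bits(index, four_bit=False):
    """Return the FX_MULT bits that select an explicit cache index."""
    if four_bit:
        if not 0 <= index <= 7:
            raise ValueError("nibble index ranges from 0 to 7")
        return index << 1  # bits 3:1
    if not 0 <= index <= 3:
        raise ValueError("byte index ranges from 0 to 3")
    return index << 2  # bits 3:2


def next_cache_index(index, *, four_bit=False, two_byte_mode=False):
    """Advance the cache index the way a cache fill does."""
    if two_byte_mode and not four_bit:
        return index ^ 1  # cycle 0 <-> 1 or 2 <-> 3 (8-bit mode only)
    return (index + 1) % (8 if four_bit else 4)


index, seen = 2, []
for _ in range(6):
    seen.append(index)
    index = next_cache_index(index)
print(seen)
```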
### Setting the cache data directly

Instead of filling the cache by reading from DATA0 or DATA1, the cache data can also be set directly by writing to the FX_CACHE* registers. Setting the cache directly does not affect the cache index.

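<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CACHE_L (DCSEL=6) (Write only)</td><td colspan="8">Cache (7:0) | Multiplicand (7:0) (signed)</td></tr>
<tr><td>$9F2A</td><td>FX_CACHE_M (DCSEL=6) (Write only)</td><td colspan="8">Cache (15:8) | Multiplicand (15:8) (signed)</td></tr>
<tr><td>$9F2B</td><td>FX_CACHE_H (DCSEL=6) (Write only)</td><td colspan="8">Cache (23:16) | Multiplier (7:0) (signed)</td></tr>
<tr><td>$9F2C</td><td>FX_CACHE_U (DCSEL=6) (Write only)</td><td colspan="8">Cache (31:24) | Multiplier (15:8) (signed)</td></tr>
</table>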
### Writing the cache to VRAM

If "Cache Write Enable" is set, the cache contents are written to VRAM when writing to DATA0 or DATA1. The primary use is to write all or part of the 32-bit cache to the 4-byte-aligned region of memory at the current address. Which parts are written is determined by the value written to DATA0 or DATA1. The value written is treated as a **nibble mask** where a 0-bit writes the data and a 1-bit masks the data from being written. In other words, writing a 0 will flush the entire 32-bit cache. Writing `#%00001111` will write the third and fourth bytes in the cache to VRAM in the third and fourth memory locations in the 4-byte-aligned region.

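<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>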
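To illustrate the nibble mask, here is a small Python model of a cache write into a byte array standing in for VRAM. The names are hypothetical. The model assumes that mask bits 2k and 2k+1 belong to the byte at offset k of the aligned region; which of the two bits covers the low or the high nibble within a byte is a guess that should be verified on hardware or in an emulator.

```python
def cache_write(vram, addr, cache, mask):
    """Write the 4-byte `cache` to the aligned region containing `addr`.

    A 1-bit in `mask` protects the corresponding nibble; a 0-bit writes it.
    Assumption: bit 2k is the low nibble and bit 2k+1 the high nibble of
    the byte at offset k.
    """
    base = addr & ~3  # start of the 4-byte-aligned region
    for k in range(4):
        value = vram[base + k]
        if not (mask >> (2 * k)) & 1:
            value = (value & 0xF0) | (cache[k] & 0x0F)
        if not (mask >> (2 * k + 1)) & 1:
            value = (value & 0x0F) | (cache[k] & 0xF0)
        vram[base + k] = value


vram = bytearray([0xAA] * 4)
cache_write(vram, 0, bytes([0x11, 0x22, 0x33, 0x44]), 0b00001111)
print(vram.hex())  # offsets 0 and 1 keep their old contents
```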
### Transparency writes

Transparent writes, when enabled, also apply to cache writes. If enabled, zero bytes (or zero nibbles in 4-bit mode) in the cache, which are treated as transparent pixels, are not written.

<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>

When "One-byte Cache Cycling" is turned on and DATA0 or DATA1 is written to, the byte at the current cache index is written to VRAM. When "Cache Write Enable" is set as well, the byte is duplicated 4 times when writing to VRAM.

Usually the incrementing of the cache index is only triggered by reading from DATA0 or DATA1 when cache filling is enabled. However, it can also be triggered by reading from DATA0 in polygon mode when cache filling is not enabled and "One-byte Cache Cycling" is enabled.

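<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>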
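The transparency rule for cache writes described in this section can be sketched in Python as follows. The function name is hypothetical, and the model covers only a full (unmasked) cache write into a byte array standing in for VRAM.

```python
def transparent_cache_write(vram, addr, cache, four_bit=False):
    """Write `cache` to the aligned region at `addr`, skipping zero pixels.

    In 8-bit mode a zero byte is transparent; in 4-bit mode each zero
    nibble is transparent on its own.
    """
    base = addr & ~3  # start of the 4-byte-aligned region
    for k in range(4):
        old, new = vram[base + k], cache[k]
        if four_bit:
            high = (new & 0xF0) or (old & 0xF0)  # keep old nibble if new is 0
            low = (new & 0x0F) or (old & 0x0F)
            vram[base + k] = high | low
        elif new != 0:
            vram[base + k] = new


vram = bytearray([0xAA] * 4)
transparent_cache_write(vram, 0, bytes([0x00, 0x12, 0x00, 0x34]))
print(vram.hex())
```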
## Multiplier and accumulator

The 32-bit cache also doubles as an input to the hardware multiplier when Multiplier Enable is set.

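<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F2C</td><td>FX_MULT (DCSEL=2) (Write only)</td><td>Reset Accum.</td><td>Accumulate</td><td>Subtract Enable</td><td>Multiplier Enable</td><td colspan="2">Cache Byte Index</td><td>Cache Nibble Index</td><td>Two-byte Cache Incr. Mode</td></tr>
</table>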
To do a single multiplication, put the two 16-bit inputs into the two halves of the 32-bit cache.

```x86asm
lda #(2 << 1)        ; DCSEL=2
sta VERA_CTRL        ; $9F25
stz VERA_FX_CTRL     ; $9F29 (mainly to reset Addr1 Mode to 0)
lda #%00010000       ; Multiplier Enable
sta VERA_FX_MULT     ; $9F2C
lda #(6 << 1)        ; DCSEL=6
sta VERA_CTRL        ; $9F25
lda #<69
sta VERA_FX_CACHE_L  ; $9F29
lda #>69
sta VERA_FX_CACHE_M  ; $9F2A
lda #<420
sta VERA_FX_CACHE_H  ; $9F2B
lda #>420
sta VERA_FX_CACHE_U  ; $9F2C
```

The accumulator can be used to accumulate the sum of several multiplications. Before doing this single multiplication, ensure that the accumulator is reset to zero; otherwise its value will be added to the output before it is written. There are two methods to do this. The first is to write a 1 into bit 7 of FX_MULT ($9F2C, DCSEL=2). The other, more convenient, method is to read FX_ACCUM_RESET (the same register location as VERA_FX_CACHE_L).

```x86asm
lda FX_ACCUM_RESET   ; $9F29 (DCSEL=6)
```

To obtain the result of the multiplication, it must first be written to VRAM. This is done via the cache write mechanism. Usually the cache itself is written to VRAM if "Cache Write Enable" is set. However, if the "Multiplier Enable" bit is also set, the multiplier result is written to VRAM instead.

```x86asm
; Set the ADDR0 pointer to $00000 and write our multiplication result there
lda #(2 << 1)        ; DCSEL=2, ADDRSEL=0
sta VERA_CTRL        ; $9F25
lda #%01000000       ; Cache Write Enable
sta VERA_FX_CTRL     ; $9F29
stz VERA_ADDRx_L     ; $9F20 (ADDR0)
stz VERA_ADDRx_M     ; $9F21
stz VERA_ADDRx_H     ; $9F22 ; no increment
stz VERA_DATA0       ; $9F23 ; multiply and write out result
lda #%00010000       ; Increment 1
sta VERA_ADDRx_H     ; $9F22 ; so we can read out the result
lda VERA_DATA0
sta $0400
lda VERA_DATA0
sta $0401
lda VERA_DATA0
sta $0402
lda VERA_DATA0
sta $0403
```

*Note*: VERA works by pre-fetching the contents from VRAM whenever the address pointer is changed or incremented. This happens even when the address increment is 0. Due to this behavior, it is possible to have stale data latched in one of the two data ports if the underlying VRAM is changed via the other data port. This example avoids this scenario by only using ADDR0/DATA0. This potential gotcha was not introduced by the FX update, but rather has always been how VERA behaves.

#### Accumulation

One can also trigger the multiplication and add it to (or subtract it from) the multiplication accumulator by calling "accumulate" in one of two different ways. We could write a 1 into bit 6 of FX_MULT ($9F2C, DCSEL=2), but more conveniently, we can read FX_ACCUM (the same register location as VERA_FX_CACHE_M).

```x86asm
lda FX_ACCUM         ; $9F2A (DCSEL=6)
```

Once the accumulation is triggered, the result of the operation is stored back into the accumulator. The default accumulation operation is (multiply then) add. This can be switched to subtraction by setting the Subtract Enable bit in FX_MULT.

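<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F2C</td><td>FX_MULT (DCSEL=2) (Write only)</td><td>Reset Accum.</td><td>Accumulate</td><td>Subtract Enable</td><td>Multiplier Enable</td><td colspan="2">Cache Byte Index</td><td>Cache Nibble Index</td><td>Two-byte Cache Incr. Mode</td></tr>
</table>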
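For checking results on the host side, the multiplier can be modeled in Python roughly as follows. This is a sketch under assumptions: the names are hypothetical, the result is taken to be a 32-bit two's complement value stored low byte first, and subtraction is modeled as accumulator minus product.

```python
def to_signed16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def multiply(cache, accumulator=0, subtract=False):
    """Return the 4 bytes a multiplier cache write would store.

    cache[0:2] is the multiplicand and cache[2:4] the multiplier,
    both signed 16-bit, low byte first.
    """
    multiplicand = to_signed16(cache[0] | cache[1] << 8)
    multiplier = to_signed16(cache[2] | cache[3] << 8)
    product = multiplicand * multiplier
    result = accumulator - product if subtract else accumulator + product
    return (result & 0xFFFFFFFF).to_bytes(4, "little")


# The inputs used in the assembly example: 69 and 420.
cache = (69).to_bytes(2, "little") + (420).to_bytes(2, "little")
print(multiply(cache).hex())
```

With a zeroed accumulator this yields the bytes $34, $71, $00, $00, that is 69 × 420 = 28980 = $7134.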
If the multiplication accumulator has a nonzero value, any multiplications carried out via a VRAM cache write will be offset by the value of the accumulator (either added to or subtracted from the accumulator), but they will not change the value of the accumulator.

### 16-bit hop

There is a special address increment mode that can be used to read pairs of bytes via ADDR1.

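<table>
<tr><th>Addr</th><th>Name</th><th>Bit 7</th><th>Bit 6</th><th>Bit 5</th><th>Bit 4</th><th>Bit 3</th><th>Bit 2</th><th>Bit 1</th><th>Bit 0</th></tr>
<tr><td>$9F29</td><td>FX_CTRL (DCSEL=2)</td><td>Transp. Writes</td><td>Cache Write Enable</td><td>Cache Fill Enable</td><td>One-byte Cache Cycling</td><td>16-bit Hop</td><td>4-bit Mode</td><td colspan="2">Addr1 Mode</td></tr>
</table>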
In this mode, setting ADDR1's increment to +4 will result in alternating increments of +1 and +3. Setting it to +320 will result in alternating increments of +1 and +319. All other increment values, including negative increments, lack this special hop property. After this bit is set, writing to ADDRx_L resets the hop alignment such that the first increment is +1. This mode is useful for reading out a series of 16-bit values after a series of multiplications. For a more detailed explanation of chained math operations, see the [tutorial](https://docs.google.com/document/d/1q34uWOiM3Be2pnaHRVgSdHySI-qsiQWPTo_gfE54PTg).
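The alternating increments can be visualized with a short Python sketch that lists the ADDR1 values visited by successive reads. The function name is hypothetical, and the model assumes the hop alignment has just been reset so that the first step is +1.

```python
def hop_addresses(start, increment, count):
    """Return the first `count` ADDR1 values with 16-bit hop enabled."""
    # Only increments of +4 and +320 have the hop property.
    steps = {4: (1, 3), 320: (1, 319)}.get(increment, (increment, increment))
    addresses, addr = [], start
    for i in range(count):
        addresses.append(addr)
        addr += steps[i % 2]  # alternate between the two step sizes
    return addresses


print(hop_addresses(0, 4, 6))    # low/high byte pairs, 4 bytes apart
print(hop_addresses(0, 320, 4))  # low/high byte pairs, one row apart
```

With an increment of +4 this visits the two low bytes of each 32-bit slot, which is how a series of 16-bit results can be read back after a run of multiplications.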