The NEC PowerVR chip consists of a true-colour RAMDAC, and a hardware 3D
engine based on a Tile Accelerator. The Tile Accelerator divides the
scene into 32 by 32 pixel tiles, which can be rendered individually.
Each tile is then rendered into an internal 32 by 32 pixel frame
buffer in register memory before it is copied to the main frame buffer. As
rendering is done to the internal frame buffer, the fill rate is very high.
Also, no texel data is actually fetched from texture VRAM until the tile is
copied to the frame buffer, which means that the texture fill rate is not
affected by overpainting at all.
The following diagram shows the principle by which the hardware 3D
engine works:
There are two stages, which can be run in parallel (provided
you have dual sets of buffers of course). During the Binning stage, the
Tile Accelerator is fed graphic primitives (either using DMA or directly by
the CPU using the Store Queues or direct writes), which it will compile to an
internal format. While doing this, it will register in which tiles this
primitive might be visible by putting it in one or more tile bins.
(If it's not visible in any tile, it can be completely clipped of course.)
During the rendering stage, the ISP/TSP will read the lists created by
the Tile Accelerator, and for each tile render the primitives visible for
that tile into its internal framebuffer, before writing it out to the right
place in the VRAM framebuffer, where the RAMDAC can display it.
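To make the binning idea concrete, here is a minimal sketch in C of one conservative way to decide which tiles a primitive might touch: take its screen-space bounding box and mark every 32 by 32 tile the box overlaps. This is only an illustration of the principle; the actual algorithm used by the Tile Accelerator is not described here, and the names `tile_range` and `tiles_for_bbox` are made up for this example.

```c
#define TILE_SIZE 32  /* tile edge in pixels, as stated in the text */

/* Hypothetical helper type: inclusive range of tile columns and rows. */
typedef struct {
    int x0, y0, x1, y1;
    int visible;      /* 0 if the box lies entirely off screen */
} tile_range;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Conservative binning sketch: report every tile overlapped by the
 * primitive's bounding box (inclusive pixel coordinates). */
static tile_range tiles_for_bbox(int min_x, int min_y, int max_x, int max_y,
                                 int screen_w, int screen_h)
{
    tile_range r = { 0, 0, 0, 0, 0 };

    /* Empty or completely off screen: can be clipped away entirely. */
    if (min_x > max_x || min_y > max_y ||
        max_x < 0 || max_y < 0 || min_x >= screen_w || min_y >= screen_h)
        return r;

    r.x0 = clamp_int(min_x, 0, screen_w - 1) / TILE_SIZE;
    r.y0 = clamp_int(min_y, 0, screen_h - 1) / TILE_SIZE;
    r.x1 = clamp_int(max_x, 0, screen_w - 1) / TILE_SIZE;
    r.y1 = clamp_int(max_y, 0, screen_h - 1) / TILE_SIZE;
    r.visible = 1;
    return r;
}
```

A primitive whose box spans pixels (30,30) to (40,70) would end up in tile columns 0 to 1 and tile rows 0 to 2, i.e. six tile bins.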
For a double buffering strategy that allows you to run both stages
simultaneously (but for different frames, i.e. binning frame N+1 while
rendering frame N), you need double sets of buffers for the display list
and the tile bins, as well as double frame buffers to keep rendering
artifacts from being visible on the screen. The following table shows
which tile bin set and frame buffer to use to avoid conflict:
Frame # | Bin to TB # | Render from TB # | Render to FB # | Display FB # |
1 | 1 | - | - | - |
2 | 2 | 1 | 1 | - |
3 | 1 | 2 | 2 | 1 |
4 | 2 | 1 | 1 | 2 |
5 | 1 | 2 | 2 | 1 |
6 | 2 | 1 | 1 | 2 |
etc. As you can see, there is a two frame latency, i.e. frame 1 will
not be visible on screen until frame 3 is being generated.
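The schedule can also be expressed as a small function. The following C sketch returns, for a given frame number (counting from 1), which tile bin set and frame buffer each stage should use; a value of 0 means that the stage has nothing to do yet. The type and function names are invented for this example.

```c
/* Hypothetical helper type: buffer numbers (1 or 2) for one frame,
 * or 0 where the stage is still idle. */
typedef struct {
    int bin_tb;      /* tile bin set the Tile Accelerator bins into */
    int render_tb;   /* tile bin set the ISP/TSP renders from       */
    int render_fb;   /* frame buffer being rendered to              */
    int display_fb;  /* frame buffer shown by the RAMDAC            */
} frame_plan;

/* n is the frame number, starting at 1. */
static frame_plan plan_frame(unsigned n)
{
    frame_plan p;

    p.bin_tb     = (n % 2) ? 1 : 2;
    /* Rendering uses what was binned during the previous frame. */
    p.render_tb  = (n >= 2) ? ((n % 2) ? 2 : 1) : 0;
    p.render_fb  = p.render_tb;
    /* Display shows what was rendered during the previous frame. */
    p.display_fb = (n >= 3) ? ((n % 2) ? 1 : 2) : 0;
    return p;
}
```

For frame 3 this gives bin set 1 for binning, bin set 2 and frame buffer 2 for rendering, and frame buffer 1 for display.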
There are 8 megabytes of video memory, located in memory area 1
(see the memory map). This memory is organized
as two banks of 32×1Mbit each, and depending on the value of address bit 24
they can be accessed either sequentially as 32 bit memory, or in parallel as
64 bit memory. In both cases, you get 8 megabytes of contiguous address space,
but the correspondence of address to memory cell is slightly different, as
this figure shows:
32 bit interface:
Address range | Bank | Data bits |
0xA5400000 ... 0xA57FFFFC | Bank 2 | 0 ... 31 |
0xA5000000 ... 0xA53FFFFC | Bank 1 | 0 ... 31 |
64 bit interface:
Address range | Bank | Data bits |
0xA4000000 ... 0xA47FFFF8 | Bank 1 | 0 ... 31 |
0xA4000000 ... 0xA47FFFF8 | Bank 2 | 32 ... 63 |
So, the bytes 0xA4000000-0xA4000003 correspond to
0xA5000000-0xA5000003, 0xA4000004-0xA4000007 to
0xA5400000-0xA5400003, 0xA4000008-0xA400000B to
0xA5000004-0xA5000007 and so on. Both interfaces can handle
writes of 16 bits and up, but 8-bit writes are not possible. Reads of
any width, including 8-bit, are possible, though.
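The correspondence between the two views can be written down as a short C function. This is a sketch derived only from the example addresses given above (alternate longwords go to alternate banks); the constant and function names are made up for illustration.

```c
#include <stdint.h>

#define VRAM_64BIT_BASE 0xA4000000u  /* 64 bit interface, P2 area */
#define VRAM_32BIT_BASE 0xA5000000u  /* 32 bit interface, P2 area */
#define VRAM_BANK_SIZE  0x00400000u  /* 4 megabytes per bank      */
#define VRAM_SIZE       0x00800000u  /* 8 megabytes in total      */

/* Translate a CPU address in the 64 bit view to the address of the
 * same memory cell in the 32 bit view. */
static uint32_t vram_64_to_32(uint32_t addr64)
{
    uint32_t off   = (addr64 - VRAM_64BIT_BASE) & (VRAM_SIZE - 1);
    uint32_t word  = off >> 2;   /* index of the 32-bit longword   */
    uint32_t bank  = word & 1;   /* even longwords: bank 1 (0),    */
                                 /* odd longwords: bank 2 (1)      */
    uint32_t index = word >> 1;  /* longword index within the bank */

    return VRAM_32BIT_BASE + bank * VRAM_BANK_SIZE + index * 4 + (off & 3);
}
```

With this mapping, `vram_64_to_32(0xA4000004)` yields 0xA5400000 and `vram_64_to_32(0xA4000008)` yields 0xA5000004, matching the examples in the text.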
In the following register descriptions, an address specification using
the 32 bit interface will be referred to as a 32 bit address, and
an address specification using the 64 bit interface as a 64 bit address,
although this should not be mistaken as the width of the actual address
since both types of addresses are really 23 bits wide.
The addresses given here are to the P2 area, as the registers should
of course be accessed without cache.
The register descriptions are partly based on research done by bITmASTER
and maiwe.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Red | Green | Blue |
This register sets the solid colour displayed around the main display area.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | C | (unnamed) | COL | SD | DE |
- C - Clock double enable
- Setting this bit doubles the pixel clock, giving a scan rate suitable
for VGA monitors.
- COL - Colour mode select
- Selects the frame buffer pixel colour mode (all colour modes are little endian, e.g. in RGB888 the blue byte comes first)
Value | Colour mode | Bytes per pixel |
0 0 | RGB555 | 2 |
0 1 | RGB565 | 2 |
1 0 | RGB888 | 3 |
1 1 | RGB888 | 4 |
- SD - Scan Double enable
- Setting this bit makes each scan line be sent twice, allowing low
resolutions in VGA mode.
- DE - Display Enable
- This bit must be set for any graphics to be displayed. If it is set to zero,
only the border colour will be visible.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
This sets the address in the video RAM of the first pixel displayed (top left).
Address 0 means the first byte of the video RAM bank 1 (usually accessed as
A5000000 from the CPU). The address must be longword aligned.
This register is used for noninterlaced screens and the long field of
interlaced screens.
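As an illustration, the value for this register can be derived from a CPU pointer into the 32 bit view of video RAM by subtracting the base address and checking the alignment. The following C sketch does just that; the function name and the error value are assumptions made for this example, and nothing is actually written to the hardware.

```c
#include <stdint.h>

#define VRAM_32BIT_BASE 0xA5000000u  /* address 0 of video RAM, P2 area */
#define VRAM_SIZE       0x00800000u  /* 8 megabytes                     */
#define DISPLAY_ADDR_INVALID 0xFFFFFFFFu

/* Convert a CPU address in the 32 bit view into a display start
 * address. Returns DISPLAY_ADDR_INVALID if the address lies outside
 * video RAM or is not longword aligned. */
static uint32_t display_start_value(uint32_t cpu_addr)
{
    if (cpu_addr < VRAM_32BIT_BASE ||
        cpu_addr - VRAM_32BIT_BASE >= VRAM_SIZE)
        return DISPLAY_ADDR_INVALID;
    if (cpu_addr & 3)
        return DISPLAY_ADDR_INVALID;
    return cpu_addr - VRAM_32BIT_BASE;
}
```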
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
Same as A05F8050, but used for the short field of interlaced screens.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Modulo | Lines per field | Pixel data per line |
This register determines how much pixel data to display each field, and the
modulo between each line of data.
- Modulo
- The number of 32-bit words to skip between each line, plus 1. I.e. a value of 1 means the lines are stored immediately after each other in memory.
- Lines per field
- How many lines of pixels to fetch and display each field, minus 1. Since this is per field and not per frame, it should be set to half the total vertical resolution (minus 1) in interlaced mode.
- Pixel data per line
- The number of 32-bit words of pixel data to fetch and display each line, minus 1. If you want X pixels per line, and each pixel is Y bytes, X*Y/4-1 is the correct value to write.
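The three field values follow directly from the resolution and pixel size. Here is a minimal C sketch of the arithmetic; it only computes the values and does not pack them into the register. For interlaced mode it assumes that the two fields are stored interleaved in a single frame buffer, so that each field has to skip one full line of the other field; that layout is an assumption of this example, as are all the names.

```c
/* Hypothetical helper type holding the three field values. */
typedef struct {
    unsigned modulo;           /* words to skip between lines, plus 1 */
    unsigned lines_per_field;  /* lines per field, minus 1            */
    unsigned words_per_line;   /* 32-bit words per line, minus 1      */
} display_size;

static display_size display_size_fields(unsigned width, unsigned height,
                                        unsigned bytes_per_pixel,
                                        int interlaced)
{
    display_size s;
    unsigned line_words = width * bytes_per_pixel / 4;

    s.words_per_line  = line_words - 1;              /* X*Y/4-1 */
    s.lines_per_field = (interlaced ? height / 2 : height) - 1;
    /* Assumption: interlaced fields skip one whole line each time. */
    s.modulo          = interlaced ? line_words + 1 : 1;
    return s;
}
```

For a 640 by 480 screen with 2 bytes per pixel, this gives 319 words per line, and either 479 lines with modulo 1 (noninterlaced) or 239 lines with modulo 321 (interlaced).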
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Top | (unnamed) | Bottom |
This register defines two rasterlines on the screen, which when they are passed by
the raster beam will generate a raster event (which optionally causes
an interrupt). The rasterline for the "Top" raster event is typically set just above the display
area, and the rasterline for the "Bottom" raster event just below the display area.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | VO | BC | (unnamed) | I | (unnamed) | HP | VP |
- VO - Video Output enable
- Set to 1 to enable video output.
- I - Interlace
- Set to 1 to enable interlaced video.
- BC - Broadcast standard
- Used to select type of colour sync for composite video
Value | Broadcast standard |
0 0 | NTSC |
0 1 | PAL |
1 0 | PAL-M (?) |
1 1 | PAL-N (?) |
- HP - H-sync polarity
- Set to 1 for positive H-sync, 0 for negative H-sync.
- VP - V-sync polarity
- Set to 1 for positive V-sync, 0 for negative V-sync.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Start | (unnamed) | Stop |
This register selects the horizontal range in which the border colour is
displayed. Left and right of this range, the border is displayed as black.
- Start
- The number of pixels from the horizontal sync where border display starts.
- Stop
- The number of pixels from the horizontal sync where border display ends.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Vertical | (unnamed) | Horizontal |
This register selects the total number of lines and "pixels" (including lace)
between each retrace. The horizontal and vertical refresh rate are determined
by this register, and the pixel clock. For 50Hz (PAL), set V=624 H=863. For
60Hz (NTSC/VGA), set V=524 H=857. (Halve the V value for non-interlaced
PAL/NTSC screens.)
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Start | (unnamed) | Stop |
This register selects the vertical range in which the border colour is
displayed. Above and below this range, the border is displayed as black.
- Start
- The number of scanlines from the vertical sync where border display starts.
- Stop
- The number of scanlines from the vertical sync where border display ends.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | N | (unnamed) | LR | (unnamed) |
Misc additional video settings.
- N
- Unknown. Set to 22.
- LR
- Low-res; setting this bit makes each pixel be output twice, effectively
giving a 320 pixel horizontal resolution.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Horizontal pos |
This register sets the distance from the horizontal sync to where pixel display starts.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Vertical pos 2 | (unnamed) | Vertical pos 1 |
This register sets the distance (in scanlines) from the vertical sync to where pixel display starts.
- Vertical pos 1
- This value is used for noninterlaced screens and the long fields of
interlaced screens
- Vertical pos 2
- This value is used for the short fields of interlaced screens
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
This sets the address of the Tile Bin array to which the Tile Accelerator
should perform its binning. 64 bytes of memory per tile will be used
at the video memory address pointed out by this register.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
This sets the address of the compiled Display list buffer to which the Tile
Accelerator should output the processed primitives. The amount of memory
needed depends on how large the scene is.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Vertical | (unnamed) | Horizontal |
The size of the Tile Bin array in rows and columns.
- Vertical
- How many tiles high the Tile Bin array is, minus 1. Each tile is 32 pixels
high.
- Horizontal
- How many tiles wide the Tile Bin array is, minus 1. Each tile is 32 pixels
wide.
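As a worked example, the following C sketch computes the two field values for a given screen size, together with the memory needed for the Tile Bin array at the 64 bytes per tile mentioned for the Tile Bin array address register. Rounding partial tiles up to a whole tile is an assumption of this example, and the names are invented.

```c
#define TILE_SIZE      32  /* tile edge in pixels              */
#define TILE_BIN_BYTES 64  /* Tile Bin array memory per tile   */

/* Hypothetical helper type for the result. */
typedef struct {
    unsigned vertical;    /* tiles high, minus 1               */
    unsigned horizontal;  /* tiles wide, minus 1               */
    unsigned bin_bytes;   /* bytes used by the Tile Bin array  */
} tile_array;

static tile_array tile_array_for(unsigned width, unsigned height)
{
    tile_array t;
    /* Assumption: a partially covered tile still needs a full tile. */
    unsigned cols = (width  + TILE_SIZE - 1) / TILE_SIZE;
    unsigned rows = (height + TILE_SIZE - 1) / TILE_SIZE;

    t.horizontal = cols - 1;
    t.vertical   = rows - 1;
    t.bin_bytes  = cols * rows * TILE_BIN_BYTES;
    return t;
}
```

A 640 by 480 screen is 20 by 15 tiles, so Horizontal is 19, Vertical is 14, and the Tile Bin array takes 300 × 64 = 19200 bytes.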
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
The address of the compiled Display list created by the Tile Accelerator which
contains the primitives for the scene.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
The address of a structure describing the location and clipping(?) of each
tile on the screen, as well as pointers to the respective Tile Bin buffers.
This structure has to be created before any rendering can be done, but can
be reused in subsequent renders using the same set of Tile Bin buffers.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | Modulo |
The modulo of the frame buffer to which rendering is to take place, in
bytes / 8.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | TH | (unnamed) | D | COL |
The pixel format of the frame buffer to which rendering is to take place.
- TH - Alpha threshold
- Set this to control the alpha threshold level when output colour mode
is ARGB1555.
- D - Dither enable
- Setting this bit enables dithering in highcolour modes.
- COL - Colour mode select
- Selects the frame buffer pixel colour mode (all colour modes are little endian, e.g. in RGB888 the blue byte comes first)
Value | Colour mode | Bytes per pixel |
0 0 0 | RGB555 | 2 |
0 0 1 | RGB565 | 2 |
0 1 0 | ARGB4444 | 2 |
0 1 1 | ARGB1555 | 2 |
1 0 1 | RGB888 | 4 |
1 1 0 | ARGB8888 | 4 |
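The colour mode also determines the value needed for the render modulo register, which is given in bytes / 8. The following C sketch looks up the bytes per pixel from the table above and derives the modulo for a frame buffer whose lines are stored back to back; the function names are made up, and COL values that the table does not list are reported as 0 rather than guessed.

```c
/* Bytes per pixel for a render COL value, per the table above.
 * Returns 0 for values the table does not list. */
static unsigned render_bytes_per_pixel(unsigned col)
{
    switch (col) {
    case 0: case 1: case 2: case 3:
        return 2;
    case 5: case 6:
        return 4;
    default:
        return 0;
    }
}

/* Render modulo (bytes / 8) for a frame buffer of the given width,
 * assuming no padding between lines. */
static unsigned render_modulo(unsigned width, unsigned col)
{
    return width * render_bytes_per_pixel(col) / 8;
}
```

For a 640 pixel wide RGB565 frame buffer the modulo is 640 × 2 / 8 = 160.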
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | 32-bit Address | (unnamed) |
The address of the frame buffer to which rendering is to take place.
The coordinates for the individual tiles will be added as an offset to
this base address.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
(unnamed) | COL |
The format of the entries in the palettes used by CLUT mode textures.
- COL - Colour mode select
Value | Colour mode | Bytes per entry |
0 0 | ARGB1555 | 2 |
0 1 | RGB565 | 2 |
1 0 | ARGB4444 | 2 |
1 1 | ARGB8888 | 4 |
Note that each palette entry always occupies 4 bytes of
address space, even if only two bytes are used.
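To illustrate, here is a short C sketch that computes the byte offset of a palette entry (always 4 bytes apart) and packs 8-bit colour components into the three 16-bit formats. The bit layouts used are the conventional ones that the format names suggest (alpha in the top bits, blue in the bottom bits); that layout and all function names are assumptions of this example, and the base address of the palette memory is deliberately left out.

```c
#include <stdint.h>

/* Byte offset of palette entry number 'index' from the start of the
 * palette memory: one entry per 4 bytes regardless of format. */
static uint32_t palette_entry_offset(unsigned index)
{
    return (uint32_t)index * 4u;
}

/* The pack functions take 8-bit components (0..255). */
static uint16_t pack_argb1555(unsigned a, unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((a >= 128 ? 1u : 0u) << 15) |
                      ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

static uint16_t pack_rgb565(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

static uint16_t pack_argb4444(unsigned a, unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((a >> 4) << 12) | ((r >> 4) << 8) |
                      ((g >> 4) << 4) | (b >> 4));
}
```

Since each entry occupies 4 bytes of address space, a 16-bit value packed this way would be written as a full 32-bit word with the upper half unused.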
Dreamcast Programming by Marcus Comstedt
Last modified: Wed Apr 25 12:39:02 MEST 2001