PowerVR

The NEC PowerVR chip consists of a true-colour RAMDAC, and a hardware 3D engine based on a Tile Accelerator. The Tile Accelerator divides the scene into 32 by 32 pixel tiles, which can be rendered individually. Each tile is then rendered into an internal 32 by 32 pixel frame buffer in register memory before it is copied to the main frame buffer. As rendering is done to the internal frame buffer, the fill rate is very high. Also, no texel data is actually fetched from texture VRAM until the tile is copied to the frame buffer, which means that the texture fill rate is not affected by overpainting at all.


3D engine principle overview

The following diagram shows the principle by which the hardware 3D engine works:
There are two stages, which can run in parallel (provided you have dual sets of buffers, of course). During the binning stage, the Tile Accelerator is fed graphics primitives (either by DMA, or directly by the CPU using the Store Queues or direct writes), which it compiles to an internal format. While doing this, it registers in which tiles each primitive might be visible by putting it in one or more tile bins. (If a primitive is not visible in any tile, it can be clipped completely.) During the rendering stage, the ISP/TSP reads the lists created by the Tile Accelerator and, for each tile, renders the primitives visible in that tile into its internal frame buffer before writing the result out to the right place in the VRAM frame buffer, where the RAMDAC can display it.

For a double buffering strategy that allows you to run both stages simultaneously (but for different frames, i.e. binning frame N+1 while rendering frame N), you need double sets of buffers for the display list and the tile bins, as well as double frame buffers to keep rendering artifacts from being visible on the screen. The following table shows which tile bin set (TB) and frame buffer (FB) to use to avoid conflicts:
Frame #   Bin to TB #   Render from TB #   Render to FB #   Display FB #
1         1             -                  -                -
2         2             1                  1                -
3         1             2                  2                1
4         2             1                  1                2
5         1             2                  2                1
6         2             1                  1                2
etc. As you can see, there is a two frame latency, i.e. frame 1 will not be visible on screen until frame 3 is being generated.
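
The schedule can also be expressed in code. The following is only a sketch: the type frame_schedule and the function schedule_for_frame are hypothetical names made up for this illustration, and the value 0 is used to mean "nothing to do yet" during the first frames.

```c
/* Buffer set numbers are 1 or 2; 0 means "nothing yet" during start-up. */
typedef struct {
    int bin_tb;      /* tile bin set the Tile Accelerator bins into */
    int render_tb;   /* tile bin set the ISP/TSP renders from */
    int render_fb;   /* frame buffer the ISP/TSP renders to */
    int display_fb;  /* frame buffer the RAMDAC displays */
} frame_schedule;

/* Alternate between set 1 and set 2, starting with set 1 at step 0. */
static int alternate(int step)
{
    return (step % 2) + 1;
}

/* frame is 1-based, like the frame numbers in the schedule above. */
frame_schedule schedule_for_frame(int frame)
{
    frame_schedule s;
    s.bin_tb     = alternate(frame - 1);
    /* Rendering lags binning by one frame. */
    s.render_tb  = frame >= 2 ? alternate(frame - 2) : 0;
    s.render_fb  = s.render_tb;
    /* Display lags rendering by one more frame. */
    s.display_fb = frame >= 3 ? alternate(frame - 3) : 0;
    return s;
}
```

For frame 3 this gives bin set 1, render from set 2 into frame buffer 2, and display frame buffer 1, which is the two frame latency described above.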

Video memory

There are 8 megabytes of video memory, located in memory area 1 (see the memory map). This memory is organized as two banks of 32×1 Mbit each. Depending on the value of address bit 24, the banks can be accessed either sequentially as 32 bit memory or in parallel as 64 bit memory. In both cases you get 8 megabytes of contiguous address space, but the correspondence between address and memory cell is slightly different, as this figure shows:

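32 bit interface:

  0xA57FFFFC  +------------+
      .       |            |
      .       |   Bank 2   |
      .       |            |
  0xA5400000  +------------+
  0xA53FFFFC  +------------+
      .       |            |
      .       |   Bank 1   |
      .       |            |
  0xA5000000  +------------+
                 0 ... 31

64 bit interface:

  0xA47FFFF8  +------------+------------+
      .       |            |            |
      .       |   Bank 1   |   Bank 2   |
      .       |            |            |
  0xA4000000  +------------+------------+
                 0 ... 31     32 ... 63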

So, the bytes 0xA4000000-0xA4000003 correspond to 0xA5000000-0xA5000003, 0xA4000004-0xA4000007 to 0xA5400000-0xA5400003, 0xA4000008-0xA400000B to 0xA5000004-0xA5000007, and so on. Both interfaces can handle writes of 16 bits and up; 8-bit writes are not possible. Reads of any width, including 8-bit, are possible, though.
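
The correspondence between the two interfaces can be sketched as a pair of conversion functions. This is only an illustration derived from the examples above: the function names and VRAM_BANK_SIZE are hypothetical, and the offsets are byte offsets from the start of each interface (0xA4000000 and 0xA5000000 respectively).

```c
#include <stdint.h>

#define VRAM_BANK_SIZE 0x400000u  /* 4 megabytes per bank */

/* Convert a byte offset in the 64 bit interface to the byte offset of
   the same memory cell in the 32 bit interface. */
uint32_t vram_offset_64_to_32(uint32_t offset64)
{
    uint32_t word = offset64 >> 2;      /* index of the 32-bit word */
    uint32_t bank = word & 1;           /* even words: bank 1, odd words: bank 2 */
    uint32_t word_in_bank = word >> 1;
    return bank * VRAM_BANK_SIZE + (word_in_bank << 2) + (offset64 & 3);
}

/* The inverse conversion, from the 32 bit to the 64 bit interface. */
uint32_t vram_offset_32_to_64(uint32_t offset32)
{
    uint32_t bank = offset32 / VRAM_BANK_SIZE;            /* 0 or 1 */
    uint32_t word_in_bank = (offset32 % VRAM_BANK_SIZE) >> 2;
    return (((word_in_bank << 1) | bank) << 2) + (offset32 & 3);
}
```

With these, offset 4 in the 64 bit interface maps to offset 0x400000 in the 32 bit interface, matching the 0xA4000004 to 0xA5400000 correspondence given above.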

In the following register descriptions, an address specified using the 32 bit interface is referred to as a 32 bit address, and an address specified using the 64 bit interface as a 64 bit address. This should not be mistaken for the width of the actual address, since both types of address are really 23 bits wide.

RAMDAC Registers

The addresses given here are to the P2 area, as the registers should of course be accessed without cache. The register descriptions are partly based on research done by bITmASTER and maiwe.
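
As a rough sketch of what uncached access looks like in practice: on the SH-4, the P2 area is the uncached window starting at 0xA0000000 onto the 29-bit physical address space. The helper names below (p2_address, reg_write32) are hypothetical and not part of any library, and reg_write32 is only meaningful on the target hardware.

```c
#include <stdint.h>

#define P2_BASE 0xA0000000u  /* start of the uncached P2 area on the SH-4 */

/* Map a 29-bit physical address to its uncached P2 alias. */
uint32_t p2_address(uint32_t physical)
{
    return (physical & 0x1FFFFFFFu) | P2_BASE;
}

/* Write a 32-bit register through its P2 address (target hardware only).
   The volatile qualifier keeps the compiler from removing or reordering
   the access. */
static inline void reg_write32(uint32_t p2_addr, uint32_t value)
{
    *(volatile uint32_t *)(uintptr_t)p2_addr = value;
}
```

For example, p2_address(0x005F8040) gives 0xA05F8040, the address listed below for the border colour register.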

A05F8040 - Border colour RGB

Bit layout (bit 31 on the left, bit 0 on the right):
Red | Green | Blue
This register sets the solid colour displayed around the main display area.

A05F8044 - Display mode

Bit layout (bit 31 on the left, bit 0 on the right):
C | COL | SD | DE
C - Clock double enable
Setting this bit doubles the pixel clock, giving a scan rate suitable for VGA monitors.
COL - Colour mode select
Selects the frame buffer pixel colour mode (all colour modes are little endian, e.g. in RGB888 the blue byte comes first)
Value   Colour mode   Bytes per pixel
0 0     RGB555        2
0 1     RGB565        2
1 0     RGB888        3
1 1     RGB888        4
SD - Scan Double enable
Setting this bit makes each scan line be sent twice, allowing low resolutions in VGA mode.
DE - Display Enable
This bit must be set for any graphics to be displayed. If it is set to zero, only the border colour will be visible.

A05F8050 - Video memory base offset 1

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
This sets the address in the video RAM of the first pixel displayed (top left). Address 0 means the first byte of the video RAM bank 1 (usually accessed as A5000000 from the CPU). The address must be longword aligned. This register is used for noninterlaced screens and the long field of interlaced screens.

A05F8054 - Video memory base offset 2

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
Same as A05F8050, but used for the short field of interlaced screens.

A05F805C - Display size and modulo

Bit layout (bit 31 on the left, bit 0 on the right):
Modulo | Lines per field | Pixel data per line
This register determines how much pixel data to display each field, and the modulo between each line of data.
Modulo
The number of 32-bit words to skip between each line, plus 1. I.e. a value of 1 means the lines are stored immediately after each other in memory.
Lines per field
How many lines of pixels to fetch and display each field, minus 1. Since this is per field and not per frame, it should be set to half the total vertical resolution (minus 1) in interlaced mode.
Pixel data per line
The number of 32-bit words of pixel data to fetch and display each line, minus 1. If you want X pixels per line, and each pixel is Y bytes, X*Y/4-1 is the correct value to write.
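
The three field values can be computed as in the following sketch. The names display_size_fields and display_size_for are hypothetical, the fields are left unpacked rather than combined into a register word, and the interlaced case assumes that both fields are stored interleaved in a single frame buffer, so that each field skips one whole line of the other field.

```c
typedef struct {
    unsigned modulo;           /* 32-bit words to skip between lines, plus 1 */
    unsigned lines_per_field;  /* lines per field, minus 1 */
    unsigned words_per_line;   /* 32-bit words of pixel data per line, minus 1 */
} display_size_fields;

display_size_fields display_size_for(unsigned width, unsigned height,
                                     unsigned bytes_per_pixel, int interlaced)
{
    display_size_fields f;
    unsigned words = width * bytes_per_pixel / 4;  /* X*Y/4 */

    f.words_per_line = words - 1;
    /* Per field, not per frame: half the lines when interlaced. */
    f.lines_per_field = (interlaced ? height / 2 : height) - 1;
    /* Assumption: an interlaced field skips one full line of the other
       field; a noninterlaced screen skips nothing. */
    f.modulo = (interlaced ? words : 0) + 1;
    return f;
}
```

For a noninterlaced 640 by 480 screen in RGB565 this gives 319 for the pixel data per line, 479 for the lines per field and 1 for the modulo.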

A05F80CC - Raster event position

Bit layout (bit 31 on the left, bit 0 on the right):
Top | Bottom
This register defines two rasterlines on the screen, which when they are passed by the raster beam will generate a raster event (which optionally causes an interrupt). The rasterline for the "Top" raster event is typically set just above the display area, and the rasterline for the "Bottom" raster event just below the display area.

A05F80D0 - Video encapsulation

Bit layout (bit 31 on the left, bit 0 on the right):
VO | BC | I | HP | VP
VO - Video Output enable
Set to 1 to enable video output.
I - Interlace
Set to 1 to enable interlaced video.
BC - Broadcast standard
Used to select type of colour sync for composite video
Value   Broadcast standard
0 0     NTSC
0 1     PAL
1 0     PAL-M (?)
1 1     PAL-N (?)
HP - H-sync polarity
Set to 1 for positive H-sync, 0 for negative H-sync.
VP - V-sync polarity
Set to 1 for positive V-sync, 0 for negative V-sync.

A05F80D4 - Border horizontal range

Bit layout (bit 31 on the left, bit 0 on the right):
Start | Stop
This register selects the horizontal range in which the border colour is displayed. Left and right of this range, the border is displayed as black.
Start
The number of pixels from the horizontal sync where border display starts.
Stop
The number of pixels from the horizontal sync where border display ends.

A05F80D8 - Full video size

Bit layout (bit 31 on the left, bit 0 on the right):
Vertical | Horizontal
This register selects the total number of lines and "pixels" (including lace) between each retrace. The horizontal and vertical refresh rate are determined by this register, and the pixel clock. For 50Hz (PAL), set V=624 H=863. For 60Hz (NTSC/VGA), set V=524 H=857. (Halve the V value for non-interlaced PAL/NTSC screens.)
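
How the refresh rates follow from these values can be sketched as below. Two things here are assumptions and not stated in the register description: that the register values are the totals minus 1 (which would match the standard 858 by 525 and 864 by 625 timings), and the pixel clocks of 13.5 MHz and 27 MHz used in the example. The function names are hypothetical.

```c
/* Line rate in Hz, assuming h_reg holds the total per line minus 1. */
double horizontal_rate_hz(double pixel_clock_hz, unsigned h_reg)
{
    return pixel_clock_hz / (double)(h_reg + 1);
}

/* Rate in Hz at which all V lines are scanned once, assuming v_reg holds
   the total line count minus 1. With the interlaced V values this is the
   frame rate; the field rate is twice as high. */
double vertical_rate_hz(double pixel_clock_hz, unsigned h_reg, unsigned v_reg)
{
    return horizontal_rate_hz(pixel_clock_hz, h_reg) / (double)(v_reg + 1);
}
```

Under these assumptions, V=524 H=857 with a doubled 27 MHz clock gives roughly 59.94 Hz, and V=624 H=863 at 13.5 MHz gives 25 full frames, i.e. 50 fields, per second.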

A05F80DC - Border vertical range

Bit layout (bit 31 on the left, bit 0 on the right):
Start | Stop
This register selects the vertical range in which the border colour is displayed. Above and below this range, the border is displayed as black.
Start
The number of scanlines from the vertical sync where border display starts.
Stop
The number of scanlines from the vertical sync where border display ends.

A05F80E8 - Additional video settings

Bit layout (bit 31 on the left, bit 0 on the right):
N | LR
Misc additional video settings.
N
Unknown. Set to 22.
LR
Low-res; setting this bit makes each pixel be output twice, effectively giving a 320 pixel horizontal resolution.

A05F80EC - Display horizontal position

Bit layout (bit 31 on the left, bit 0 on the right):
Horizontal pos
This register sets the distance from the horizontal sync to where pixel display starts.

A05F80F0 - Display vertical position

Bit layout (bit 31 on the left, bit 0 on the right):
Vertical pos 2 | Vertical pos 1
This register sets the distance (in scanlines) from the vertical sync to where pixel display starts.
Vertical pos 1
This value is used for noninterlaced screens and the long fields of interlaced screens
Vertical pos 2
This value is used for the short fields of interlaced screens

Tile Accelerator Registers

The addresses given here are to the P2 area, as the registers should of course be accessed without cache.

A05F8124 - Tile Bin base output address

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
This sets the address of the Tile Bin array to which the Tile Accelerator should perform its binning. 64 bytes of memory per tile will be used at the video memory address pointed to by this register.

A05F8128 - Display list base output address

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
This sets the address of the compiled Display list buffer to which the Tile Accelerator should output the processed primitives. The amount of memory needed depends on how large the scene is.

A05F813C - Tile Bin array size

Bit layout (bit 31 on the left, bit 0 on the right):
Vertical | Horizontal
The size of the Tile Bin array in rows and columns.
Vertical
How many tiles high the Tile Bin array is, minus 1. Each tile is 32 pixels high.
Horizontal
How many tiles wide the Tile Bin array is, minus 1. Each tile is 32 pixels wide.
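
As an illustration, the two field values and the memory needed for the Tile Bin array (64 bytes per tile, as described for A05F8124) can be computed like this. The names are hypothetical, and rounding a partial tile up to a whole one is an assumption for resolutions that are not a multiple of 32.

```c
#define TILE_SIZE                32u  /* tiles are 32 by 32 pixels */
#define TILE_BIN_BYTES_PER_TILE  64u  /* from the A05F8124 description */

typedef struct {
    unsigned vertical;    /* tiles high, minus 1 */
    unsigned horizontal;  /* tiles wide, minus 1 */
    unsigned bin_bytes;   /* memory used by the Tile Bin array */
} tile_bin_layout;

tile_bin_layout tile_bin_layout_for(unsigned width, unsigned height)
{
    tile_bin_layout l;
    /* Assumption: a partially covered tile still needs a whole tile. */
    unsigned cols = (width + TILE_SIZE - 1) / TILE_SIZE;
    unsigned rows = (height + TILE_SIZE - 1) / TILE_SIZE;

    l.horizontal = cols - 1;
    l.vertical = rows - 1;
    l.bin_bytes = cols * rows * TILE_BIN_BYTES_PER_TILE;
    return l;
}
```

A 640 by 480 screen is 20 by 15 tiles, so the fields would be 19 and 14, and the Tile Bin array would occupy 19200 bytes.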

ISP/TSP Registers

The addresses given here are to the P2 area, as the registers should of course be accessed without cache.

A05F8020 - Display list base input address

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
The address of the compiled Display list created by the Tile Accelerator which contains the primitives for the scene.

A05F802C - Tile Bin header input address

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
The address of a structure describing the location and clipping(?) of each tile on the screen, as well as pointers to the respective Tile Bin buffers. This structure has to be created before any rendering can be done, but can be reused in subsequent renders using the same set of Tile Bin buffers.

A05F804C - Render output modulo

Bit layout (bit 31 on the left, bit 0 on the right):
Modulo
The modulo of the frame buffer to which rendering is to take place, in bytes / 8.
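
A minimal sketch of the calculation, assuming the lines of the frame buffer are stored directly after each other so that one line occupies width times bytes per pixel (the function name is hypothetical):

```c
/* Render output modulo: the size of one frame buffer line in bytes,
   divided by 8. Assumes lines are tightly packed in memory. */
unsigned render_output_modulo(unsigned width, unsigned bytes_per_pixel)
{
    return width * bytes_per_pixel / 8;
}
```

For a 640 pixel wide RGB565 frame buffer this gives 640*2/8 = 160.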

A05F8048 - Render output pixel format

Bit layout (bit 31 on the left, bit 0 on the right):
TH | D | COL
The pixel format of the frame buffer to which rendering is to take place.
TH - Alpha threshold
Set this to control the alpha threshold level when output colour mode is ARGB1555.
D - Dither enable
Setting this bit enables dithering in highcolour modes.
COL - Colour mode select
Selects the frame buffer pixel colour mode (all colour modes are little endian, e.g. in RGB888 the blue byte comes first)
Value   Colour mode   Bytes per pixel
0 0 0   RGB555        2
0 0 1   RGB565        2
0 1 0   ARGB4444      2
0 1 1   ARGB1555      2
1 0 1   RGB888        4
1 1 0   ARGB8888      4

A05F8060 - Render output address

Bit layout (bit 31 on the left, bit 0 on the right):
32-bit Address
The address of the frame buffer to which rendering is to take place. The coordinates for the individual tiles will be added as an offset to this base address.

A05F8108 - Texture palette colour mode

Bit layout (bit 31 on the left, bit 0 on the right):
COL
The format of the entries in the palettes used by CLUT mode textures.
COL - Colour mode select
Value   Colour mode   Bytes per entry
0 0     ARGB1555      2
0 1     RGB565        2
1 0     ARGB4444      2
1 1     ARGB8888      4

Note that each palette entry always occupies 4 bytes of address space, even if only two bytes are used.
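
In code, this means the byte offset of an entry is always the index times 4, whatever the colour mode. The sketch below also packs an RGB565 value using the conventional layout (red in the top 5 bits, blue in the bottom 5), which is an assumption based on the mode name; both function names are hypothetical.

```c
#include <stdint.h>

/* Byte offset of a palette entry from the start of the palette.
   Each entry occupies 4 bytes of address space in every colour mode. */
uint32_t palette_entry_offset(unsigned index)
{
    return (uint32_t)index * 4u;
}

/* Pack 8-bit colour components into an RGB565 value, assuming the
   conventional layout: red in bits 15-11, green in 10-5, blue in 4-0.
   In a 2-byte mode only the low 16 bits of the entry carry the colour. */
uint32_t pack_rgb565(unsigned r, unsigned g, unsigned b)
{
    return (uint32_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```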


Dreamcast Programming by Marcus Comstedt
Last modified: Wed Apr 25 12:39:02 MEST 2001