DMA and HDMA

From Snesdev wiki

DMA AND HDMA

DMA, or "direct memory access", is found in a number of computer systems, not just the Super Nintendo. It's basically a way for a peripheral or coprocessor to read data directly from memory, instead of requiring the main CPU to do a number of reads and writes. This is typically faster, if only because it lets the system skip the opcode fetch-and-decode. In the SNES, the CPU is paused during DMA since the address busses are in use for the transfer.

HDMA is similar in concept, though rather different in execution: instead of transferring a block of memory all at once, it transfers a few bytes during the H-Blank period of each scanline. This is extremely helpful, as H-Blank is the only window within a frame during which most PPU registers can be changed, at least without glitching.

The SNES has 8 channels (numbered 0-7) that can be used for either DMA or HDMA. HDMA takes priority over DMA if both are to occur at once, pausing all DMA and terminating a conflicting DMA immediately. Lower-numbered channels take priority over higher-numbered channels.


DMA

A DMA transfer has three main variables and a number of setting bits, all of which must be set up before starting DMA:

  • Direction (bit 7 of $43x0): Read from PPU or write to PPU?
  • Fixed (bit 3 of $43x0): Adjust AAddress?
  • Increment (bit 4 of $43x0): Direction to adjust AAddress?
  • Mode (bits 0-2 of $43x0): See below...
  • Port (register $43x1): If this is 'xx', the register accessed will be $21xx.
  • AAddress (registers $43x2-4): Any CPU address, just like you'd use with the Absolute Long addressing mode.
  • Count (registers $43x5-6): The number of bytes to transfer.

See register $43x0 for the correspondence between the Mode bits and the transfer mode. Note that for DMA, One Register Write Once and One Register Write Twice end up being the exact same thing, and Two Registers Write Once and Two Registers Write Twice Alternate are the same, but that Two Registers Write Once and Two Registers Write Twice Each are different.
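The equivalences above can be checked by listing the Bus B registers each mode cycles through. The following is a minimal Python sketch rather than a reference implementation: `MODE_OFFSETS` and `dma_ports` are hypothetical names chosen for illustration, the offsets are copied from the Transfer Mode table of $43x0, and `count` is taken to be the actual number of bytes moved.

```python
# Bus B offsets (relative to Port) visited by one pass through each mode,
# copied from the Transfer Mode table of $43x0.
MODE_OFFSETS = {
    0: (0,),
    1: (0, 1),
    2: (0, 0),
    3: (0, 0, 1, 1),
    4: (0, 1, 2, 3),
    5: (0, 1, 0, 1),
    6: (0, 0),
    7: (0, 0, 1, 1),
}

def dma_ports(mode, port, count):
    """Return the $21xx registers hit by a DMA that moves `count` bytes."""
    pattern = MODE_OFFSETS[mode & 7]
    # Port is 8 bits wide, so Port+n wraps from $21ff back to $2100.
    return [0x2100 | ((port + pattern[i % len(pattern)]) & 0xFF)
            for i in range(count)]

# Over a whole DMA, modes 0 and 2 hit the same registers, as do modes 1
# and 5; modes 1 and 3 do not.
same_0_2 = dma_ports(0, 0x18, 8) == dma_ports(2, 0x18, 8)
same_1_5 = dma_ports(1, 0x18, 8) == dma_ports(5, 0x18, 8)
same_1_3 = dma_ports(1, 0x18, 8) == dma_ports(3, 0x18, 8)
print(same_0_2, same_1_5, same_1_3)  # True True False
```

The modes only differ in how many bytes make up one pass, which matters for HDMA but not for a DMA that simply runs until Count is exhausted.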

DMA transfers take 8 master cycles per byte transferred, no matter the FastROM setting. There is also an overhead of 8 master cycles per channel, and an overhead of 12-24 cycles for the whole transfer.
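As a rough illustration of these figures, the cost of a transfer can be estimated as below. This is a sketch under the numbers quoted above only; the function name `dma_master_cycles` is hypothetical, and the 12-24 cycle whole-transfer overhead is returned as a range because the text does not say what determines it.

```python
def dma_master_cycles(byte_counts):
    """Estimate (min, max) master cycles for one write to $420B.

    `byte_counts` holds the number of bytes moved by each enabled channel.
    """
    per_byte = 8 * sum(byte_counts)      # 8 master cycles per byte
    per_channel = 8 * len(byte_counts)   # 8 master cycles per channel
    base = per_byte + per_channel
    return base + 12, base + 24          # whole-transfer overhead: 12-24

# One channel copying 2048 bytes:
print(dma_master_cycles([2048]))  # (16404, 16416)
```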

The basic process seems to be:

1. Get byte and write it to the destination.
   - The DMA seems to take advantage of the SNES's two address busses with one
     shared data bus. AAddress is pushed out Bus A, Port is pushed out Bus B,
     and the read/write signals are sent according to Direction. The bus
     marked read obligingly puts data on the data bus, while the bus marked
     write obligingly writes that value.
   - Thus, since the PPU/APU/WRAM registers are only accessible via Bus B,
     attempts to access them via AAddress will result in Open Bus accesses.
   - Attempts to access WRAM via both Bus A and Bus B (registers $2180-3) will
     fail, with the $2180-3 access being Open Bussed.
   - Also, DMA cannot access the $4300-$437f registers nor $420b nor $420c.
     Writes will have no effect, and reads will return Open Bus.
2. Adjust AAddress.
   - If Fixed is set, do nothing. Else if Increment is set, subtract one,
     else add one.
   - Note that the bank byte is not modified.
3. Decrement Count. If count is not zero, then go to step 1.
   - Thus, if Count is initially zero, it wraps to 65535 before being
     tested. So you end up transferring 65536 bytes.

Note that Count ($43x5-6) ends up always 0, unless a conflicting HDMA terminates the transfer early.
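The three steps above can be traced with a small model. This is a minimal Python sketch, not an emulator core: `run_dma` is a hypothetical helper that only tracks which Bus A addresses are accessed, and it ignores Direction, Port and any HDMA interruption.

```python
def run_dma(aaddress, count, fixed=False, decrement=False):
    """Follow steps 1-3 for one channel.

    `aaddress` is the 24-bit value of $43x2-4 and `count` the 16-bit value
    of $43x5-6. Returns (Bus A addresses accessed, final AAddress).
    """
    bank = aaddress & 0xFF0000
    offset = aaddress & 0xFFFF
    touched = []
    while True:
        touched.append(bank | offset)           # step 1: transfer one byte
        if not fixed:                           # step 2: adjust AAddress
            step = -1 if decrement else 1
            offset = (offset + step) & 0xFFFF   # the bank byte never changes
        count = (count - 1) & 0xFFFF            # step 3: decrement, then test
        if count == 0:
            return touched, bank | offset

touched, final = run_dma(0x7EFFFE, 4)
print([hex(a) for a in touched])
# ['0x7efffe', '0x7effff', '0x7e0000', '0x7e0001']
print(len(run_dma(0x008000, 0)[0]))  # 65536
```

The first call shows the address wrapping within bank $7E instead of carrying into the bank byte; the second shows a Count of zero producing a 65536-byte transfer.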


HDMA

HDMA has 4 flags and 5 variables. Those marked '*' must be set up before starting HDMA. In addition, those marked '+' must be set up if HDMA is to be started mid-frame.

  • Addressing Mode (bit 6 of $43x0): If clear, Direct, else Indirect. (*)
  • Transfer Mode (bits 0-2 of $43x0): See below... (*)
  • Port ($43x1): As for DMA. (*)
  • AAddress ($43x2-4): Pointer to the HDMA Table. Not really 'required' for starting mid-frame, but unless you're going to stop it before the next init... (*)
  • Indirect Address ($43x5-6): Used with Indirect Bank. See below...
  • Indirect Bank ($43x7): Used with Indirect Address. See below... (*)
  • Address ($43x8-9): See below... (+)
  • Repeat (bit 7 of $43xA): Whether to write every scanline or not (+)
  • Line Counter (bits 0-6 of $43xA): See below... (+)
  • DoTransfer: Used internally.

Modes are the same as for DMA. However, note that only one cycle through the mode is done per scanline, so One Register Write Once will write 1 byte per scanline, while One Register Write Twice will write two.

For each scanline during which HDMA is active (i.e. at least one channel is not paused and has not terminated yet for the frame), there are ~18 master cycles overhead. Each active channel incurs another 8 master cycles overhead (during which time $43xA is presumably loaded if necessary) for every scanline, whether or not a transfer actually occurs. If a new indirect address is required, 16 master cycles are taken to load it. Then 8 cycles per byte transferred are used. Thus, HDMA takes a maximum of 466 master cycles per scanline (if all 8 channels are active, require an indirect address load, and transfer 4 bytes).
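The 466-cycle maximum follows directly from the figures above. A minimal sketch of the arithmetic, assuming the approximate 18-cycle overhead is exact (the function name `hdma_line_cycles` is hypothetical):

```python
def hdma_line_cycles(channels):
    """Master cycles spent on HDMA during one scanline.

    `channels` is a list of (bytes_transferred, loads_indirect_address)
    pairs, one per active channel.
    """
    if not channels:
        return 0
    total = 18                       # approximate fixed overhead
    for nbytes, loads_indirect in channels:
        total += 8                   # per-channel overhead
        if loads_indirect:
            total += 16              # fetch a new indirect address
        total += 8 * nbytes          # 8 cycles per byte transferred
    return total

# Worst case: 8 channels, each loading an indirect address and moving 4 bytes.
print(hdma_line_cycles([(4, True)] * 8))  # 466
```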

The basic process has two sections. First, at the beginning of the frame (V=0 H=approx 6), for all active HDMA channels (see register $420c):

1. Copy AAddress into Address.
2. Load $43xA (Line Counter and Repeat) from the table. I believe $00 will
   terminate this channel immediately.
3. Load Indirect Address, if necessary.
4. Set DoTransfer to true.

Edit: (Needs to be validated) It looks like DoTransfer must be set to false for inactive HDMA transfers, otherwise issues will occur when enabling an HDMA channel mid-screen.

The CPU is paused during this time. Overhead is ~18 master cycles, plus 8 master cycles for each channel set for direct HDMA and 24 master cycles for each channel set for indirect HDMA.

If you are starting HDMA mid-frame, you must basically do the init process manually by setting $43x8-A, and $43x5-6 for indirect channels. Note though that there is no way to perform step 4, so no transfer will be done the first transfer period. Also, note that a channel that has already terminated for the frame cannot be restarted. XXX: Or does it automatically do Step 4 when you enable the channel?

Then, for each scanline from V=0 to V=$e0 (or V=$ef if overscan is enabled) at about H=$116:

1. If DoTransfer is false, skip to step 3.
2. For the number of bytes (1, 2, or 4) required for this Transfer Mode...
   a. Read a byte from Address or Indirect Address, and increment.
   b. Write the byte to Port, Port+1, Port+2, or Port+3, depending on the
      Transfer Mode and which byte we're on.
   - The same notes regarding DMA from PPU to PPU or RAM to RAM via $2180
     apply here as well.
3. Decrement $43xA.
4. Set DoTransfer to the value of Repeat.
5. If Line Counter is zero...
   a. Read the next byte from Address into $43xA (thus, into both Line
      Counter and Repeat).
   b. If Addressing Mode is Indirect, read two bytes from Address into
      Indirect Address (and increment Address by two bytes).
      - One oddity: if $43xA is 0 and this is the last active HDMA channel for
        this scanline, only one byte is loaded for Indirect Address, and $00
        is used for the low byte. So Address ends up incremented one less
        than otherwise expected, and one less CPU Cycle is used.
   c. If $43xA is zero, terminate this HDMA channel for this frame. The bit in
      $420c is not cleared, though, so it may be automatically restarted next
      frame.
   d. Set DoTransfer to true.
6. Continue with Step 1 next scanline.
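The start-of-frame and per-scanline steps above can be traced with a small model. The following Python sketch covers direct mode only and takes the steps at face value, including the guess that a $00 header terminates the channel at init. `run_hdma_direct` is a hypothetical helper, the table is indexed from 0 rather than from a real AAddress, and it is assumed to end with a $00 header.

```python
def run_hdma_direct(table, bytes_per_line, scanlines):
    """Trace one direct-mode HDMA channel.

    Returns one entry per scanline: the list of bytes written on that
    line, or None if nothing was transferred.
    """
    address = 0                      # init step 1: Address = AAddress
    nltr = table[address]            # init step 2: load $43xA from the table
    address += 1
    terminated = (nltr == 0)
    do_transfer = True               # init step 4
    trace = []
    for _ in range(scanlines):
        if terminated:
            trace.append(None)
            continue
        if do_transfer:              # steps 1-2: transfer this line's bytes
            trace.append(list(table[address:address + bytes_per_line]))
            address += bytes_per_line
        else:
            trace.append(None)
        nltr = (nltr - 1) & 0xFF     # step 3: decrement all 8 bits of $43xA
        do_transfer = bool(nltr & 0x80)   # step 4: copy Repeat
        if nltr & 0x7F == 0:         # step 5: Line Counter reached zero
            nltr = table[address]    # 5a: next header
            address += 1
            terminated = (nltr == 0) # 5c: zero header ends the channel
            do_transfer = True       # 5d
    return trace

table = bytes([0x03, 0xAA,         # 3 lines, no repeat: write $AA once
               0x82, 0x11, 0x22,   # 2 lines, repeat: write $11, then $22
               0x00])              # end of table
print(run_hdma_direct(table, 1, 6))
# [[170], None, None, [17], [34], None]
```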

HDMA does not occur during V-Blank, as any writes it might perform are likely to have no visible effect anyway. The start-of-frame processing then resets all active channels at the end of V-Blank. This allows updating of the HDMA registers during V-Blank without worrying about the transfer beginning immediately and scribbling on the PPU state.

Note how the above implicitly defines the format of the HDMA table. Explicitly, the format is a series of entries. Each entry begins with a line count and repeat flag. If repeat is false, there is one scanline worth of data following and the count is the number of scanlines to wait before processing the next entry. If it's true, the line count is the number of scanlines worth of data following. The data following is either a pointer to the data (for Indirect HDMA), or the data itself (for Direct HDMA).
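To make the table format concrete, here is a minimal Python sketch that encodes a list of entries as a direct HDMA table. The helper name `build_direct_table` and the entry tuple layout are assumptions made for illustration, and the sketch deliberately restricts line counts to 1..127 so that it avoids the edge cases of the Line Counter byte.

```python
def build_direct_table(entries):
    """Encode (lines, repeat, data) entries as a direct HDMA table.

    With repeat=False, `data` is one scanline's worth of bytes.
    With repeat=True, `data` is a list of `lines` such byte rows.
    """
    out = bytearray()
    for lines, repeat, data in entries:
        if not 1 <= lines <= 127:
            raise ValueError("this sketch only encodes counts 1..127")
        if repeat:
            if len(data) != lines:
                raise ValueError("repeat entries need one data row per line")
            out.append(0x80 | lines)     # repeat flag + line count
            for row in data:
                out += bytes(row)
        else:
            out.append(lines)            # line count only
            out += bytes(data)
    out.append(0x00)                     # a zero header ends the table
    return bytes(out)

table = build_direct_table([
    (3, False, [0xAA]),              # write $AA, then wait 3 lines in total
    (2, True, [[0x11], [0x22]]),     # write $11 and $22 on successive lines
])
print(table.hex())  # 03aa82112200
```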

Looking at the above, it's clear why Address and Repeat/Line Counter must be initialized by hand when starting HDMA mid-frame: they're only automatically initialized at the start of the frame. Note how AAddress is not affected by HDMA, though Address and Repeat/Line Counter are.

Registers

MDMAEN - DMA Enable ($420B)

$420B
Byte
☐ Read
☑ Write
Access during:

☑ Forced blank

☑ Vertical blank

☑ Horizontal blank

☑ Rendering

7  bit  0
---- ----
7654 3210 -- Enable the selected DMA channels.
             The CPU will be paused until all DMAs complete. DMAs will be executed in order from 0 to 7 (?).

See registers $43x0-$43xA for more details. If HDMA (init or transfer) occurs while a DMA is in progress, the DMA will be paused for the duration. If the HDMA happens to involve the current DMA channel, the DMA will be immediately terminated and the HDMA will progress using the then-current values of the registers. Other DMA channels will be unaffected. This register is initialized to $00 on power on or reset. See the section “DMA AND HDMA” above for more information.

HDMAEN - HDMA Enable ($420C)

$420C
Byte
☐ Read
☑ Write
Access during:

☑ Forced blank

☑ Vertical blank

☑ Horizontal blank

☑ Rendering

7  bit  0
---- ----
7654 3210 -- Enable the selected HDMA channels.
             HDMAs will be executed in order from 0 to 7 (?).

See registers $43x0-$43xA for more details. If HDMA (init or transfer) occurs while a DMA is in progress, the DMA will be paused for the duration. If the HDMA happens to involve the current DMA channel, the DMA will be immediately terminated and the HDMA will progress using the then-current values of the registers. Other DMA channels will be unaffected. Note that enabling a channel mid-frame will begin HDMA at the next HDMA point. However, the HDMA register initialization only occurs before the HDMA point on scanline 0, so those registers will have to be initialized by hand before enabling HDMA. A channel that has already terminated for the frame cannot be restarted in this manner. Writing 0 to a bit will pause an ongoing HDMA; the transfer may be continued by writing 1 to the bit. This register is initialized to $00 on power on or reset. See the section “DMA AND HDMA” above for more information.

DMAPx - DMA Control for Channel x (x=0-7) ($43X0)

$43X0
Byte
☑ Read
☑ Write
Access during:

☑ Forced blank

☑ Vertical blank

☑ Horizontal blank

☑ Rendering

7  bit  0
---- ----
da-i fttt
|| | ||||
|| | |+++-- Transfer Mode
|| | |      000 = 1 register write once             (1 byte:  p               )
|| | |      001 = 2 registers write once            (2 bytes: p, p+1          )
|| | |      010 = 1 register write twice            (2 bytes: p, p            )
|| | |      011 = 2 registers write twice each      (4 bytes: p, p,   p+1, p+1)
|| | |      100 = 4 registers write once            (4 bytes: p, p+1, p+2, p+3)
|| | |      101 = 2 registers write twice alternate (4 bytes: p, p+1, p,   p+1)
|| | |      110 = 1 register write twice            (2 bytes: p, p            )
|| | |      111 = 2 registers write twice each      (4 bytes: p, p,   p+1, p+1)
|| | |
|| | +----- DMA Fixed Transfer
|| |        When set, the DMA address will not be adjusted. When clear, the address
|| |        will be adjusted as specified by bit 4. This bit does not affect HDMA.
|| |
|| +------- DMA Address Increment
||          When clear, the DMA address will be incremented for each byte. When set,
||          the DMA address will be decremented. This bit does not affect HDMA.
||
|+--------- HDMA Addressing Mode
|           When clear, the HDMA table contains the data to transfer. When set, the
|           HDMA table contains pointers to the data. This bit does not affect DMA.
|
+---------- Transfer Direction
            When clear, data will be read from the CPU memory and written to the PPU
            register. When set, vice versa. Contrary to previous belief, this bit DOES
            affect HDMA! Indirect mode is the more useful case: it will read the table
            as normal and write from Bus B to the Bus A address specified. Direct mode
            also works as expected: it will read counts from the table and try to
            write the data values into the table.

The effect of writing this register during HDMA to the associated channel is unknown. Most likely, the change takes effect for the next HDMA transfer. This register is set to $ff on power on, and is unchanged on reset. See the section “DMA AND HDMA” above for more information.
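The bit layout of $43x0 can be unpacked as in the following Python sketch. The function name `decode_dmap` and the dictionary keys are hypothetical, chosen only to mirror the field descriptions above.

```python
def decode_dmap(value):
    """Split a $43x0 byte into its documented fields."""
    return {
        "b_to_a":    bool(value & 0x80),  # bit 7: set = read from PPU side
        "indirect":  bool(value & 0x40),  # bit 6: HDMA addressing mode
        "decrement": bool(value & 0x10),  # bit 4: clear = increment
        "fixed":     bool(value & 0x08),  # bit 3: set = address not adjusted
        "mode":      value & 0x07,        # bits 0-2: transfer mode
    }

# $01: CPU memory to PPU, incrementing address, two registers write once.
print(decode_dmap(0x01))
# {'b_to_a': False, 'indirect': False, 'decrement': False, 'fixed': False, 'mode': 1}
```

Decoding the power-on value $ff this way gives every flag set and mode 7; bit 5 is unused and ignored here.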

BBADx - DMA Destination Register for Channel x (x=0-7)

43x1 rwb++++
        pppppppp

This specifies the Bus B address to access. Considering the standard CPU memory space, this specifies which address $00:2100-$00:21ff to access, with two- and four-register modes wrapping $21ff->$2100, not $2200. The effect of writing this register during HDMA to the associated channel is unknown. Most likely, the change takes effect for the next transfer. This register is set to $ff on power on, and is unchanged on reset. See the section “DMA AND HDMA” above for more information.

A1TxL - DMA Source Address for Channel x low byte (x=0-7)

A1TxH - DMA Source Address for Channel x high byte (x=0-7)

A1Bx - DMA Source Address for Channel x bank byte (x=0-7)

43x2 rwl++++
43x3 rwh++++
43x4 rwb++++
        bbbbbbbb hhhhhhhh llllllll

This specifies the starting Bus A address for the DMA transfer, or the beginning of the HDMA table for HDMA transfers. Note that Bus A does not access the Bus B registers, so pointing this address at, say, $00:2100 results in open bus. The effect of writing this register during HDMA to the associated channel is unknown. However, current theory is that only $43x4 will affect the transfer. The changes will take effect at the next HDMA init. During DMA, $43x2/3 will be incremented or decremented as specified by $43x0. However, $43x4 will NOT be adjusted. These registers will not be affected by HDMA. These registers are set to $ff on power on, and are unchanged on reset. See the section “DMA AND HDMA” above for more information.

DASxL - DMA Size/HDMA Indirect Address low byte (x=0-7)

DASxH - DMA Size/HDMA Indirect Address high byte (x=0-7)

DASBx - HDMA Indirect Address bank byte (x=0-7)

43x5 rwl++++
43x6 rwh++++
43x7 rwb++++
        bbbbbbbb hhhhhhhh llllllll

For DMA, $43x5/6 indicate the number of bytes to transfer. Note that this is a strict limit: if this is set to 1 then only 1 byte will be written, even if the transfer mode specifies 2 or 4 registers (and if this is 5, all 4 registers would be written once, then only the first would be written a second time). Note, however, that writing $0000 to this register actually results in a transfer of $10000 bytes, not 0. $43x5/6 are decremented during DMA, and thus typically end up set to 0 when DMA is complete.

For HDMA, $43x7 specifies the bank for indirect addressing mode. The indirect address is copied into $43x5/6 and incremented appropriately. For direct HDMA, these registers are not used or altered. Writes to $43x7 during indirect HDMA will take effect for the next transfer. Writes to $43x5/6 during indirect HDMA will also take effect for the next HDMA transfer; however, this is only noticeable during repeat mode (for normal mode, a new indirect address will be read from the table before the transfer). For a direct transfer, presumably nothing will happen.

These registers are set to $ff on power on, and are unchanged on reset. See the section “DMA AND HDMA” above for more information.

A2AxL - HDMA Table Address low byte (x=0-7)

A2AxH - HDMA Table Address high byte (x=0-7)

43x8 rwl++++
43x9 rwh++++
        aaaaaaaa aaaaaaaa

At the beginning of the frame $43x2/3 are copied into this register for all active HDMA channels, and then this register is updated as the table is read. Thus, if a game wishes to start HDMA mid-frame (or change tables mid-frame), this register must be written. Writing this register mid-frame changes the table address for the next scanline. This register is not used for DMA. This register is set to $ff on power on, and is unchanged on reset. See the section “DMA AND HDMA” above for more information.

NLTRx - HDMA Line Counter (x=0-7)

43xa rwb++++
        rccccccc
        r        = Repeat Select.^
         ccccccc = Line count.^^

^When set, the HDMA transfer will be performed every line, rather than only when this register is loaded from the table. However, this byte (and the indirect HDMA address) will only be reloaded from the table when the counter reaches 0.

^^This is decremented every scanline. When it reaches 0, a byte is read from the HDMA table into this register (and the indirect HDMA address is read into $43x5/6 if applicable).

One oddity: the register is decremented before being checked for r status or c==0. Thus, setting a value of $80 is really “128 lines with no repeat” rather than “0 lines with repeat”. Similarly, a value of $00 will be “128 lines with repeat” when it doesn’t mean “terminate the channel”. This register is initialized at the end of V-Blank for every active HDMA channel. Note that if a game wishes to begin HDMA during the frame, it will most likely have to initialize this register. Writing this mid-transfer will similarly change the count and repeat to take effect next scanline. Remember though that ‘repeat’ won’t take effect until after the next transfer period. This register is set to $ff on power on, and is unchanged on reset. See the section “DMA AND HDMA” above for more information.
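The decrement-before-test behaviour is easy to check by stepping a value through the per-scanline rules. A minimal Python sketch, assuming DoTransfer starts out true as it does right after a table load; the function name `nltr_span` is hypothetical.

```python
def nltr_span(value):
    """Return (scanlines, transfers) covered by one $43xA value."""
    do_transfer = True               # true after a load from the table
    lines = transfers = 0
    while True:
        lines += 1
        transfers += do_transfer     # a transfer happens only if flagged
        value = (value - 1) & 0xFF   # decrement the whole byte first...
        do_transfer = bool(value & 0x80)   # ...then sample Repeat
        if value & 0x7F == 0:        # ...and test the Line Counter
            return lines, transfers

print(nltr_span(0x03))  # (3, 1)
print(nltr_span(0x83))  # (3, 3)
print(nltr_span(0x80))  # (128, 1)
```

For $80 the decrement borrows out of the repeat bit, leaving $7f, which is why only the first of the 128 lines transfers anything.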