NESDev and Strangulation Records messageboards

Subject: Re: Timing Question
Posted by: teaguecl
Posted on: 10/27/04 7:34 PM
From IP: 201.134.153.34



Yeah, you're right, my first message wasn't very clear. Here is how it works. The PPU does a little bit of work on each PPU clock cycle, and the CPU does a little bit of work on each CPU clock cycle. The PPU clock is 3x faster than the CPU clock (NTSC), so the PPU can do 3 "things" for every 1 that the CPU does. On top of this, a single CPU instruction takes anywhere from 2 to 7 CPU clocks to complete (a few unofficial opcodes take 8). The number of "things" the CPU has to do for a particular instruction directly affects how many clock cycles it takes.
For example, take opcode $E6 (0xE6), which is the INC instruction with Zero Page addressing and takes 5 CPU clock cycles to complete:
1 clock cycle for fetching the opcode from memory (reading $E6 from address at PC)
1 clock cycle for fetching the operand (reading zero page address from PC+1)
1 clock cycle for reading from the zero page address
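1 clock cycle for the ALU (arithmetic logic unit) to increment the value (on real hardware the CPU also writes the original, unmodified value back to the zero page address during this cycle)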
1 clock cycle to write the newly incremented value back to the zero page address

So you can see, opcode $E6 takes 5 clocks to complete because there are 5 "things" the CPU needs to do in order to complete it and each one takes one clock cycle to do.
In hardware, the PPU and CPU operate in parallel. This means that while the CPU is spending 5 CPU clock cycles executing an INC instruction, the PPU is doing 5*3=15 "things".
So here is a timeline of when the "things" get done in real hardware, using the INC as an example. I'm assuming that what the PPU is doing at this point in time is rendering pixels; see Brad Taylor's doc for more info.

PPU CC - CPU CC
1 (pixel #1) - 1 (Read from PC)
2 (pixel #2)
3 (pixel #3)
4 (pixel #4) - 2 (Read from PC+1)
5 (pixel #5)
6 (pixel #6)
7 (pixel #7) - 3 (Read from zero page addr)
8 (pixel #8)
9 (pixel #9)
10 (pixel #10) - 4 (ALU increments value)
11 (pixel #11)
12 (pixel #12)
13 (pixel #13) - 5 (value written back to zero page addr)
14 (pixel #14)
15 (pixel #15)
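
The alignment in the table above can be computed rather than written out by hand. Here is a minimal sketch in C, assuming the NTSC 3:1 ratio stated earlier and 1-based clock numbers as in the table. The function names are made up for illustration and do not come from any existing emulator.

```c
#include <stdio.h>

/* Assumption: NTSC ratio of 3 PPU clocks per CPU clock, as stated in the post. */
enum { PPU_CLOCKS_PER_CPU_CLOCK = 3 };

/* Number of PPU clocks that elapse during the given number of CPU clocks. */
unsigned ppu_clocks_for(unsigned cpu_clocks)
{
    return cpu_clocks * PPU_CLOCKS_PER_CPU_CLOCK;
}

/* For a 1-based PPU clock number, return the 1-based CPU clock that
   begins on it, or 0 if no CPU clock begins on that PPU clock. */
unsigned cpu_clock_starting_at(unsigned ppu_clock)
{
    if ((ppu_clock - 1) % PPU_CLOCKS_PER_CPU_CLOCK != 0)
        return 0;
    return (ppu_clock - 1) / PPU_CLOCKS_PER_CPU_CLOCK + 1;
}

/* Print a two-column timeline in the same layout as the table above,
   without the per-clock descriptions of what the CPU is doing. */
void print_timeline(unsigned cpu_clocks)
{
    unsigned total = ppu_clocks_for(cpu_clocks);
    for (unsigned p = 1; p <= total; p++) {
        unsigned c = cpu_clock_starting_at(p);
        if (c != 0)
            printf("%u (pixel #%u) - %u\n", p, p, c);
        else
            printf("%u (pixel #%u)\n", p, p);
    }
}
```

Calling print_timeline(5) walks through the 15 PPU clocks of the INC example and marks the five PPU clocks on which a CPU clock begins.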

A CPU instruction-level accurate emulator will handle things in bigger chunks, and might look like this for the INC:
opcode = internal_ram[PC]; /* opcode = $E6 */
/* jump to $E6 handler */
addr = internal_ram[PC+1]; /* read the operand before advancing PC */
PC += 2;
internal_ram[addr] = internal_ram[addr] + 1;
cpu_clocks += 5;
ppu_update();

It will execute the entire INC instruction, and then update the PPU. A timeline of this might look like the following:
PPU CC - CPU CC
1 (no work) - 1 (no work)
2 (no work)
3 (no work)
4 (no work) - 2 (no work)
5 (no work)
6 (no work)
7 (no work) - 3 (no work)
8 (no work)
9 (no work)
10 (no work) - 4 (no work)
11 (no work)
12 (no work)
13 (no work) - 5 (Read PC, Read PC+1, Read zero page addr, ALU inc, write back)
14 (no work)
15 (render pixels 1 through 15)
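
For anyone who wants something compilable, the instruction-level approach above could be fleshed out roughly as follows. This is only a sketch under some assumptions: the Nes struct, its field names, and step_instruction are hypothetical, only opcode $E6 is handled, the program is assumed to sit in the 2 KB of internal RAM (as in the pseudocode), and the N/Z flag updates are left out.

```c
#include <stdint.h>

/* Hypothetical emulator state; all names are illustrative. */
typedef struct {
    uint8_t       ram[0x800];  /* 2 KB of internal RAM */
    uint16_t      pc;          /* program counter */
    unsigned long cpu_clocks;  /* CPU clocks executed so far */
    unsigned long ppu_clocks;  /* PPU clocks emulated so far */
} Nes;

/* Run the PPU until it has caught up with the CPU
   (3 PPU clocks per CPU clock, NTSC). */
void ppu_update(Nes *nes)
{
    while (nes->ppu_clocks < nes->cpu_clocks * 3) {
        /* a real emulator would render one pixel here */
        nes->ppu_clocks++;
    }
}

/* Execute one whole instruction, then let the PPU catch up.
   Returns the CPU clocks used, or 0 for an unhandled opcode. */
int step_instruction(Nes *nes)
{
    uint8_t opcode = nes->ram[nes->pc & 0x7FF];
    int clocks;

    switch (opcode) {
    case 0xE6: {  /* INC, zero page */
        uint8_t addr = nes->ram[(nes->pc + 1) & 0x7FF];
        nes->ram[addr]++;      /* read, increment, write back in one go */
        nes->pc += 2;
        clocks = 5;
        break;
    }
    default:
        return 0;              /* other opcodes are not part of this sketch */
    }

    nes->cpu_clocks += clocks;
    ppu_update(nes);           /* all 15 PPU clocks happen here, at the end */
    return clocks;
}
```

Note that the PPU only ever sees the CPU state as it is after the whole instruction has finished, which is exactly the reordering shown in the timeline above.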

As you can see, the same work is being done, but not quite in the same order, and some games require things to be done in the correct order. There are many ways to implement an emulator, and I've only shown one possibility. There are ways to emulate at the CPU instruction level and then make up for some of the lost timing later, but I'm just trying to show the worst case. In this example we are doing 15 PPU "things" all at once, but this number varies with the instruction being executed, so things aren't very predictable.
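
One way to "make up for the loss of timing later" is to let the PPU lag behind and only catch it up at the moments the CPU actually touches a PPU register. The post doesn't say which technique it has in mind, so treat the following as an assumed example of the general idea, with hypothetical names throughout:

```c
/* Hypothetical timing state for a catch-up style emulator. */
typedef struct {
    unsigned long cpu_clocks;  /* CPU clocks completed by finished instructions */
    unsigned long ppu_clocks;  /* PPU clocks emulated so far */
} Timing;

/* Advance the PPU up to (but not beyond) the given CPU timestamp,
   assuming 3 PPU clocks per CPU clock (NTSC). */
void ppu_catch_up(Timing *t, unsigned long cpu_timestamp)
{
    unsigned long target = cpu_timestamp * 3;
    while (t->ppu_clocks < target) {
        /* a real emulator would render one pixel here */
        t->ppu_clocks++;
    }
}

/* Called by an instruction handler just before it reads or writes a PPU
   register. cycle_in_instruction is the 0-based CPU clock within the
   current instruction on which the access happens. */
void on_ppu_register_access(Timing *t, unsigned cycle_in_instruction)
{
    ppu_catch_up(t, t->cpu_clocks + cycle_in_instruction);
}

/* Called once the whole instruction has been executed. */
void finish_instruction(Timing *t, unsigned instruction_clocks)
{
    t->cpu_clocks += instruction_clocks;
}
```

For instance, a 4-clock absolute-addressed LDA does its data read on its fourth clock, so its handler would call on_ppu_register_access(t, 3) before reading a PPU register and finish_instruction(t, 4) afterwards. The PPU is then in the right state at the moment of the access, even though the CPU is still emulated an instruction at a time.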
The way the more accurate emulators deal with this is by emulating each CPU "thing" separately, and running the PPU 3 times for each CPU "thing". The code might look like this:

/* CPU clock cycle 1 */
opcode = internal_ram[PC]; /* opcode = $E6 */
ppu_run_cycles(3);
cpu_clocks += 1;
/* jump to $E6 handler */

/* CPU clock cycle 2 */
addr = internal_ram[PC+1];
PC += 2; /* advance past the opcode and its operand */
ppu_run_cycles(3);
cpu_clocks += 1;

/* CPU clock cycle 3 */
data = internal_ram[addr];
ppu_run_cycles(3);
cpu_clocks += 1;

/* CPU clock cycle 4 */
data += 1;
ppu_run_cycles(3);
cpu_clocks += 1;

/* CPU clock cycle 5 */
internal_ram[addr] = data; /* write back */
ppu_run_cycles(3);
cpu_clocks += 1;
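
The five hand-written steps above repeat the same two lines after every memory access, so a clock-cycle-level core can fold them into its bus helpers. Below is a rough sketch of that idea in C. The Nes struct, cpu_tick, cpu_read, and cpu_write are hypothetical names, the program is assumed to live in the 2 KB of internal RAM as in the pseudocode, only $E6 is handled, and the flag updates and the dummy write of real hardware are left out.

```c
#include <stdint.h>

/* Hypothetical emulator state; all names are illustrative. */
typedef struct {
    uint8_t       ram[0x800];  /* 2 KB of internal RAM */
    uint16_t      pc;          /* program counter */
    unsigned long cpu_clocks;  /* CPU clocks executed so far */
    unsigned long ppu_clocks;  /* PPU clocks emulated so far */
} Nes;

void ppu_run_cycles(Nes *nes, int n)
{
    for (int i = 0; i < n; i++) {
        /* a real emulator would render one pixel here */
        nes->ppu_clocks++;
    }
}

/* Finish one CPU clock: the PPU runs its 3 clocks, then the CPU clock is counted. */
void cpu_tick(Nes *nes)
{
    ppu_run_cycles(nes, 3);
    nes->cpu_clocks++;
}

/* Every bus access costs exactly one CPU clock. */
uint8_t cpu_read(Nes *nes, uint16_t addr)
{
    uint8_t value = nes->ram[addr & 0x7FF];
    cpu_tick(nes);
    return value;
}

void cpu_write(Nes *nes, uint16_t addr, uint8_t value)
{
    nes->ram[addr & 0x7FF] = value;
    cpu_tick(nes);
}

/* Execute one instruction clock by clock.
   Returns 1 if the opcode was handled, 0 otherwise. */
int step_instruction(Nes *nes)
{
    uint8_t opcode = cpu_read(nes, nes->pc++);       /* clock 1: fetch opcode */
    if (opcode != 0xE6)
        return 0;                                    /* not part of this sketch */

    uint8_t addr = cpu_read(nes, nes->pc++);         /* clock 2: fetch operand */
    uint8_t data = cpu_read(nes, addr);              /* clock 3: read zero page */
    data++;                                          /* clock 4: ALU increments */
    cpu_tick(nes);
    cpu_write(nes, addr, data);                      /* clock 5: write back */
    return 1;
}
```

With this layout the PPU is never more than 3 of its clocks away from the CPU, and the instruction handlers stay almost as short as in the instruction-level version.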

I hope this explains more fully what I meant by "instruction level" vs. "clock cycle level" emulation. Also, if I've made any mistakes, please correct them. I know it's a long post.


