NESDev and Strangulation Records messageboards



Subject: Timing Question
Posted by: Laughy
Posted on: 10/26/04 7:04 PM
From IP: 129.65.51.12



I noticed that a lot of discussions are very specific about PPU clock cycles (loading the HT registers on CC 252, etc.), but is such timing even feasible, or am I mistaken?

The problem is that, since you don't know beforehand which instructions are being executed, the CPU may not have run for exactly 84 cycles (252 / 3, again the HT counter thing); it may have run for 86, or even 87. Each scanline could be at a different CPU cycle at this point, and for a given scanline every frame may be different too.

Just wondering if this is a problem at all. :) I can't see how people can get such super-specific pixel rendering timing when the CPU's cycle count may in fact differ from what they were aiming for.

Thanks,
Jordan




Subject: Re: Timing Question
Posted by: teaguecl
Posted on: 10/26/04 8:06 PM
From IP: 144.189.40.222



That's a good question, but unfortunately PPU cycle accuracy is a real problem. The most accurate emulators (Nintendulator, Nestopia, etc.) use PPU clock cycles as their base time quantum rather than CPU clock cycles. Emulators that use the CPU clock cannot emulate very specific timing situations, especially where asynchronous events happen in the "middle" of a CPU instruction. So the answer to your question is that it doesn't make any difference unless you are emulating at PPU clock resolution. It's not as big a deal if you are emulating at CPU clock resolution, and it makes no difference at all if you are emulating at the CPU instruction level.




Subject: Re: Timing Question
Posted by: quietust
Posted on: 10/26/04 10:52 PM



Actually, what you described are emulators that use CPU instructions as the base time unit; Nintendulator times everything relative to CPU clock cycles (in this case, memory accesses), even in the middle of instructions.

--
Quietust
P.S. If you don't get this note, let me know and I'll write you another.


Subject: Re: Timing Question
Posted by: Fx3
Posted on: 10/26/04 11:35 PM
From IP: 201.13.41.145



Nostalgic, I would say. It's like two connected circular devices spinning on the same axis, but with different diameters: the PPU clock is the smaller circle and the CPU the bigger one, and their speed is measured at the periphery (I mean, not as an angular rate)...

Until you give us an example of a mid-instruction PPU event making a difference, I don't stand a chance of being nostalgic ;)




Subject: Re: Timing Question
Posted by: Laughy
Posted on: 10/27/04 6:12 PM
From IP: 129.65.51.70



Relative to cycles! That's insane - how does Nintendulator keep its speed up? :) Run a cycle, check stuff, run a cycle, check stuff... yikes! :)

This implies that it IS important to work at this granularity, but is it really? Would Nintendulator suffer if it did things by instruction, or by PPU CC?




Subject: Re: Timing Question
Posted by: Laughy
Posted on: 10/27/04 6:23 PM
From IP: 129.65.51.70



Hmm, I'm having trouble understanding what you're saying. First you say PPU cycle accuracy is a problem, but then that the best emulators use it because it's better? Then you say it doesn't make a difference unless you use PPU cycle accuracy :)




Subject: Re: Timing Question
Posted by: quietust
Posted on: 10/27/04 7:25 PM



> Run a cycle, check stuff, run a cycle, check stuff...
Yep, that's pretty much how it works. The PPU and APU each run a single cycle at a time (the PPU does 3 per CPU cycle, or 3.2 on PAL; the APU does 1), all synchronized with each CPU cycle (which is triggered by a memory access). The only catch is that the CPU can't stop in the middle of an instruction, but that's not much of a problem; on the contrary, it makes things a lot easier.

> How does Nintendulator keep its speed up?
Short answer: it doesn't. Nintendulator may be one of the most accurate NES emulators around, but it's also one of the slowest. It currently requires a CPU over 1000MHz to run at full speed, faster than my own computer!

> Would Nintendulator suffer if it did things by instructions, or by PPU CC?
Yes, it most certainly would. Doing stuff by instruction would make reads and writes take effect earlier or later than they should.
I'm not sure what "by PPU CC" is supposed to mean, though...

--
Quietust
P.S. If you don't get this note, let me know and I'll write you another.


Subject: Re: Timing Question
Posted by: teaguecl
Posted on: 10/27/04 7:34 PM
From IP: 201.134.153.34



Yeah, you're right, my first message wasn't very clear. Here is how it works. The PPU does a little bit of work on each PPU clock cycle, and the CPU does a little bit of work on each CPU clock cycle. The PPU clock is 3x faster than the CPU clock (NTSC), so the PPU can do 3 "things" for every 1 that the CPU does. On top of this, a single CPU instruction takes anywhere from 2 to 7 CPU clocks to complete (for the official opcodes). The number of "things" the CPU has to do for a particular instruction directly affects how many clock cycles it takes.
For example, take opcode $E6, which is the INC instruction with zero page addressing and takes 5 CPU clock cycles to complete:
1 clock cycle for fetching the opcode from memory (reading $E6 from address at PC)
1 clock cycle for fetching the operand (reading zero page address from PC+1)
1 clock cycle for reading from the zero page address
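1 clock cycle for the ALU (arithmetic logic unit) to increment the value (during this cycle a real 6502 also writes the unmodified value back to the address, a "dummy write")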
1 clock cycle to write the newly incremented value back to the zero page address

So you can see that opcode $E6 takes 5 clocks to complete because there are 5 "things" the CPU needs to do, and each one takes one clock cycle.
In hardware, the PPU and CPU operate in parallel. This means that while the CPU spends 5 CPU CCs executing an INC instruction, the PPU does 5*3=15 "things".
So here is a timeline of when the "things" get done on real hardware, using the INC as an example. I'm assuming that the PPU is rendering pixels at this point in time; see Brad Taylor's doc for more info.

PPU CC - CPU CC
1 (pixel #1) - 1 (Read from PC)
2 (pixel #2)
3 (pixel #3)
4 (pixel #4) - 2 (Read from PC+1)
5 (pixel #5)
6 (pixel #6)
7 (pixel #7) - 3 (Read from zero page addr)
8 (pixel #8)
9 (pixel #9)
10 (pixel #10) - 4 (ALU increments value)
11 (pixel #11)
12 (pixel #12)
13 (pixel #13) - 5 (value written back to zero page addr)
14 (pixel #14)
15 (pixel #15)

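An emulator that is accurate only at the CPU instruction level handles things in bigger chunks; for the INC it might look like this:
opcode = internal_ram[PC];                     /* opcode = $E6 */
/* (jump to $E6 handler) */
addr = internal_ram[PC + 1];                   /* zero page address */
PC += 2;                                       /* step past opcode and operand */
internal_ram[addr] = internal_ram[addr] + 1;   /* N and Z flag updates omitted */
cpu_clocks += 5;
ppu_update();                                  /* PPU catches up on all 15 cycles at once */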

It will execute the entire INC instruction, and then update the PPU. A timeline of this might look like the following:
PPU CC - CPU CC
1 (no work) - 1 (no work)
2 (no work)
3 (no work)
4 (no work) - 2 (no work)
5 (no work)
6 (no work)
7 (no work) - 3 (no work)
8 (no work)
9 (no work)
10 (no work) - 4 (no work)
11 (no work)
12 (no work)
13 (no work) - 5 (Read PC, Read PC+1, Read zero page addr, ALU inc, write back)
14 (no work)
15 (render pixels 1 through 15)

As you can see, the same work is being done, but not quite in the same order, and in some situations games require things to be done in the correct order. There are many ways to implement an emulator, and I've only shown one possibility. There are ways to emulate at the CPU instruction level and then make up for some of the lost timing later, but I'm just trying to show the worst case. In this example we are doing 15 PPU "things" all at once, but this number varies with the instruction being executed, so things aren't very predictable.
The way the more accurate emulators deal with this is by emulating each CPU "thing" separately, and running the PPU 3 times for each CPU "thing". The code might look like this:

/* CPU clock cycle 1: fetch the opcode */
opcode = internal_ram[PC++];        /* opcode = $E6 */
ppu_run_cycles(3);
cpu_clocks += 1;
/* (jump to $E6 handler) */

/* CPU clock cycle 2: fetch the zero page address */
addr = internal_ram[PC++];
ppu_run_cycles(3);
cpu_clocks += 1;

/* CPU clock cycle 3: read the value */
data = internal_ram[addr];
ppu_run_cycles(3);
cpu_clocks += 1;

/* CPU clock cycle 4: the ALU increments the value while a real 6502
   writes the unmodified value back (a "dummy write") */
internal_ram[addr] = data;
data += 1;
ppu_run_cycles(3);
cpu_clocks += 1;

/* CPU clock cycle 5: write the incremented value back */
internal_ram[addr] = data;
ppu_run_cycles(3);
cpu_clocks += 1;

I hope this explains more fully what I meant by "instruction level" vs. "clock cycle level" emulation. If I've made any mistakes, please correct me; I know it's a long post.




Subject: Re: Timing Question
Posted by: Fx3
Posted on: 10/27/04 9:13 PM
From IP: 201.1.130.233



Quite impressive, and very low-level. Anyway, answer me a question: if a PPU interrupt is triggered in the middle of an instruction, what's the drill? Is it taken after the instruction completes, or what?




Subject: Re: Timing Question
Posted by: Laughy
Posted on: 10/27/04 10:01 PM
From IP: 64.161.57.119



Awesome post :) Thanks!

What about a system where the PPU does nothing as far as rendering is concerned until the CPU executes an instruction that modifies the PPU registers, a tile, or anything else that may affect rendering, and only then is the PPU run up to that point? So for instance:

(PPU is doing nothing)
INX (CPU cycle count increases)
INY (CPU cycle count increases)
INY (CPU cycle count increases)
STY (some PPU register, mid-scanline)

At the last instruction, we advance the CPU cycle count only by the delay between the start of that instruction and the moment its write takes effect. Note that we don't ACTUALLY write the value yet. Instead, we run the PPU for the number of cycles that have gone by:
run_ppu(cpu_cycles * 3)

We then write the value to memory, and then add any cycles the instruction spends after writing the value (I don't think there are any).

Basically the ppu only runs when it needs to in order for the cpu instruction to be correct.

At the beginning of H-blank, the PPU will "catch up" no matter what.

There may be a problem with that which I don't see.




Subject: Re: Timing Question
Posted by: teaguecl
Posted on: 10/28/04 00:03 AM
From IP: 144.189.40.222



Laughy, what you have described is another way to implement a cycle-accurate emulator (and probably more efficient than the one I described above). I call what you describe a "rendezvous" style implementation. I say this because at any given instant, the CPU and PPU are not in sync with each other. However, every once in a while (when a write to the PPU registers happens), the two synchronize, and very briefly the state of the emulation exactly matches the state of the original hardware. This seems like a good way to bridge the problems of accuracy and speed. As long as the rendezvous points happen often enough, the user won't know the difference, and the simulation is just as accurate as a more lock-step version.




Subject: Re: Timing Question
Posted by: tepples
Posted on: 10/28/04 6:11 PM
From IP: 68.53.188.30



This "rendezvous", as you call it, describes constructing the true architectural state of the CPU and other devices upon I/O. It can only happen if the core knows that no other device will fire an interrupt for the next n cycles, which an emulator can usually predict for interrupts triggered by the PPU, the DMC, and mappers. More info here:
http://nesdev.parodius.com/NES%20emulator%20development%20guide.txt
(search for Accurate & efficient PPU emulation)

____________________
My English is better than your Geberquen.


Subject: Re: Timing Question
Posted by: Laughy
Posted on: 10/29/04 02:19 AM
From IP: 64.161.57.119



Hmm, I'm trying to understand what you mean: why would an interrupt affect this? Would we still need to worry about interrupts if we instead put writes to the PPU registers and such on a stack and rendered the entire screen at the end of the frame?




Subject: Re: Timing Question
Posted by: tepples
Posted on: 10/29/04 1:52 PM
From IP: 68.53.188.30



Now you're ignoring all mid-frame PPU writes, and games with split-screen scrolling won't work.

As for interrupts, sometimes the mapper will throw interrupts, especially in the case of MMC3 games such as Super Mario Bros. 3.

____________________
My English is better than your Geberquen.


Subject: Re: Timing Question
Posted by: Laughy
Posted on: 10/30/04 00:47 AM
From IP: 64.161.57.46



No, I'm not ignoring mid-frame PPU writes at all: I put such writes on a stack and, at the end of the frame, render the screen using those writes, switching the registers over at the point where each write occurred. I believe this is the best way to retain accuracy and get the best speed.

I can see how an interrupt may affect this now: if an interrupt occurs partway through an instruction, should we let the instruction finish and then fire the interrupt, or stop, call the interrupt, and then let the instruction finish? I guess this could affect things sometimes...




Subject: Re: Timing Question
Posted by: Nessie
Posted on: 10/30/04 09:35 AM
From IP: 83.226.103.194



The IRQ thing can be calculated on each IRQ-related write, but what if one of those PPU writes (which you account for at the end of the frame) changes the time when the sprite #0 hit flag is set?

The interrupt takes effect after the instruction has finished. It actually makes a rather big difference in some games.



