1. There are 262 scanlines per frame (NTSC), and 113+2/3 CPU cycles per scanline, which multiplies out to 29780+2/3 CPU cycles per frame. It is tempting to run the CPU for a whole frame and then draw the entire screen at once, but that approach will not work for any game that changes scrolling mid-screen (e.g. for a status bar). The generally accepted approach to emulating the NES is to run the CPU for 113+2/3 cycles (or 114, 114, then 113) and then run the PPU for 1 scanline; this works for the vast majority of games without problems.
2. If you're doing vertical scrolling, the end of the nametable and the bottom of the screen will NOT be the same! The proper time to generate NMI is exactly 1 scanline after the last of the 240 visible scanlines has been rendered.
3. The 256 bytes of SPR-RAM are separate from the PPU's address space; they are accessed via the PPU's I/O registers $2003 (to set the address) and $2004 (to read/write data and auto-increment the address). Most games perform a Sprite DMA by writing a page number ($xx, pointing to $xx00-$xxFF) to $4014; in reality, this simply performs 256 rapid reads from the specified page, each followed by a write to $2004, and the whole transfer is at least 4 times faster than doing the copy manually.
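
The three notes above can be sketched roughly as follows. This is only an illustration, not a real emulator core: the Machine class, run_frame, and every method and field name are hypothetical, the CPU and PPU are stubs that merely count what they were asked to do, and CPU memory is assumed to be a flat 64 KB array. Fractional cycles are tracked in thirds of a cycle so the 114, 114, 113 pattern falls out without floating point.

```python
SCANLINES_PER_FRAME = 262   # scanlines per frame (note 1)
VISIBLE_SCANLINES = 240     # scanlines 0-239 are drawn
NMI_SCANLINE = 241          # scanline 240 passes idle, then NMI fires (note 2)
THIRDS_PER_SCANLINE = 341   # 113+2/3 CPU cycles = 341 thirds of a cycle


class Machine:
    """Hypothetical emulator state; all names here are illustrative."""

    def __init__(self):
        self.cpu_mem = bytearray(0x10000)  # assumption: flat 64 KB CPU memory
        self.spr_ram = bytearray(256)      # SPR-RAM, outside PPU address space
        self.spr_addr = 0                  # SPR-RAM address set through $2003
        self.pending_thirds = 2            # starting at 2 yields 114, 114, 113
        self.total_cycles = 0
        self.scanline_cycles = []          # cycles granted to each scanline
        self.rendered = []                 # scanlines the stub PPU has drawn
        self.nmi_count = 0

    def run_cpu(self, cycles):
        # Placeholder: a real core would execute instructions here.
        self.total_cycles += cycles
        self.scanline_cycles.append(cycles)

    def render_scanline(self, scanline):
        # Placeholder: a real PPU would draw one line with the current scroll.
        self.rendered.append(scanline)

    def write_2003(self, value):
        # $2003: set the SPR-RAM address.
        self.spr_addr = value & 0xFF

    def write_2004(self, value):
        # $2004: store one byte, then auto-increment the address (wraps at 256).
        self.spr_ram[self.spr_addr] = value & 0xFF
        self.spr_addr = (self.spr_addr + 1) & 0xFF

    def write_4014(self, page):
        # $4014: Sprite DMA, i.e. 256 reads from $xx00-$xxFF, each written
        # to $2004.
        base = (page & 0xFF) << 8
        for offset in range(256):
            self.write_2004(self.cpu_mem[base + offset])


def run_frame(m):
    """Run one frame a scanline at a time: CPU first, then one PPU line."""
    for scanline in range(SCANLINES_PER_FRAME):
        if scanline == NMI_SCANLINE:
            # A real PPU would only raise NMI when the game has enabled it.
            m.nmi_count += 1
        # Credit this scanline's CPU time and carry the leftover thirds over.
        m.pending_thirds += THIRDS_PER_SCANLINE
        m.run_cpu(m.pending_thirds // 3)
        m.pending_thirds %= 3
        if scanline < VISIBLE_SCANLINES:
            m.render_scanline(scanline)
```

Because the CPU gets its slice before each line is drawn, a scroll change made mid-frame (the status bar case) is seen by the very next scanline. Carrying the leftover thirds across frames keeps the long-run average at 29780+2/3 cycles per frame, even though any single frame gets a whole number of cycles.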