1) If I am not mistaken, Nesticle contained a bit of assembly code. I guess nothing beats the speed of good old assembly. :) I know that A/NES CGFX, the NES emulator my friends and I wrote for AmigaOS (in pure 68000 asm), runs close to full speed on a 68040 at 40 MHz!
2) I am 99% sure it uses standard interpretive emulation.
3) I think the CPU emulation is the easy part. The PPU emulation is the bottleneck. However, with intelligent cache routines for sprites/backgrounds...
4) Intelligent cache routines for the sprite/background code. Try not to redraw anything that was already drawn in the previous frame, and things like that.
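
To illustrate point 4, here is a minimal sketch in C of the "don't redraw what was drawn last frame" idea for the background. This is not how Nesticle or A/NES CGFX actually do it. All names (BgCache, bg_write, bg_render, blit_tile) are made up for the example, and it assumes a single 32x30 tile background with no scrolling, no palette/attribute changes, and no writes to the tile graphics themselves; a real PPU emulator would have to invalidate the cache for all of those.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILES_X    32                    /* background width in 8x8 tiles  */
#define TILES_Y    30                    /* background height in 8x8 tiles */
#define TILE_COUNT (TILES_X * TILES_Y)

/* Hypothetical cache: what the game wants on screen vs. what is already there. */
typedef struct {
    uint8_t current[TILE_COUNT];  /* tile indices written by the emulated game */
    uint8_t drawn[TILE_COUNT];    /* tile indices drawn in the previous frame  */
    int     valid;                /* 0 until one full frame has been drawn     */
} BgCache;

/* Stand-in for the real (expensive) tile blitter; here it only counts calls. */
static unsigned long tiles_blitted = 0;

static void blit_tile(int x, int y, uint8_t tile)
{
    (void)x; (void)y; (void)tile;
    tiles_blitted++;
}

static void bg_cache_init(BgCache *c)
{
    memset(c, 0, sizeof *c);
}

/* Called whenever the emulated game changes a background tile. */
static void bg_write(BgCache *c, int x, int y, uint8_t tile)
{
    assert(x >= 0 && x < TILES_X && y >= 0 && y < TILES_Y);
    c->current[y * TILES_X + x] = tile;
}

/* Draw one frame, skipping tiles that are unchanged since the last frame.
   Returns the number of tiles that actually had to be redrawn. */
static int bg_render(BgCache *c)
{
    int redrawn = 0;

    for (int i = 0; i < TILE_COUNT; i++) {
        if (c->valid && c->drawn[i] == c->current[i])
            continue;                       /* already on screen, skip it */

        blit_tile(i % TILES_X, i / TILES_X, c->current[i]);
        c->drawn[i] = c->current[i];
        redrawn++;
    }

    c->valid = 1;
    return redrawn;
}
```

With this sketch, the first bg_render() call draws all 960 tiles, a second call with nothing changed draws none, and a single bg_write() in between costs only one blit. The comparison per tile is much cheaper than decoding and drawing the tile, which is where the saving would come from.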