First of all... I'd highly recommend against basing everything on CPU cycles.
Second of all... DON'T render full scanlines at a time. Several games do mid-scanline effects and tricks and your emu will distort them if you render full scanlines without accounting for mid-scanline writes.
Third: avoid rounding off numbers wherever possible (e.g. 341 / 3 is 113.66667, not an even 113)
And lastly: do not count on 3 PPU cycles equalling 1 CPU cycle... because on PAL systems, that is not the case (it's 3.2 PPU cycles per CPU cycle there).
I personally recommend keeping 'timestamp' variables for your CPU, PPU, and APU... have them all operating on the same base. Whenever you emulate the CPU/PPU/APU for so many cycles, you increment the appropriate timestamp var accordingly.
I don't think I'm explaining this well... so lemme put it this way:
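#define PPUCYC_BASE 5        /* timestamp units per PPU cycle */
#define CPUCYC_BASE_NTSC 15  /* timestamp units per CPU cycle on NTSC (15 / 5 = 3) */
#define CPUCYC_BASE_PAL 16   /* timestamp units per CPU cycle on PAL (16 / 5 = 3.2) */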
Assuming you define these constants, when you emulate your PPU for one cycle, you would increment the PPU timestamp variable by PPUCYC_BASE (5). When you run your CPU for one cycle, you would increment the CPU timestamp by the appropriate base (15 or 16, depending on whether you're emulating an NTSC or PAL system). This gives you an exact 3:1 PPU:CPU cycle ratio on NTSC systems, as well as the 3.2:1 PPU:CPU cycle ratio on PAL systems, without rounding off any cycles.
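To make that a bit more concrete, here's a rough sketch of how the timestamps might be used. The Timing struct and the function names (cpu_run, ppu_catch_up, ppu_cycles_after) are made up for illustration and aren't from any particular emulator; a real PPU would do its rendering work where the comment says so. The idea is just that the CPU advances its timestamp, and the PPU then runs until its own timestamp has caught up.

```c
#include <stdint.h>

#define PPUCYC_BASE       5
#define CPUCYC_BASE_NTSC 15
#define CPUCYC_BASE_PAL  16

/* Hypothetical timing state: both timestamps use the same base units. */
typedef struct {
    uint64_t cpu_ts;      /* CPU timestamp, in base units */
    uint64_t ppu_ts;      /* PPU timestamp, in base units */
    uint64_t ppu_cycles;  /* PPU cycles emulated so far */
    int      cpu_base;    /* CPUCYC_BASE_NTSC or CPUCYC_BASE_PAL */
} Timing;

/* Advance the CPU timestamp by a number of CPU cycles. */
void cpu_run(Timing *t, unsigned cycles)
{
    t->cpu_ts += (uint64_t)cycles * (uint64_t)t->cpu_base;
}

/* Run the PPU one cycle at a time until it has caught up to the CPU.
   A cycle only runs if it fits entirely before the CPU timestamp, so
   any leftover fraction carries over to the next call instead of
   being rounded away. */
void ppu_catch_up(Timing *t)
{
    while (t->ppu_ts + PPUCYC_BASE <= t->cpu_ts) {
        /* a real emulator would render one PPU dot here */
        t->ppu_ts += PPUCYC_BASE;
        t->ppu_cycles++;
    }
}

/* Helper for experimenting: step the CPU one cycle at a time, catching
   the PPU up after each one, and report how many PPU cycles ran. */
uint64_t ppu_cycles_after(int cpu_base, unsigned cpu_cycles)
{
    Timing t = {0, 0, 0, cpu_base};
    for (unsigned i = 0; i < cpu_cycles; i++) {
        cpu_run(&t, 1);
        ppu_catch_up(&t);
    }
    return t.ppu_cycles;
}
```

With this, ppu_cycles_after(CPUCYC_BASE_NTSC, 5) comes out to 15 and ppu_cycles_after(CPUCYC_BASE_PAL, 5) comes out to 16. On PAL the PPU runs 3 cycles after each of the first four CPU cycles and 4 after the fifth, so the extra 0.2 accumulates in the timestamps rather than getting lost.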
For a further explanation of how I do things (which I'd recommend, and which seems to work quite well for me), here's a post from a recent thread you might find interesting:
In fact... you may want to read the entire thread for a better picture.