As a compiler/assembler type (who wouldn't mind handling that side of things for CNES, actually), some comments:
We should be drawing on the finest of the assembler traditions when we produce the CNES assembler. Based on the feedback that I got for P65, those would be TASM and DASM. TASM is payware, and DASM was a 1988 Amiga program, so we'd be filling a much-needed role with this. That means the assembler needs:
- Parameterized macros - not merely lists of instructions; the arguments to those instructions should be separately settable. First-class macros would rule (macros that take macros as arguments).
- A module system, including anonymous and local labels. This is absolutely necessary for projects of any size to keep the namespace from blowing up.
- Assemble-time expressions. LSB, MSB, addition, subtraction, and multiplication, at minimum, with parenthesization. (To resolve ambiguities between parentheses and indirect instructions, DASM used [ ] for arithmetic parenthesizing.)
- Automatic instruction selection - NESASM's requirement that YOU tell it where the zero page instructions are is unacceptable.
- Multi-file support, both source and binary. Given this and the macro system, a "UNIF output" and "iNES output" set of macros could be very easily defined. Hell, an "Apple IIe" output would be just as easily definable, as could special names for the memory registers.
- Memory segmentation simulation, with overlay support. No extant assembler has both (though many support one or the other, or let you fake them).
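To make the automatic instruction selection point concrete, here's a rough Python sketch of what I have in mind. The table and the `encode` name are made up for illustration; only the opcode values are the standard 6502 ones. It also assumes the operand address is already known, so it ignores the forward-reference problem a real multi-pass assembler has to deal with.

```python
# (zero-page opcode, absolute opcode) for a few 6502 instructions.
# A real assembler would carry the full table; this is just enough to show the idea.
OPCODES = {
    "LDA": (0xA5, 0xAD),
    "STA": (0x85, 0x8D),
    "ADC": (0x65, 0x6D),
}

def encode(mnemonic, address):
    """Pick the zero-page form when the address fits in one byte."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    zero_page, absolute = OPCODES[mnemonic]
    if address <= 0xFF:
        return bytes([zero_page, address])
    # Absolute operands are stored low byte first.
    return bytes([absolute, address & 0xFF, address >> 8])
```

So `encode("LDA", 0x10)` gives the two-byte zero-page form and `encode("LDA", 0x2002)` gives the three-byte absolute form, with nobody having to tell the assembler which is which.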
Not a lot to say about libraries other than yeah, we need them, and a combination of macro definitions and code libraries as includable source would handle it nicely (stdsprite.h anyone?)
The disassembler should be able to take arbitrary binary data, not just UNIF. A lot of logic/math code written for the Commodore 64 and Apple II will also run on the NES if it doesn't hinge on Decimal mode - gotta be able to get at that, too.
Basic table-driven disassembly is trivial. We can get a *simple* disassembler out the door in no time flat, as soon as somebody cares enough to make it.
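For a sense of how little there is to it, here's a minimal Python sketch. The table only covers a handful of opcodes, and the names (`TABLE`, `disassemble`, the `origin` default of $8000) are my own placeholders, not anything CNES has settled on. Unknown bytes just fall out as `.byte` lines.

```python
# opcode -> (mnemonic, operand format, total instruction length in bytes)
TABLE = {
    0xA9: ("LDA", "#${:02X}", 2),
    0xAD: ("LDA", "${:04X}", 3),
    0x8D: ("STA", "${:04X}", 3),
    0x20: ("JSR", "${:04X}", 3),
    0x4C: ("JMP", "${:04X}", 3),
    0x60: ("RTS", "", 1),
    0xEA: ("NOP", "", 1),
}

def disassemble(data, origin=0x8000):
    """Linear sweep over raw bytes; returns one text line per instruction."""
    lines = []
    pc = 0
    while pc < len(data):
        entry = TABLE.get(data[pc])
        # Unknown opcode, or an instruction cut off by the end of the data.
        if entry is None or pc + entry[2] > len(data):
            lines.append(f"{origin + pc:04X}  .byte ${data[pc]:02X}")
            pc += 1
            continue
        mnemonic, fmt, size = entry
        # Operands are little-endian; an empty slice decodes to 0 and is unused.
        operand = int.from_bytes(data[pc + 1:pc + size], "little")
        text = mnemonic + (" " + fmt.format(operand) if fmt else "")
        lines.append(f"{origin + pc:04X}  {text}")
        pc += size
    return lines
```

Since it takes plain bytes, the same loop works on a chunk ripped out of a C64 or Apple II binary as well as on PRG-ROM data.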
Automatic macro detection is extremely expensive (naive algorithms are n^4) but not conceptually difficult, and hey, 64k^4 instructions doesn't take THAT long on a 500MHz Celeron, now does it? A larger problem is that a lot of "macros" aren't - they're just code that's superficially similar but that does entirely different things. A way around that would be to scan for predefined macros (which is linear time).
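The predefined-macro scan could look something like the Python sketch below. The pattern is a hypothetical "store a 16-bit immediate" macro (LDA #lo / STA addr / LDA #hi / STA addr+1) that I made up for the example, with `None` standing in for operand bytes we don't care about. For a fixed pattern the work is proportional to the size of the binary. Note it checks every byte offset rather than real instruction boundaries, and doesn't verify that the two STA addresses are adjacent, so treat hits as candidates.

```python
# Opcode bytes of the hypothetical macro; None matches any operand byte.
PATTERN = [0xA9, None, 0x8D, None, None, 0xA9, None, 0x8D, None, None]

def find_macro(data, pattern=PATTERN):
    """Return every offset in data where the pattern matches."""
    hits = []
    for start in range(len(data) - len(pattern) + 1):
        window = data[start:start + len(pattern)]
        if all(p is None or p == b for p, b in zip(pattern, window)):
            hits.append(start)
    return hits
```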
More useful than automatic breaking into macros would be breaking things into basic blocks and procedures: marking the targets of JSR statements, and splitting code up into chunks that always run in sequence (that is, nothing in the code ever jumps into that location). That gets trickier to do right in the presence of switchable code blocks. A "first iteration" would stick to code like Gradius and SMB that only have one PRG-ROM block - otherwise, you'd need code-scanners to figure out which PRG-ROM chunks were swapped in when you made the jump, which could wind up being arbitrarily complex, maybe even undecidable.
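Here's a rough Python sketch of that first iteration for a single PRG-ROM block. Everything named here (`OPS`, `find_blocks`, the $8000 origin) is a placeholder of mine, and the opcode table is deliberately tiny. It does a straight linear sweep, so data mixed in with code will confuse it. It also doesn't end a block at a JSR, which is a simplification.

```python
# opcode -> (length, kind); a real table would cover the whole instruction set.
OPS = {
    0xA9: (2, "plain"),   # LDA #imm
    0x8D: (3, "plain"),   # STA abs
    0xCA: (1, "plain"),   # DEX
    0xD0: (2, "branch"),  # BNE rel
    0x20: (3, "call"),    # JSR abs
    0x4C: (3, "jump"),    # JMP abs
    0x60: (1, "return"),  # RTS
}

def find_blocks(data, origin=0x8000):
    """Return (sorted JSR targets, sorted basic-block start addresses)."""
    procedures = set()
    leaders = {origin}
    end = origin + len(data)
    pc = 0
    while pc < len(data):
        size, kind = OPS.get(data[pc], (1, "plain"))
        if pc + size > len(data):
            break  # instruction truncated by end of data
        next_addr = origin + pc + size
        target = None
        if kind in ("call", "jump"):
            target = data[pc + 1] | (data[pc + 2] << 8)
        elif kind == "branch":
            offset = data[pc + 1]
            if offset >= 0x80:
                offset -= 0x100  # relative offsets are signed 8-bit
            target = next_addr + offset
        if kind == "call":
            procedures.add(target)
        # Anything jumped to starts a new block (if it lies inside this chunk).
        if target is not None and origin <= target < end:
            leaders.add(target)
        # Whatever follows a branch, jump, or return also starts a new block.
        if kind in ("branch", "jump", "return") and next_addr < end:
            leaders.add(next_addr)
        pc += size
    return sorted(procedures), sorted(leaders)
```

Once bank switching enters the picture, the `origin <= target < end` check is exactly where this falls apart: you no longer know which chunk the target address refers to.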
The trace debugger should really be part of the emulator, not the assembler, though (if we break up the modules right) they should all be able to talk to one another easily. It would also want some way of mapping memory locations and mapper status to a source file and line, so it can jump to the right source line as well as the right instruction.