I'm working on a legacy design that uses an MC68HC11E1 (originally an MC68HC11A1) micro. It runs in expanded mode with an external EPROM for program memory and an external NVRAM for additional program memory, stored data, stack space, and miscellaneous scratch memory.
I'm having an issue that is incredibly difficult to reproduce: this NVRAM occasionally gets corrupted, which causes all sorts of problems. It's happened in units that have been in the field for years, and it's happened in brand new ones. There are also plenty that have never had the issue. I haven't figured out the root cause yet, but I think I have at least identified the mechanism of failure.
From what I can tell, something mucks up the multiplexed address/data (A/D) bus and causes an illegal opcode to be read. This triggers the illegal opcode interrupt, which stacks 9 bytes. This then repeats a variable number of times before whatever is mucking up the bus resolves itself, the code can execute normally again, and the stack can be reset. If it repeats enough times, the stack overflows its allotted space and starts corrupting the stored data and the additional program memory in the NVRAM.
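To put rough numbers on the overflow, here is a minimal C sketch that counts how many back-to-back traps fit in a stack region before the next 9-byte frame would write below it. The `STACK_TOP` and `STACK_LIMIT` addresses and the function name are hypothetical placeholders, not values from the actual design.

```c
#include <stdint.h>

/* Bytes stacked by the HC11 on an interrupt or trap:
   PC (2) + IY (2) + IX (2) + ACCA (1) + ACCB (1) + CCR (1). */
#define FRAME_BYTES 9

/* Hypothetical stack layout; substitute the real memory map. */
#define STACK_TOP   0x7FFFu  /* initial SP (assumed) */
#define STACK_LIMIT 0x7F00u  /* lowest address reserved for the stack (assumed) */

/* Count how many consecutive trap frames fit between sp_start and limit.
   SP points at the next free byte, so one frame occupies sp-8 .. sp. */
static unsigned traps_before_overflow(uint16_t sp_start, uint16_t limit)
{
    int32_t sp = sp_start;   /* wide type so the subtraction cannot wrap */
    unsigned count = 0;

    while (sp - (FRAME_BYTES - 1) >= (int32_t)limit) {
        sp -= FRAME_BYTES;   /* one more frame stacked */
        count++;
    }
    return count;
}
```

With the assumed 256-byte region above, `traps_before_overflow(STACK_TOP, STACK_LIMIT)` returns 28, so the 29th consecutive trap would start writing into whatever sits below the stack.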
Since the vector for the illegal opcode interrupt is fetched from the external EPROM, if the bus is messed up, any value could potentially be read instead of the actual vector. Even if the correct vector is read, the opcodes stored there could be subject to the same problem once the PC is updated.
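As an illustration of the vector problem, the sketch below models the two-byte fetch from $FFF8/$FFF9 (high byte first) through a healthy bus and through a faulty one. The handler address `$E000` and the fault model (data line D2 always reading as 1) are assumptions made up for the example; the real fault could look quite different.

```c
#include <stdint.h>

#define ILLOP_VECTOR_ADDR 0xFFF8u  /* illegal opcode vector, high byte first */

/* Hypothetical handler address stored in the EPROM (assumed). */
#define HANDLER_HI 0xE0u
#define HANDLER_LO 0x00u

/* Healthy bus: returns what the EPROM actually holds. */
static uint8_t healthy_read(uint16_t addr)
{
    if (addr == ILLOP_VECTOR_ADDR)      return HANDLER_HI;
    if (addr == ILLOP_VECTOR_ADDR + 1u) return HANDLER_LO;
    return 0xFFu;  /* treat everything else as erased EPROM */
}

/* Faulty bus (assumed fault model): data line D2 always reads as 1. */
static uint8_t faulty_read(uint16_t addr)
{
    return (uint8_t)(healthy_read(addr) | 0x04u);
}

typedef uint8_t (*bus_read_fn)(uint16_t addr);

/* Assemble a 16-bit vector from two consecutive byte reads. */
static uint16_t fetch_vector(bus_read_fn read_byte, uint16_t vector_addr)
{
    uint16_t hi = read_byte(vector_addr);
    uint16_t lo = read_byte((uint16_t)(vector_addr + 1u));
    return (uint16_t)((hi << 8) | lo);
}
```

In this model the healthy fetch yields `0xE000`, while the faulty fetch yields `0xE404`, so the PC lands somewhere other than the handler even though the EPROM contents are intact.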
Is there a way to have the illegal opcode interrupt vector read from an internal location in expanded mode? Or is there some other way to ensure that a broken A/D bus that mangles good opcodes won't also cause the illegal opcode interrupt to jump to another illegal opcode and enter this infinite loop that blows away the NVRAM?