This is related to a question I asked some time back, "Probable instruction Cache Synchronization issue in self modifying code?". Although the accepted answer there solved that issue, I have since run into a new intermittent failure: after a function is switched back on, the CPU sometimes jumps to a junk address. Yet the disassembly taken after the fact (from the core dump) shows the correct address in the call instruction.
Some gdb analysis follows.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000014010d0 in ?? ()
(gdb) bt
#0 0x00000000014010d0 in ?? ()
#1 0x0000000000492e01 in FastPelY_14 ()
#2 0x000000000045d85d in SubPelBlockMotionSearch ()
#3 0x0000000000467a23 in BlockMotionSearch ()
#4 0x0000000000469c99 in PartitionMotionSearch ()
#5 0x0000000000487bf7 in encode_one_macroblock ()
#6 0x0000000000496ccd in encode_one_slice ()
#7 0x0000000000426081 in code_a_picture ()
#8 0x000000000042766f in frame_picture ()
#9 0x000000000042664b in encode_one_frame ()
#10 0x0000000000430a23 in main ()
(gdb) disas /r 0x0000000000492e01
0x0000000000492dfc <+38>: e8 cf e2 f6 ff callq 0x4010d0
=> 0x0000000000492e01 <+43>: 8b 45 e4 mov -0x1c(%rbp),%eax
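Checking the arithmetic by hand: the target of an E8 call is the address of the next instruction (here 0x492e01) plus the signed, little-endian 32-bit displacement stored in bytes 1 to 4. The minimal C sketch below decodes the bytes from the disassembly; the function name call_rel32_target is my own, used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an x86-64 near "call rel32" (opcode E8) located at `site`.
   Bytes 1..4 hold a signed, little-endian 32-bit displacement that is
   added to the address of the next instruction (site + 5). */
static uint64_t call_rel32_target(uint64_t site, const uint8_t insn[5])
{
    assert(insn[0] == 0xE8); /* only handles the E8 rel32 form */

    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;

    /* Sign-extend the 32-bit displacement portably. */
    int64_t rel = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                      : (int64_t)raw;

    /* Unsigned wrap-around gives the right result for a negative rel. */
    return site + 5 + (uint64_t)rel;
}
```

For e8 cf e2 f6 ff at 0x492dfc this gives 0x4010d0. Changing only the last displacement byte from 0xff to 0x00 gives 0x14010d0, which is exactly the address in frame #0; the two addresses differ only in bit 24.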
The interesting thing is that while the correct target is 0x4010d0, the junk address is always 0x14010d0 when it fails. That makes me think it is the call instruction that somehow went wrong, even though the backtrace shows the instruction pointer pointing at the next instruction. (Maybe that's normal gdb behavior; I'm not sure.)
If that's the case, the CPU apparently executed e8 cf e2 f6 00 instead of e8 cf e2 f6 ff. The 5-byte sequence that originally lived at the call site starting at 0x0000000000492dfc is the 5-byte NOP 0F 1F 44 00 00 (as suggested in the question linked at the top).
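One more thing I noticed, which may or may not be relevant: the trailing 0x00 in the executed bytes is also the last byte of the original NOP. That would fit a torn update, where the first four bytes of the new call became visible while the fifth byte still held the old NOP byte. The call site also crosses an alignment boundary: it starts at 0x492dfc, and 0x492e00 is a multiple of 64, so (assuming 64-byte cache lines, as on current Intel Xeons) the fifth byte sits in a different cache line, and a different 8-byte word, from the other four. The helpers below are only a sketch to check this arithmetic; crosses_boundary and partial_patch are hypothetical names of my own, not part of my actual patching code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if [addr, addr + len) spans more than one block of
   `align` bytes. `align` must be a power of two, e.g. 8 for a quadword
   or 64 for a cache line (assuming 64-byte lines). */
static int crosses_boundary(uint64_t addr, size_t len, uint64_t align)
{
    assert(len > 0 && align != 0 && (align & (align - 1)) == 0);
    return (addr & ~(align - 1)) != ((addr + len - 1) & ~(align - 1));
}

/* Models a torn patch: only the first `written` bytes of new_insn have
   become visible; the remaining bytes still hold old_insn. */
static void partial_patch(const uint8_t old_insn[5], const uint8_t new_insn[5],
                          size_t written, uint8_t out[5])
{
    assert(written <= 5);
    memcpy(out, old_insn, 5);
    memcpy(out, new_insn, written);
}
```

With the NOP 0F 1F 44 00 00 as old_insn, the call e8 cf e2 f6 ff as new_insn, and written = 0x492e00 - 0x492dfc = 4, partial_patch produces e8 cf e2 f6 00, the same bytes the CPU appears to have executed.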
Any ideas what's going on here? Please let me know if more context is needed. By the way, I am on an Intel(R) Xeon(R) CPU E5-2670, but the behavior seems consistent across a couple of other machines I tried.
Edit: The code has been compiled at the -O2 optimization level with the following additional options:
-fno-optimize-sibling-calls -finstrument-functions