I'm trying to find the root cause of an "Illegal instruction" exception (0xc000001d) with WinDbg. The project was built with VC++ 2015. I have two memory dumps from two test runs.

So far I have found the following, which holds for both dumps:

  • the exception points to the "movq mmword ptr [ecx], xmm0" instruction
  • xmm0 contains zeros
  • the exception occurs in an object constructor
  • the address is inside DS
  • the address belongs to a heap entry which looks valid
  • the address points to the object being constructed, so the instruction appears to be writing zero to the obj.m_data member, which looks valid too

I have no idea where to go further, so I'd appreciate any directions.

UPD:

...
movq    xmm0,mmword ptr [esi]
lea     ecx,[edi+94h]
movq    mmword ptr [ecx],xmm0 ; << this causes the exception
Rom098
  • It seems unlikely, but is there any chance that [the CPU doesn't support SSE2?](https://groups.google.com/forum/#!topic/comp.lang.asm.x86/s35RlDxuoks) – Harry Johnston May 28 '17 at 21:36
  • @HarryJohnston thanks, I will check this. Is there a way to know which CPU exception was generated (using windbg and the memory dump)? – Rom098 May 28 '17 at 21:51
  • You just said that the exception was 0x1d. – conio May 28 '17 at 21:58
  • @conio, I don't think the CPU exception codes use the same numbers as the Windows exception codes. [Invalid opcode is exception 0x6](http://wiki.osdev.org/Exceptions), for example, but Windows converts that to 0xC000001D. I don't know whether or not there are any other CPU exception codes that also map to 0xC000001D. – Harry Johnston May 28 '17 at 22:03
  • @HarryJohnston: All of that is reasonable inference. Is there a reason to suspect that the actual exception was a divide by zero or a PF? – conio May 28 '17 at 22:11
  • @HarryJohnston The processors are "AMD Sempron 3000+ Socket A" and "AMD Athlon XP 2400+ Socket A". As far as I understand, the Athlon XP doesn't support SSE2. The Sempron generally supports SSE2, but not in its Socket A variant. Is this information correct? If so, this is the root cause, right? – Rom098 May 29 '17 at 09:56
  • @HarryJohnston The tests were run on Windows XP, 7 Home and 7 Pro 32-bit. Actually I'm confused by the following: "128-bit operations will generate #UD only if OSFXSR in CR4 is 0." How can I check whether the operating systems support FXSAVE and FXRSTOR? – Rom098 May 29 '17 at 10:11
  • @HarryJohnston I updated the question with some asm lines. Also I'm wondering why the "movq xmm0,mmword ptr [esi]" instruction 2 lines above doesn't cause the exception. – Rom098 May 29 '17 at 10:48

1 Answer

An illegal instruction exception is raised when the operating system handles a fault from the CPU, which has refused to decode or execute an instruction (the invalid opcode fault, #UD). This can occur if an instruction set extension is not supported by the CPU, or has not been enabled by the operating system. One example is described on MSDN (illegal instruction, AVX): in that case, a bug in MSVC 2013 showed up when the CPU supported AVX but the operating system did not.

The failing CPUs don't appear to support SSE2, and movq with an xmm operand is an SSE2 instruction, so this is a likely cause of the issue.
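One way to confirm this on the test machines is to ask the CPU directly. The following is a minimal sketch, not a definitive implementation: it reads CPUID leaf 1 and decodes the SSE (EDX bit 25) and SSE2 (EDX bit 26) feature flags. The names `SimdSupport`, `decode_leaf1_edx` and `read_leaf1_edx` are hypothetical helpers made up for this example, and it assumes an x86 build with either MSVC (`__cpuid` from `<intrin.h>`) or GCC/Clang (`__get_cpuid` from `<cpuid.h>`).

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Hypothetical helper type for this sketch, not part of any library.
struct SimdSupport {
    bool sse;
    bool sse2;
};

// Decode the feature flags reported in EDX by CPUID leaf 1.
SimdSupport decode_leaf1_edx(std::uint32_t edx) {
    SimdSupport s;
    s.sse  = ((edx >> 25) & 1u) != 0;  // EDX bit 25: SSE
    s.sse2 = ((edx >> 26) & 1u) != 0;  // EDX bit 26: SSE2
    return s;
}

// Read EDX of CPUID leaf 1. Returns false when CPUID cannot be queried
// in this build (non-x86 target or an unrecognized compiler).
bool read_leaf1_edx(std::uint32_t& edx) {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};        // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    edx = static_cast<std::uint32_t>(regs[3]);
    return true;
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return false;
    edx = d;
    return true;
#else
    (void)edx;
    return false;
#endif
}
```

If `read_leaf1_edx` succeeds and `decode_leaf1_edx(edx).sse2` is false on the failing machines, the CPU itself lacks SSE2. On Windows, `IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE)` should answer the same question without touching CPUID directly.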

When I came across the AVX issue, I used a tool (supplied by Intel) to identify whether AVX was being used; the tool included a CPU test, which decided that AVX was not supported.

I am not aware of an equivalent tool from AMD, and I would be wary of relying on one, since it may be the operating system support that is missing.

Update

Why does an instruction fail if the operating system does not support it? The AVX instructions are an example. From Wikipedia (AVX):

AVX adds new register-state through the 256-bit wide YMM register file, so explicit operating system support is required to properly save and restore AVX's expanded registers between context switches.

Any change to the work or memory the operating system needs probably requires an explicit opt-in. In the case of AVX, the extra registers changed the amount of data that has to be stored on a context switch.
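For AVX this opt-in is visible from user mode, so a program can check both halves before using the instructions. The sketch below is an illustration under stated assumptions rather than production code: `avx_usable` is a hypothetical helper that takes the ECX value from CPUID leaf 1 and the value of XCR0 (as returned by the XGETBV instruction with ECX = 0), and reports AVX as usable only if the CPU advertises it and the operating system has enabled saving of both XMM and YMM state.

```cpp
#include <cstdint>

// Hypothetical helper for this sketch.
// leaf1_ecx: ECX as returned by CPUID leaf 1.
// xcr0:      value of XCR0 read with XGETBV; pass 0 if OSXSAVE is clear,
//            because XGETBV must not be executed in that case.
bool avx_usable(std::uint32_t leaf1_ecx, std::uint64_t xcr0) {
    const bool osxsave = ((leaf1_ecx >> 27) & 1u) != 0;  // OS uses XSAVE/XRSTOR
    const bool cpu_avx = ((leaf1_ecx >> 28) & 1u) != 0;  // CPU implements AVX
    if (!osxsave || !cpu_avx) {
        return false;
    }
    // XCR0 bit 1: XMM state enabled, bit 2: YMM state enabled.
    return (xcr0 & 0x6u) == 0x6u;
}
```

A CPU that reports AVX while the operating system has not set the YMM bit in XCR0 makes `avx_usable` return false, which is the "supported by the CPU but not by the operating system" situation described above.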

mksteve
  • I updated the question with some asm lines. I'm wondering why the "movq xmm0,mmword ptr [esi]" instruction 2 lines above doesn't cause the exception. – Rom098 May 29 '17 at 10:48
  • Maybe, maybe not. There's no evidence yet. Which commands could be useful to check that? How to find out whether this scenario applies? – Thomas Weller May 29 '17 at 16:40
  • How can an instruction extension be "not supported by the operating system"? What does that mean? – conio Jun 03 '17 at 23:02
  • @conio, according to [the SSE article](http://wiki.osdev.org/SSE) on the osdev.org wiki: "Since this change added new registers, it is disabled by default as the typical operating system of that time was not yet able to save those registers on a task switch." That is, the CPU will not run certain instructions unless the operating system tells it, "yep, I know how to deal with that functionality, go ahead and turn it on." – Harry Johnston Jun 04 '17 at 23:00