The Jaxon Elixir package crashes consistently on only one of my two Macs, even though both are running the same versions of macOS, Xcode, clang, Erlang and Elixir.

Here is the more detailed bug report

How can I investigate this, or what other dependencies can I check to try to resolve this?

Marc-André Lafortune
  • Did you try downgrading Elixir to `1.11.x`? I had a problem with the [Facebook](https://github.com/mweibel/facebook.ex) package; perhaps they need to update the package to support `Elixir 1.12`, but this is just a guess. – copser Jun 19 '21 at 18:13
  • I have not, but the package's test suite passes on my other machine, running `elixir 1.12`... – Marc-André Lafortune Jun 20 '21 at 01:40
  • Does this answer your question? [What is the "Illegal Instruction: 4" error and why does "-mmacosx-version-min=10.x" fix it?](https://stackoverflow.com/questions/14268887/what-is-the-illegal-instruction-4-error-and-why-does-mmacosx-version-min-10) – Adam Millerchip Jun 20 '21 at 05:27
  • @AdamMillerchip No, in my case the compiling happens on the same machine as where I then try to run it – Marc-André Lafortune Jun 20 '21 at 16:39
  • Long shot: Are your asdf `erlang` plugins the same version, and do you have any compiler flags set differently between machines? It looks more like an Erlang bug than a Jaxon one, but I'm not sure. Posting on https://elixirforum.com/ might get some more helpful responses. – Adam Millerchip Jun 21 '21 at 10:35
  • Random guess: code such as `if(buf + 5 < limit)` on [this line](https://github.com/boudra/jaxon/blob/15f42937d5e31d1521dc40e9c5da3635bd0e9ab2/c_src/decoder.c#L359) is undefined behaviour in C, as you're not allowed to increment a pointer more than 1 step beyond the end of the corresponding array. Try changing that to `if(limit - buf >= 5)`, and likewise a few other places in the code. – legoscia Jun 21 '21 at 17:39
  • Oops, that should be `limit - buf > 5`... – legoscia Jun 21 '21 at 23:19
  • @AdamMillerchip No particular option that I know of. I get the same compilation command on both machines: `clang -undefined dynamic_lookup -dynamiclib -msse2 -mavx2 -std=c99 -O3 -I/Users/work/.asdf/installs/erlang/24.0.1/erts-12.0.1/include c_src/decoder*.c -o priv/decoder.so`. Posted on https://elixirforum.com/t/how-to-troubleshoot-a-illegal-instruction-4-crash-with-jaxon/40578 – Marc-André Lafortune Jun 22 '21 at 05:03
  • @AdamMillerchip I removed the `-O3` option for kicks and I'm getting closer. Question updated. – Marc-André Lafortune Jun 22 '21 at 05:11
  • False alarm. I'm still getting crashes :-( – Marc-André Lafortune Jun 22 '21 at 06:03
  • Are there any compiler warnings when compiling the C code? – Venkatakumar Srinivasan Jun 22 '21 at 22:29
  • @VenkatakumarSrinivasan no warnings. I re-compiled with `-v` option and I'm getting `ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include"` and `Library/Frameworks`, but the same on both machines – Marc-André Lafortune Jun 23 '21 at 16:02
  • Just to clarify, is there any reason why you must use this particular JSON package and not something like `jason`? – Everett Jun 23 '21 at 18:37
  • @Everett I believe `jaxon` offers a few features that `jason` does not, such as streaming. – Marc-André Lafortune Jun 23 '21 at 19:49
  • One approach to debugging the problem is to see whether the issue is reproducible in the Elixir REPL. If it is, you can seed the C code with printf() statements to narrow down the lines of code that are causing the crash – Venkatakumar Srinivasan Jun 26 '21 at 08:31
  • @VenkatakumarSrinivasan I added some `printf` but running the same test gives me crashes at different points, including some that don't make any sense to me https://github.com/boudra/jaxon/issues/27#issuecomment-866997390 – Marc-André Lafortune Jun 26 '21 at 14:15
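
The pointer-arithmetic point raised in the comments can be illustrated with a minimal sketch. It assumes `buf` and `limit` point into the same buffer with `buf <= limit`; the helper name `has_more_than` is hypothetical and does not come from the jaxon sources. Subtracting the two pointers avoids ever forming `buf + 5`, which is undefined behaviour when it lands more than one element past the end of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of jaxon): returns true when more than
 * n bytes remain between buf and limit.
 * Assumes both pointers refer to the same array and buf <= limit. */
static bool has_more_than(const char *buf, const char *limit, ptrdiff_t n)
{
    assert(buf <= limit);
    /* Equivalent in intent to `buf + n < limit`, but the subtraction
     * never creates an out-of-range pointer. */
    return limit - buf > n;
}
```

With an 8-byte buffer `data`, `has_more_than(data, data + 8, 5)` is true, while `has_more_than(data + 3, data + 8, 5)` is false because exactly 5 bytes remain.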

1 Answer

Your CPU cannot handle some of the instructions it is trying to execute, which is what an "Illegal instruction: 4" crash indicates.

Check the CPU model of your "MacPro 5,1" and whether it supports AVX2 (the build is compiled with `-mavx2`) or any other instruction-set extensions the compiler has been told to use.

paiv
  • You're right, my [Xeon W3670](https://ark.intel.com/content/www/us/en/ark/products/47918/intel-xeon-processor-w3670-12m-cache-3-20-ghz-4-80-gt-s-intel-qpi.html) does not support AVX2, and that must be what is causing the issue. Thanks so much, now I know what is going on... – Marc-André Lafortune Jun 27 '21 at 15:52
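
As a rough way to confirm this kind of diagnosis on a given machine, the sketch below asks the CPU directly whether it advertises AVX2. It assumes a GCC- or clang-compatible compiler that ships `<cpuid.h>`; the helper names `ebx_has_avx2` and `cpu_has_avx2` are hypothetical. It is a simplification: a complete check would also verify that the operating system saves the AVX register state.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: AVX2 is reported in bit 5 of EBX
 * for CPUID leaf 7, subleaf 0. */
static bool ebx_has_avx2(unsigned int ebx)
{
    return (ebx & (1u << 5)) != 0;
}

/* Hypothetical helper: true when CPUID advertises AVX2.
 * Returns false on non-x86 targets or when leaf 7 is unavailable. */
static bool cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false; /* CPU too old to report leaf 7 at all */
    }
    return ebx_has_avx2(ebx);
#else
    return false;
#endif
}
```

Calling `cpu_has_avx2()` from a small test program on each machine should show whether code built with `-mavx2` can be expected to run there. On Intel Macs, `sysctl machdep.cpu` prints the CPU's feature flags and can serve as a quick cross-check.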