
I have the problem described in the title. I have an Edify language parser that runs without errors when I build it for ARM but fails when I try to use it on x86. I traced the segfault to the yy_scan_bytes function, more precisely to this code:

YY_BUFFER_STATE yy_scan_bytes  (yyconst char * yybytes, int  _yybytes_len ) {
YY_BUFFER_STATE b;
char * buf;
yy_size_t n;
int i;
/* Get memory for full buffer, including space for trailing EOB's. */
n = _yybytes_len + 2;
buf = (char *) yyalloc(n  );

if ( ! buf ) {
  YY_FATAL_ERROR( "out of dynamic memory in yy_scan_bytes()" );
}

for ( i = 0; i < _yybytes_len; ++i ) {
  buf[i] = yybytes[i]; // <==========
}

The full code is here: https://github.com/twaik/edify_x86_failing_code. I got it from the AROMA Installer source. That is everything I found while debugging. Thanks.

Twaik Yont
  • To have a segfault here, `[i]` is out-of-bounds for either `buf` or `yybytes`. Have you inspected these strings during processing? This file is generated by the lexical analyser generator `lex` (or `flex`). Have you tried regenerating this file? Theoretically it shouldn't have issues, but obviously it does. – Kingsley Sep 24 '18 at 23:48
  • Regenerating does not affect it, it still segfaults. I am using function int parse_string(const char* str, Expr** root, int* error_count) { yy_switch_to_buffer(yy_scan_string(str)); return yyparse(root, error_count); } to measure length of string. But it fails on yy_scan_string function. – Twaik Yont Sep 25 '18 at 00:09
  • Do you have access to another version of flex/lex etc. ? – Kingsley Sep 25 '18 at 00:47
  • You should add `-Wall` to the compile command in your Makefile. You don't seem to be declaring the flex buffer functions anywhere. `-ggdb` would help you debug, too. – rici Sep 25 '18 at 00:54
  • Note that you should not create function, variable or macro names that start with an underscore, in general. [C11 §7.1.3 Reserved identifiers](https://port70.net/~nsz/c/c11/n1570.html#7.1.3) says (in part): — _All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use._ — _All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces._ See also [What does double underscore (`__const`) mean in C?](https://stackoverflow.com/a/1449301/15168) – Jonathan Leffler Sep 25 '18 at 14:28
  • @JonathanLeffler: The quoted snippet is part of the code generated by flex, so you should take that up with the flex authors and not with OP, who is entirely innocent. :) In any case, you're allowed to use a local (not file-scope) name which starts with a single underscore followed by a lower case letter, so I think the flex authors are also working within the standard. – rici Sep 25 '18 at 15:09

1 Answer


Trying to build your code gives me these warnings:

main.c: In function ‘parse_string’:
main.c:27:5: warning: implicit declaration of function ‘yy_switch_to_buffer’ [-Wimplicit-function-declaration]
     yy_switch_to_buffer(yy_scan_string(str));
     ^~~~~~~~~~~~~~~~~~~
main.c:27:25: warning: implicit declaration of function ‘yy_scan_string’ [-Wimplicit-function-declaration]
     yy_switch_to_buffer(yy_scan_string(str));

That means the compiler assumes that yy_switch_to_buffer() and yy_scan_string() return an int, as it does for every function that is not declared before use (per the C89 standard). That assumption is wrong here: the first returns void, and the second returns a pointer (YY_BUFFER_STATE). On x86_64 a pointer is 8 bytes while an int is 4, so the pointer returned by yy_scan_string() can be truncated before it reaches yy_switch_to_buffer(). On a 32-bit ARM target the two sizes match, which would explain why the same code appears to work there.
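To illustrate the size mismatch described above, here is a minimal sketch that simulates, with fixed-width integers, what happens to a 64-bit pointer value when the caller treats it as a 32-bit int. The function names and the example addresses are made up for illustration; nothing here comes from the flex-generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit pointer value is returned through a
   32-bit signed int and then widened back to 64 bits by the caller. */
uint64_t pointer_value_via_int(uint64_t ptr_value)
{
    /* Only the low 32 bits survive the trip through an int. */
    uint32_t low = (uint32_t)(ptr_value & 0xFFFFFFFFu);

    /* Widening a signed 32-bit value sign-extends it: if bit 31 is set,
       the upper 32 bits become all ones. */
    int64_t widened = (low & 0x80000000u)
                          ? (int64_t)low - INT64_C(0x100000000)
                          : (int64_t)low;
    return (uint64_t)widened;
}

/* Example with made-up addresses: neither result matches the original. */
void check_truncation_example(void)
{
    assert(pointer_value_via_int(UINT64_C(0x00007f1234567890)) ==
           UINT64_C(0x0000000034567890));
    assert(pointer_value_via_int(UINT64_C(0x00005555d5550000)) ==
           UINT64_C(0xffffffffd5550000));
}
```

Either way the callee receives an address that no longer points at the buffer, which is consistent with a crash on the first dereference.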

Adding some band-aid prototypes like

void yy_switch_to_buffer(void*);
void *yy_scan_string(const char*);

to main.c, before their use in parse_string() may stop the segfaulting.

A better fix would be to have the Makefile run the lexer with the --header-file=lex-header.h option, and then include lex-header.h from main.c. Better still, wrap all lex-specific code in a few simple functions, and put the prototypes of those functions in a header included from both main.c and the *.l file.

  • Ok. For now the edify tests pass. But parsing a real script starts with a `line 1 col 1: syntax error, unexpected BAD, expecting IF or STRING or '!' or '('` error. But not on ARM. Any ideas? – Twaik Yont Sep 25 '18 at 11:13
  • that's hard to tell without the "real script". –  Sep 25 '18 at 12:18
  • You can use that one for testing https://github.com/amarullz/AROMA-Installer/blob/master/assets/META-INF/com/google/android/aroma-config – Twaik Yont Sep 25 '18 at 13:47
  • @twaik: that script starts with a BOM, as you can see from `hd aroma-config|head -n1` (which prints `00000000 ef bb bf 23 23 23 20 4c 49 43 45 4e 53 45 3a 0a |...### LICENSE:.|`). I suppose you (or somebody) created the file on Windows. – rici Sep 25 '18 at 15:22
  • and the program only reads 8191 bytes (or less) from the file, out of 50002. –  Sep 25 '18 at 15:35
  • @mosvy: That's true, but it fails on the first one :) There are other issues, too. – rici Sep 25 '18 at 15:45
  • and it doesn't recognize a lot of functions/keywords, and in main() it will try to dump & evaluate the `root` variable, even if yyparse() failed and didn't set it at all. –  Sep 25 '18 at 15:48
  • Thank you guys. Looks like BOM is the exact problem. That is just the test snippet that tries to use syntax analyzer from the target project. – Twaik Yont Sep 25 '18 at 19:29
  • AROMA Installer has BOM skipper and I don't know why it is not working on x86. ` if ((script_data[0] == 0xEF) && (script_data[1] == 0xBB) && (script_data[2] == 0xBF)) { script_data += 3;} ` – Twaik Yont Sep 25 '18 at 19:29
  • For some reason script_data[0,1,2] was seen as "0xFFFFFFEF 0xFFFFFFBB 0xFFFFFFBF", not "0xEF 0xBB 0xBF" as expected. I've added "& 0xFF" to the byte checking code and now everything is working. – Twaik Yont Sep 25 '18 at 22:48
  • That's because your `script_data` was signed chars, and they were sign-extended when promoted to `int` for the sake of comparing them to `0xEF`, etc. You can reproduce it with `char *x = "\xef"; printf("%d %d\n", x[0] == 0xef, x[0] == '\xef');`. –  Sep 25 '18 at 23:07
  • btw, the `-Wtype-limits` option may help. Use it. –  Sep 25 '18 at 23:23
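The sign-extension issue discussed in the comments above can be avoided by reading the bytes as unsigned char instead of masking each one with `& 0xFF`. The following is a minimal sketch, not the AROMA Installer code: `skip_utf8_bom` is a hypothetical helper name, and the explicit length parameter is an assumption added so that short inputs are not read out of bounds. Plain char is typically unsigned on ARM Linux targets and signed on x86, which would explain why the original comparison only failed on x86.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer just past a leading UTF-8 BOM
   (bytes EF BB BF), or `data` unchanged when no BOM is present. */
const char *skip_utf8_bom(const char *data, size_t len)
{
    assert(data != NULL);

    /* View the bytes as unsigned char so that 0xEF is promoted to 239
       rather than sign-extended to a negative int where char is signed. */
    const unsigned char *p = (const unsigned char *)data;

    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        return data + 3;
    }
    return data;
}
```

A caller would then write something like `script_data = skip_utf8_bom(script_data, script_len);`, where `script_len` is whatever length the caller already tracks for the buffer.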