I am trying to write a simple JVM in C. After reading the .class file into a hex string, I am trying to parse that string.

char *get_bytecode(const char *filename) {
    FILE *fileptr = fopen(filename, "rb");
    if (!fileptr) {
        fprintf(stderr, "Error: could not open file %s\n", filename);
        return NULL;
    }
    char *buffer = malloc(1);
    buffer[0] = '\0';
    unsigned char byte;
    while(fread(&byte, sizeof(byte), 1, fileptr) == 1) {
        char *temp = malloc(3);
        sprintf(temp, "%02x", byte);
        buffer = realloc(buffer, strlen(buffer) + strlen(temp) + 1);
        strcat(buffer, temp);
        free(temp);
    }
    fclose(fileptr);
    return buffer;
}

I have not run into any problems with the function above. Next, I wrote a function to parse the bytecode it returns:

classfile parse_class(const char *bytecode_hex) {
    classfile classfile;
    memset(&classfile, 0, sizeof(classfile));
    char *endptr;
    classfile.magic = strtoul(bytecode_hex, &endptr, 16);
    printf("Magic: %08X\n", classfile.magic);
    classfile.minor = (uint16_t)strtoul(endptr, &endptr, 16);
    printf("Minor: %04X\n", classfile.minor);
    classfile.major = (uint16_t)strtoul(endptr, NULL, 16);
    printf("Major: %04X\n", classfile.major);
    return classfile;
}

I guess the problem is here because I am getting an output like this:

Magic: FFFFFFFF
Minor: 0000
Major: 0000

but the expected output should be like this:

Magic: CAFEBABE 
Minor: 0000
Major: 0056

I couldn't understand exactly what caused the problem. Thank you in advance for any constructive comments.

Levent Kaya
  • You should `printf("%s\n", bytecode_hex)` in the `parse_class()` function. `strtoul()` returns `ULONG_MAX` if it is out of range. – Weather Vane May 11 '23 at 21:52
  • Your `realloc(buffer, strlen(buffer), ...)` looks dubious. Can the bytecode contain zero bytes? Better to maintain the current length of your buffer in a variable, and double it each time you realloc. – pmacfarlane May 11 '23 at 21:54
  • Process the bytes first, not hex codes. Use hex only to print the output. – aled May 11 '23 at 21:55
  • The classic way to build an incoming value isn't with all that malarkey, but to multiply an accumulator by the base and add the incoming byte. – Weather Vane May 11 '23 at 21:57
  • Some docs: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html and https://en.wikipedia.org/wiki/Java_class_file You'll have to read in the data as _binary_ bytes and store them on field-by-field basis. This could be a C struct as is, _except_ for variable arrays _embedded_ in the struct (which C does _not_ have): `u2 constant_pool_count; cp_info constant_pool[constant_pool_count-1];` In C, we'd need a pointer: `u2 constant_pool_count; cp_info *constant_pool; // [constant_pool_count-1]` Similar for `interfaces*`, `fields*`, `methods*` and `attributes*` – Craig Estey May 12 '23 at 01:25
  • To start, I'd create two functions: `classfile_load` and `classfile_store`. The first populates the above [C] struct from the class file. The second serializes it to an output file. Both files should now _match_ byte-for-byte. Also, you could have a `classfile_print` that prints out the struct in human readable format (e.g.) `printf("magic=%4.4X\n",ptr->magic);` This proves that you read it in correctly, accounting for byte endianness, etc. – Craig Estey May 12 '23 at 01:35
  • Don’t you feel something’s weird about converting data to a hex dump and parsing it back to data? – Holger May 12 '23 at 06:24

1 Answer

Did you look at your bytecode_hex string? You are producing one long string of hexadecimal digits with no separators. The first strtoul() processes it in its entirety, overflows, and so returns 0xffffffff. (Or perhaps 0xffffffffffffffff, since you are only printing the low eight digits.) The next two strtoul() calls see no hex digits, and so return 0.
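
The overflow can be checked directly. The following is only a minimal sketch: `hex_overflows` is a hypothetical helper, not part of the question's code. It relies on the standard behaviour that strtoul() returns ULONG_MAX and sets errno to ERANGE when the value does not fit.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtoul() overflowed on `hex`,
   0 otherwise. *consumed receives the number of characters that
   strtoul() read before stopping. */
int hex_overflows(const char *hex, size_t *consumed) {
    char *end;
    errno = 0;                                  /* strtoul() only sets errno on error */
    unsigned long value = strtoul(hex, &end, 16);
    *consumed = (size_t)(end - hex);
    return value == ULONG_MAX && errno == ERANGE;
}
```

Called on the whole hex dump, this reports an overflow with every digit consumed, which is why the later calls have nothing left to parse.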

You need to put in spaces for where you want the strtoul() to stop. Otherwise it has no clue how to parse a string of nothing but hex digits.
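
Since the header fields have fixed widths, an alternative to inserting spaces is to hand strtoul() only the digits of one field at a time. This is a sketch under assumptions: `parse_hex_field` is a hypothetical helper, and it assumes each field is at most 8 hex digits and that the string really contains that many digits.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse exactly `ndigits` hex digits starting at
   *cursor, then advance *cursor past them.
   Assumes ndigits <= 8 and that at least ndigits characters remain. */
unsigned long parse_hex_field(const char **cursor, size_t ndigits) {
    char field[9];                      /* up to 8 digits plus terminator */
    memcpy(field, *cursor, ndigits);
    field[ndigits] = '\0';              /* strtoul() now stops at the field end */
    *cursor += ndigits;
    return strtoul(field, NULL, 16);
}
```

With a cursor pointing at the start of the hex string, calling it with widths 8, 4, and 4 would yield magic, minor, and major in turn.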

Also, it makes no sense to convert the byte codes to hex, and then back to binary. Just process the byte codes.
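
As a rough sketch of processing the bytes directly: class files store multi-byte values in big-endian order, so each field can be built by shifting an accumulator and adding the next byte. The names `read_be`, `class_header`, and `read_class_header` are hypothetical, and only the first three header fields are covered.

```c
#include <stdint.h>
#include <stdio.h>

/* Read `nbytes` (1 to 4) bytes as one big-endian unsigned value.
   Returns 0 on success, -1 if the file ends early. */
int read_be(FILE *fp, unsigned nbytes, uint32_t *out) {
    uint32_t value = 0;
    for (unsigned i = 0; i < nbytes; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;
        value = (value << 8) | (uint32_t)c;   /* accumulate the next byte */
    }
    *out = value;
    return 0;
}

/* Hypothetical struct holding only the start of the class file header. */
typedef struct {
    uint32_t magic;
    uint16_t minor;
    uint16_t major;
} class_header;

/* Fill `h` from a stream opened in binary mode ("rb").
   Returns 0 on success, -1 on a short read. */
int read_class_header(FILE *fp, class_header *h) {
    uint32_t minor, major;
    if (read_be(fp, 4, &h->magic) != 0 ||
        read_be(fp, 2, &minor) != 0 ||
        read_be(fp, 2, &major) != 0)
        return -1;
    h->minor = (uint16_t)minor;
    h->major = (uint16_t)major;
    return 0;
}
```

The fields can then be printed with the same "%08X" and "%04X" formats used in the question, with no hex string in between.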

Mark Adler