0

I'm writing a cache simulator in C and have it pretty much done...except that when I try to scan in addresses, fscanf skips some of the digits in the hex number: it only gets 4 bytes! If I can't get the right address, the tag bits are incorrect and the simulation won't always work. The task seems pretty straightforward, but I must be missing something. Maybe something to do with fscanf format string idiosyncrasies?

The source file looks like this:

 S 00600aa0,1
I  004005b6,5
 S 7ff000398,8
 M 7ff000390,8
// and so on ...

I have tried using fgets and sscanf instead, but I get the same result.

char buffer[200];
char *pattern = " %c %x,%s\n";
int status; long address; char op; 
while ((status = fscanf(source, pattern, &op, &address, buffer)) != EOF) { 
    if (op != 'I')  { 
        fprintf(stderr,"address: %x\n",address); // DEBUG stmnt 
        simulate the cache..........................

The debug statement prints out the wrong address for line 3. Instead of "address: 7ff000398" it writes "address: ff000398". It gets it right for line 1. Why does it only read in the first 4 bytes? 'address' is already a long and I can't find any documentation about %x behaving like this.

  • 1
    `7.ff.00.03.98` is too long to fit in a 32-bit integer (4 bytes). You need a 64-bit integer to store it. Also, to read it, you will need `%lx`. – Mathieu May 16 '19 at 08:40
  • Note that giving format string as variable is usually considered bad practice (easy to hide bugs, and introduce security vulnerabilities). Also, check that scanf return value matches format string, always. – hyde May 16 '19 at 08:53

3 Answers

1

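The address variable's type is long, which on some platforms (for example 32-bit systems and 64-bit Windows) holds only 4 bytes, while 0x7ff000398 needs more than 32 bits (5 bytes). In addition, %x tells fscanf to store an unsigned int, so only the 4 least significant bytes are stored and the most significant ones are dropped. That's why lines 1 and 2 work as expected but line 3 doesn't.

To fix it, change the type of address to long long and use %llx in both the fscanf and fprintf format strings.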
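
A minimal sketch of that fix, assuming each trace line has the form shown in the question. The helper name parse_trace_line is hypothetical, and reading the size field with %u (instead of %s into a buffer) is an assumption made here for brevity:

```c
#include <stdio.h>

/* Hypothetical helper: parses one trace line such as " S 7ff000398,8".
 * Returns 1 if all three fields were read, 0 otherwise. */
static int parse_trace_line(const char *line, char *op,
                            unsigned long long *address, unsigned *size)
{
    /* %llx matches unsigned long long, which is at least 64 bits wide,
     * so a 9-digit hex address is stored in full. */
    return sscanf(line, " %c %llx,%u", op, address, size) == 3;
}
```

When printing the address, the matching conversion is again %llx, for example fprintf(stderr, "address: %llx\n", address).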

Yoni Newman
  • 185
  • 13
0

You will need a 64-bit integer to store the result.

32-bit systems

On a 32-bit system, the long type may be only 32 bits wide.

To be sure, you can use the long long type for address.

Better still would be to use the uint64_t type (defined in stdint.h since C99).
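
As a rough sketch of the uint64_t route: the format macros SCNx64 and PRIx64 from inttypes.h expand to the right conversion for that type on the current platform. The helper names read_hex64 and format_hex64 are made up for this illustration:

```c
#include <inttypes.h>  /* SCNx64, PRIx64; also includes stdint.h */
#include <stdio.h>

/* Hypothetical helper: reads one hex number from text into a uint64_t.
 * Returns 1 on success, 0 if no hex number was found. */
static int read_hex64(const char *text, uint64_t *address)
{
    return sscanf(text, "%" SCNx64, address) == 1;
}

/* Hypothetical helper: writes the address back out as lowercase hex. */
static void format_hex64(uint64_t address, char *out, size_t out_size)
{
    snprintf(out, out_size, "%" PRIx64, address);
}
```

The advantage over long long is that the width is stated in the type name, so the code does not depend on how wide long happens to be.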

Changes

  1. Change address type to long long (if needed)
  2. Change %x to %lx (for long) or %llx (for long long) in both places (the scanf and the fprintf format strings)
  3. Compile with warnings turned on.

Warnings

With warnings turned on, you should get the message:

warning: format ‘%x’ expects argument of type ‘unsigned int’, 
but argument 3 has type ‘long int’ [-Wformat=]
fprintf(stderr,"address: %x\n",address); // DEBUG stmnt
                         ~^    ~~~~~~~
                         %lx

Which gives you a good clue to solve your problem.

So, your code should look like:

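#include <stdio.h>

int main(void)
{
    char buffer[10];
    /* %lx matches unsigned long; %9s leaves room for the terminating '\0' */
    const char *pattern = " %c %lx,%9s\n";
    unsigned long address;
    char op;

    /* Stop at EOF or as soon as a line fails to match all three fields */
    while (scanf(pattern, &op, &address, buffer) == 3) {
        fprintf(stderr, "address: %lx\n", address);
    }
    return 0;
}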

With your input, I get the result:

address: 600aa0
address: 4005b6
address: 7ff000398
address: 7ff000390
Community
  • 1
  • 1
Mathieu
  • 8,840
  • 7
  • 32
  • 45
0

Since C99 you can use uintptr_t, an unsigned integer type that is capable of storing a pointer (if those addresses are pointers on the same target machine).

#include <stdio.h>
#include <stdint.h> 
#include <inttypes.h>

int main(void)
{
    char *str = "7ff000398";
    uintptr_t address;

    sscanf(str, "%" SCNxPTR, &address); // x for base 16
    printf("%" PRIxPTR "\n", address); // x for base 16
    return 0;
}
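
If the trace may come from a machine with wider pointers than the one running the simulator, one possible guard (a sketch, not part of the original answer) is to parse into unsigned long long first and reject values that do not fit in uintptr_t. The helper name parse_address_checked is hypothetical:

```c
#include <inttypes.h>  /* uintptr_t and UINTPTR_MAX via stdint.h */
#include <stdio.h>

/* Hypothetical helper: parses a hex address and stores it in *out only if
 * it fits in this platform's uintptr_t. Returns 1 on success, 0 otherwise. */
static int parse_address_checked(const char *text, uintptr_t *out)
{
    unsigned long long wide;  /* at least 64 bits on every conforming compiler */

    if (sscanf(text, "%llx", &wide) != 1)
        return 0;             /* not a hex number */
    if (wide > UINTPTR_MAX)
        return 0;             /* would be truncated on this platform */
    *out = (uintptr_t)wide;
    return 1;
}
```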
David Ranieri
  • 39,972
  • 7
  • 52
  • 94
  • *you can use `uintptr_t`, an unsigned integer type that is capable of storing a pointer* Not on a 32-bit machine with 4-byte pointers. – Andrew Henle May 16 '19 at 09:23
  • 2
    @AndrewHenle what's the problem using `uintptr_t` on 32-bit machine with 4-byte pointers?, as far as I know they're safe in any architecture. – David Ranieri May 16 '19 at 09:27
  • 1
    If pointers are four bytes, `uintptr_t` only needs to be four bytes also. So it isn't guaranteed to be able to hold an 8-byte value. Among the required types, only `[unsigned] long long` is guaranteed to be 8 bytes. – Andrew Henle May 16 '19 at 09:29
  • _If pointers are four bytes, uintptr_t only needs to be four bytes also_ ... are you sure? ... _only [unsigned] long long is guaranteed to be 8 bytes_ ... to be **at least** 8 bytes, but that's not the point: on an 8-byte addressable arch `sizeof(uintptr_t)` should be == 8 – David Ranieri May 16 '19 at 09:29
  • 1
    `uintptr_t` isn't even guaranteed to **exist**. Per [**7.20.1.4 Integer types capable of holding object pointers**](https://port70.net/~nsz/c/c11/n1570.html#7.20.1.4): "These types are optional." And no, `uintptr_t` does *not* have to be 8 bytes: "any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer" On a 32-bit machine with 4-byte pointers, `uintptr_t` only needs to be four bytes. See https://stackoverflow.com/questions/1845482/what-is-uintptr-t-data-type among many others. – Andrew Henle May 16 '19 at 09:36
  • _And no, uintptr_t does not have to be 8 bytes_ I didn't say that, read again what I said: _on a 8 bytes addressable arch sizeof(uintptr_t) should be == 8_, and on a 32 bits arch it may be 4 bytes, so what? I don't see any problem with this, and yes, they are optional, what's the problem with an optional feature if the compiler supports it? – David Ranieri May 16 '19 at 09:39
  • @AndrewHenle wait, I think that I understand what you are trying to say, you mean reading from an 8 bytes source to an 4 bytes target? in this case of course you are right ... OP must specify if those addresses are pointers on the same target machine ... , in this case `uintptr_t` is completely safe to use. – David Ranieri May 16 '19 at 09:54
  • *I don't see any problem with this* The questioner hasn't specified the architecture, which means your answer may not work. I apologize for not clearly stating that earlier. – Andrew Henle May 16 '19 at 10:40
  • My answer _should_ work if those addresses are on the same target machine, regardless of the architecture. But yes, this is not specified in the question, I'll edit the answer ... – David Ranieri May 16 '19 at 10:42