
I am reading a binary file that I want to offload directly to the Xeon Phi through Cilk and shared memory.

Since we read a fairly large amount of binary data each time, the preferred option is fread.

A very simple example looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

_Cilk_shared uint8_t* _Cilk_shared buf;

int main(int argc, char **argv) {
  printf("Argv is %s\n", argv[1]);
  FILE* infile = fopen(argv[1], "rb");
  buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(2073600);
  int len = fread(buf, 1, 2073600, infile);
  if(ferror(infile)) {
    perror("ferror");
  }
  printf("Len is %d and first value of buf is %d\n", len, *buf);
  return 0;
}

The example is heavily simplified from the real code, but it is enough to exemplify the behavior.

This code then prints:

ferror: Bad address
Len is 0 and first value of buf is 0

However, if we replace the fread with fgets (not very suitable for reading binary data, especially given its return value), things work great.

That is, if we use fgets((char *) buf, 2073600, infile); instead and drop len from the printout, we get

first value of buf is 46

This is what we need, and I can then run _Cilk_offload on a function with buf as an argument and do work on it.

Is there something I am missing, or is fread just not supported? I've tried to find as much information on this as I could, both from Intel and from other sites, but sadly I have been unable to.

----EDIT----

After more research, it seems that calling fread on the shared memory with a size larger than 524287 bytes (2^19 - 1, the largest value that fits in 19 bits) produces the error above. At 524287 or lower things work, and you can call fread as many times as you want and read all the data.

I am utterly unable to find any documented reason for this.

Asthor

2 Answers


I don't have a Phi, so I am unable to check whether this would make a difference, but fread has its own buffering. Even if that may be turned off for this type of reading, I don't see why you would go through the overhead of using fread rather than the lower-level open and read calls, like

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h> // read, close

_Cilk_shared uint8_t* _Cilk_shared buf;

int main(int argc, char **argv) {
  printf("Argv is %s\n", argv[1]);
  int infile = open(argv[1], O_RDONLY); // should test if open ok, but skip to make code similar to OP's
  int len, pos = 0, size = 2073600;
  buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(size);
  do {
      buf[pos] = 0; // force the address to be mapped to process memory before read
      len = read(infile, &buf[pos], size);
      if(len < 0) {
         perror("error");
         break;
      }
      if(len == 0) {
         break; // end of file: stop instead of looping forever on a short file
      }
      pos += len; // move position forward in case we have not read the entire data in the first read
      size -= len;
  } while (size > 0);
  close(infile);
  printf("Len is %d (%d) and first value of buf is %d\n", len, pos, *buf);
  return 0;
}

read and write should work with the allocated shared memory without the problem you are seeing.

Soren
  • Good idea, as it could potentially be the buffering. But sadly it returns the same result as the fread code: Bad address. – Asthor May 19 '16 at 11:07
  • Somehow the address segment is then just not mapped into your process memory space. My guess would be that `_Offload_shared_malloc` uses some kind of lazy evaluation, so what about trying to force the mapping into process space by writing a single byte to the first address, e.g. `*buf = 0;`, just before the `read`? – Soren May 19 '16 at 13:54
  • Yeah, that fits with the theory in the answer above. read doesn't read all 2073600 bytes, though, only 1048568 bytes. fread also works and reads the same amount. This probably relates even more to the whole lazy allocation it does. – Asthor May 19 '16 at 14:10
  • Just loop and read: the read call is never guaranteed to give you what you ask for. In socket programming it is common to loop and aggregate the data, and it is not unusual to look at the specific error code in `errno` when read returns -1 to see whether the operation can be retried before giving up. I edited the code for a minimal looping structure that should work for you. – Soren May 19 '16 at 14:20
  • Yeah, already looping. I am more interested in understanding why the actual allocation works. Thanks for the edit; the code snippet, however, is just a quick example for the question. The actual code is a lot more complex. – Asthor May 20 '16 at 00:19

Can you try to insert something like this before the fread calls?

memset(buf, 0, 2073600); // after including string.h

This trick worked for me, but I don't know why (lazy allocation?).

FYI, you can also post a MIC question on this forum.

  • Yeah, that works. I am guessing you are right about the lazy allocation; eager allocation doesn't make much sense anyway. I can't find any documentation on it, though, which would be nice. – Asthor May 19 '16 at 11:14