
Yesterday I asked you guys a question and realized that reading and writing a chunk of blocks at once is more efficient than handling one block at a time. So now I'm trying to read and write chunks of blocks, but I get a segmentation fault.

```c
#define POSITIONAL_TOKEN_CHUNK_SIZE 1000000
#define POSITIONAL_TOKEN_WORD_LENGTH 15
#define POSITIONAL_TOKEN_ID_LENGTH 4
#define POSITIONAL_TOKEN_LENGTH (POSITIONAL_TOKEN_WORD_LENGTH + POSITIONAL_TOKEN_ID_LENGTH)

struct PTBlk {
    char w[POSITIONAL_TOKEN_WORD_LENGTH + 1];
    int id;
};

int transform_to_bin_a(FILE * fin, int fd) {
    int n = 0;
    int m = 0;
    char buf[TRANSFORM_TO_BIN_BUF_LENGTH] = {};
    struct PTBlk * blks = (struct PTBlk *)malloc(POSITIONAL_TOKEN_CHUNK_SIZE * POSITIONAL_TOKEN_LENGTH);
    if (blks == NULL) {
        puts("no memory being allocated");
        return 0;
    }
    memset(buf, 0, TRANSFORM_TO_BIN_BUF_LENGTH);
    printf("total of size being allocated is %d bytes\n", POSITIONAL_TOKEN_CHUNK_SIZE * POSITIONAL_TOKEN_LENGTH);
    while (fgets(buf, TRANSFORM_TO_BIN_BUF_LENGTH, fin) != NULL) {
        ++n;
        sscanf(buf, "%s %d", blks[m].w, &blks[m].id);
        ++m;
        if (m >= POSITIONAL_TOKEN_CHUNK_SIZE) { // error !!
            write(fd, (void *)blks, POSITIONAL_TOKEN_CHUNK_SIZE * POSITIONAL_TOKEN_LENGTH);
            m = 0;
        }
        memset(buf, 0, TRANSFORM_TO_BIN_BUF_LENGTH);
    }
    if (m > 0) {
        printf("n:%d m:%d\n", n, m);
        printf("m:%d\n", m);
        write(fd, (void *)blks, m * POSITIONAL_TOKEN_LENGTH);
    }
    printf("n:%d\n", n);
    lseek(fd, 0, SEEK_SET);
    write(fd, (void *)&n, POSITIONAL_TOKEN_SEARCH_BEGIN);
    free(blks);
    return 1;
}
```

I guess that a too-large POSITIONAL_TOKEN_CHUNK_SIZE is one of the problems, but I don't understand why it would cause a segmentation fault: the code only tries to allocate 20,000,000 bytes on the heap. I have sometimes declared a large int array as a global variable, such as `int arr[20000000];`, and that worked fine.

What am I misunderstanding in this code?
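The size arithmetic in the question can be checked directly. Below is a minimal sketch in C, assuming a common ABI where `int` is 4 bytes with 4-byte alignment, so `sizeof(struct PTBlk)` is 16 + 4 = 20 while `POSITIONAL_TOKEN_LENGTH` is 15 + 4 = 19. The helper names `bytes_requested`, `bytes_needed` and `alloc_blocks` are hypothetical, added only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define POSITIONAL_TOKEN_CHUNK_SIZE 1000000
#define POSITIONAL_TOKEN_WORD_LENGTH 15
#define POSITIONAL_TOKEN_ID_LENGTH 4
#define POSITIONAL_TOKEN_LENGTH (POSITIONAL_TOKEN_WORD_LENGTH + POSITIONAL_TOKEN_ID_LENGTH)

struct PTBlk {
    char w[POSITIONAL_TOKEN_WORD_LENGTH + 1]; /* 16 bytes */
    int id;                                   /* 4 bytes on common ABIs (assumption) */
};

/* What the question's malloc call requests: 1,000,000 * 19 bytes. */
size_t bytes_requested(void) {
    return (size_t)POSITIONAL_TOKEN_CHUNK_SIZE * POSITIONAL_TOKEN_LENGTH;
}

/* What indexing blks[0] .. blks[POSITIONAL_TOKEN_CHUNK_SIZE - 1] requires.
   sizeof is evaluated at compile time, so this costs nothing at run time. */
size_t bytes_needed(void) {
    return (size_t)POSITIONAL_TOKEN_CHUNK_SIZE * sizeof(struct PTBlk);
}

/* Hypothetical helper: allocation sized from the element type
   instead of a hand-computed record length. */
struct PTBlk *alloc_blocks(size_t count) {
    assert(count <= SIZE_MAX / sizeof(struct PTBlk)); /* guard the multiplication */
    struct PTBlk *blks = malloc(count * sizeof *blks);
    return blks;
}
```

Under that assumption `bytes_requested()` is 19,000,000 while `bytes_needed()` is 20,000,000, so every record from `blks[950000]` onward is written past the end of the allocation. That is undefined behavior, which may surface as a segmentation fault in `sscanf`, in a later `free`, or not at all. The `write` calls have the same mismatch, so each full chunk would put only the first 950,000 records into the file. The sketch is C; a C++ compiler (the question uses `-std=c++11`) would additionally require a cast on the `malloc` result.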

chatterboy
  • Also you shouldn't cast the return of `malloc`: http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc – UnholySheep Mar 09 '17 at 12:30
  • If you compile this with a C++ compiler you should tag it as C++ (but not C). Otherwise, the opposite applies. – moooeeeep Mar 09 '17 at 12:32
  • it is good to check the return value of `sscanf` – AndersK Mar 09 '17 at 12:33
  • @moooeeeep yes, I compile this code with -std=c++11 – chatterboy Mar 09 '17 at 12:35
  • seems there is no connection between how many bytes you allocate and the struct PTBlk, shouldn't it be something like `sizeof(struct PTBlk)*POSITIONAL_TOKEN_CHUNK_SIZE` – AndersK Mar 09 '17 at 12:36
  • Then I don't understand why you don't use the appropriate C++ features for parsing, memory management etc. – moooeeeep Mar 09 '17 at 12:37
  • @user2685907 There's a lot of C in your C++. Fix it and I bet the segfault will go away as a bonus. – Biffen Mar 09 '17 at 12:37
  • What is struct PTBlk? – Willis Blackburn Mar 09 '17 at 12:38
  • @AndersK. The size of PTBlk is 20 bytes; I forgot to mention that – chatterboy Mar 09 '17 at 12:42
  • @WillisBlackburn PTBlk is a block of 20 bytes; I forgot to mention that – chatterboy Mar 09 '17 at 12:43
  • How sure are you that the struct is really 20 bytes in size? And can you guarantee that every compiler will make it exactly 20 bytes? – UnholySheep Mar 09 '17 at 12:45
  • Are you sure? The person who suggested that you use `sizeof PTBlk` is absolutely correct. – Willis Blackburn Mar 09 '17 at 12:45
  • @moooeeeep Well, I have a bias that code written in C is faster than C++. Also, I want to compile it with gcc later – chatterboy Mar 09 '17 at 12:46
  • @UnholySheep Ah, yes, you are correct. The size was not 20 bytes, and now it is fixed. – chatterboy Mar 09 '17 at 12:56
  • That's a hack, not a fix - why are you so averse to using the `sizeof` operator? – UnholySheep Mar 09 '17 at 12:57
  • @UnholySheep I think that if this function runs many times, using sizeof is more expensive than a macro definition. – chatterboy Mar 09 '17 at 13:02
  • [`sizeof`](http://en.cppreference.com/w/c/language/sizeof) is a compile-time evaluated operator (except in C for VLAs) - so that assumption is wrong. And your macro is inherently dangerous (and wrong) since neither the C nor the C++ standard guarantees that your struct has exactly the size you calculated – UnholySheep Mar 09 '17 at 13:06
  • ...and because it could be considered basic knowledge that sizeof isn't a runtime construct, keep learning. If the only reason why you don't use C++ features is that it is slower than C, you might be in for some surprises. (I'm not saying that there are no slow things, but C++ by itself doesn't automatically decrease speed) – deviantfan Mar 09 '17 at 13:13
  • @UnholySheep Thank you for your comment. I understand what you say. – chatterboy Mar 09 '17 at 13:22
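Pulling the comments together (size the buffer with `sizeof`, check what `sscanf` returns), a revised chunked loop might look like the sketch below. It is not the asker's code and makes several assumptions: `LINE_BUF_LENGTH`, `write_all` and `convert_chunked` are hypothetical names, `LINE_BUF_LENGTH` stands in for the undefined `TRANSFORM_TO_BIN_BUF_LENGTH`, the record-count header that the original writes at offset 0 is left out because `POSITIONAL_TOKEN_SEARCH_BEGIN` is not given, and a POSIX system is assumed for `write()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define POSITIONAL_TOKEN_WORD_LENGTH 15
#define LINE_BUF_LENGTH 256 /* hypothetical stand-in for TRANSFORM_TO_BIN_BUF_LENGTH */

struct PTBlk {
    char w[POSITIONAL_TOKEN_WORD_LENGTH + 1];
    int id;
};

/* write() may transfer fewer bytes than requested, so retry until done. */
static int write_all(int fd, const void *data, size_t len) {
    const char *p = data;
    while (len > 0) {
        ssize_t k = write(fd, p, len);
        if (k < 0)
            return 0;
        p += k;
        len -= (size_t)k;
    }
    return 1;
}

/* Parses "word id" lines from fin and writes them to fd in chunks of
   `chunk` records. Returns the number of records written, or -1 on error. */
long convert_chunked(FILE *fin, int fd, size_t chunk) {
    char buf[LINE_BUF_LENGTH];
    size_t m = 0;
    long n = 0;
    assert(chunk > 0);

    /* Size the buffer from the element type, not a hand-computed length. */
    struct PTBlk *blks = malloc(chunk * sizeof *blks);
    if (blks == NULL)
        return -1;

    while (fgets(buf, sizeof buf, fin) != NULL) {
        int end = 0;
        /* Zero the record so unused bytes of w written to the file are deterministic. */
        memset(&blks[m], 0, sizeof blks[m]);
        /* %15s keeps sscanf inside w[16] (the width must match
           POSITIONAL_TOKEN_WORD_LENGTH); %n records where the word stopped. */
        if (sscanf(buf, "%15s%n %d", blks[m].w, &end, &blks[m].id) != 2
            || !isspace((unsigned char)buf[end]))
            continue; /* malformed line, or word longer than 15 characters */
        ++n;
        if (++m == chunk) {
            if (!write_all(fd, blks, m * sizeof *blks)) {
                free(blks);
                return -1;
            }
            m = 0;
        }
    }
    /* Flush the final, partially filled chunk. */
    if (m > 0 && !write_all(fd, blks, m * sizeof *blks)) {
        free(blks);
        return -1;
    }
    free(blks);
    return n;
}
```

Called as `convert_chunked(fin, fd, POSITIONAL_TOKEN_CHUNK_SIZE)`, it buffers up to 1,000,000 × `sizeof(struct PTBlk)` bytes per write. Skipping malformed lines and over-long words is one design choice among several; the original counted every line in `n` regardless of whether it parsed.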

0 Answers