1

I need to convert a text file of the following format to binary:

The first line contains the number of products in the inventory, The following lines contains:
product name '\t' product price '\t' quantity '\n' (there can be more than one \t between columns)

For every product the binary output file will contain an int representing the length of the product name, the chars that hold the product name, an int representing the price and an int representing the quantity.

Sample input file:

Asus Zenbook    1000    10
iPhone 5        750     22
Playstation 4   1000    0

I have wrote the following code, and I understood that I'm supposed to see the string in plain text while the integers will show up as gibberish (in binary):

int convertTextToBinary(char *fileName)
{
    FILE *pText, *pBinary;

    int size, i;

    char *currProdName;
    int currProdNameLen, currQuantity, currPrice;

    if (checkFileExists(fileName) == FALSE)
    {
        printf("- Given file does not exists!\n");
        return ERROR;
    }

    else
        pText = fopen(fileName, "r");

    // get the number of products in the inventory
    fscanf(pText, "%d", &size);
    #ifdef DBG
    printf("##DBG Successfuly read &size = %d DBG##\n", size);
    #endif  
    pBinary = fopen(strcat(fileName, ".bin"), "wb");

    fwrite(&size, sizeof(int), 1, pBinary);
    #ifdef DBG
    printf("##DBG Successfuly wrote &size = %d DBG##\n", size);
    #endif  
    for (i = 0; i < size; i++)
    {
        // get product name and name length
        currProdNameLen = getProdName(pText, &currProdName);
        #ifdef DBG
        printf("##DBG %d Successfuly read &currProdName = %s DBG##\n", i+1, currProdName);
        printf("##DBG %d Successfuly read &currProdNameLen = %d DBG##\n", i+1, currProdNameLen);
        #endif          
        // get product price 
        fscanf(pText, "%d", &currPrice);
        printf("##DBG %d Successfuly read &currPrice = %d DBG##\n", i+1, currPrice);
        // get product quantity
        fscanf(pText, "%d", &currQuantity);
        printf("##DBG %d Successfuly read &currQuantity = %d DBG##\n", i+1, currQuantity);
        // write data to binary file
        fwrite(&currProdNameLen , sizeof(int), 1, pBinary);
        fwrite(&currProdName, sizeof(char), currProdNameLen, pBinary);
        fwrite(&currPrice, sizeof(int), 1, pBinary);
        fwrite(&currQuantity, sizeof(int), 1, pBinary);
        free(currProdName);
    }

    fclose(pText);
    fclose(pBinary);
    return 1;
}

/* This function checks if a file in a given path exists or not by using fopen with "read" argument */
BOOL checkFileExists(char *fileName)
{
    FILE *fp;

    fp = fopen(fileName, "r");

    // file does not exists
    if (fp == NULL)
        return FALSE;

    // file does exists
    else
    {
        fclose(fp);
        return TRUE;
    }
}
int getProdName(FILE *fp, char **prodName)
{
    int nameLen = 0, offset;

    // count the length of the product name
    while (fgetc(fp) != '\t')
        nameLen++;

    // allcoate memory for the product name
    *prodName = (char*)malloc(sizeof(char)*nameLen);
    //checkalloc(&prodName);

    // get the cursor back to the original position
    offset = -1 * nameLen;
    fseek(fp, offset, SEEK_CUR);

    // copy product name from text to string
    fgets(*prodName, nameLen, fp);

    return strlen(*prodName);
}

But the hell, my output file looks like this:

       ¨ ּּּּּט        ¨ ּּּ¯        ¨ ּּּּּּּּ   ּּּ«
        ¨      

Which holds no plain text. I have tried changing the fopen argument from "wb" to "w" but I still get gibberish files. What am I doing wrong?

Quaker
  • 1,483
  • 3
  • 20
  • 36

1 Answers1

1

Here you write the pointer and additional garbage instead of the string it points to:

    fwrite(&currProdName, sizeof(char), currProdNameLen, pBinary);

You should use:

    fwrite(currProdName, sizeof(char), currProdNameLen, pBinary);

In your version you are passing the pointer to the pointer, but you want to pass the pointer itself.

BTW: In your function getProdName(), you should add an additional character, because you are allocating the exact string lenght, but no room for the 0 Byte at the end. This can also cause problems. Also fgets reads one char less. Check the man page for fgets. Instead of using fgets, you can also use fread because you know the length anyway. No additional parsing needed.

update

Change this:

    fscanf(pText, "%d", &currQuantity);

to

    fscanf(pText, "%d\n", &currQuantity);
Devolus
  • 21,661
  • 13
  • 66
  • 113
  • will using `fread` solve the problem with the space not allocated for '\0'? – Quaker Aug 18 '13 at 20:19
  • No it will read as much as you tell it. Sou you should either memset the allocated memory (just to be sure) or do fread and then put the zero byte at the end. In case of an error, fgets will leave the memory possibly untouched, so if you don't check this, your strlen and subsequent operations will also cause problems because of the missing 0 Byte. After you counted the length of the string, allocate +1 memory. – Devolus Aug 18 '13 at 20:23
  • You mean that `fgets` won't tell me I'm doing something wrong but will return false values? – Quaker Aug 18 '13 at 20:24
  • BTW: Should I use `fseek` to jump over `'\n'` at the end of each line in the textfile? – Quaker Aug 18 '13 at 20:26
  • I updated my answer, because fgets will read on char less, then you specify. I was thinking at some other problem and confused this. So if it works, you will miss one char of your product. If it fails, the memory stays untouched, and you might get garbage because you don't check the return value. – Devolus Aug 18 '13 at 20:28
  • You don't really need fseek for that. You can use the fscanf function with a proper format string, as you are using this anyaway. This you dont need to know the number of bytes it requires, as the standard function will do this for you. – Devolus Aug 18 '13 at 20:32
  • I'm confused, where should I use `fscanf` and where `fgets`? – Quaker Aug 18 '13 at 20:33
  • Sorry. In the string reading fgets is ok, but you should check the return value. In your main code, you are using fscanf to read the integers. There you can also read the newline, so you don't need to skip over the ' \n' with lseek. – Devolus Aug 18 '13 at 20:37
  • Actually by updating `getProdName()` to this: http://pastebin.com/U5nug1zN the application works flawlessly. I feel the thing I hate the most about coding, when something works and you don't know why or how. – Quaker Aug 18 '13 at 20:44
  • BTW: In `pBinary = fopen(strcat(fileName, ".bin"), "wb");` you are also potentially corrupting memory, if filename is noot big enough to be extended. Just a hint, because you don't show how you call `convertTextToBinary`, so if the caller doesn't provide enough memory it gets corrupted. – Devolus Aug 18 '13 at 20:46
  • I call it like this: `if (convertTextToBinary(argv[1]) != ERROR)...` with an argument as input – Quaker Aug 18 '13 at 20:47
  • I thought as much. :) Well, this means that the memory gets corrupted. In your small application, you may not notice this, and the program appears to work. However, you are copying four bytes after the provided memory, which invokes unedinfed behaviour. Which means it can work, or it can not, or it can create unexpected results. – Devolus Aug 18 '13 at 20:52
  • So I should allocate memory for the original filename + ".bin" right? I was wondering about the \n not the memory allocation :| – Quaker Aug 18 '13 at 20:56
  • Yes, and +1 for the 0 byte. So it is `p = malloc(strlen(filename)+strlen("bin")+1); strcpy(p, filename); strcat(p, "bin");` – Devolus Aug 18 '13 at 21:08
  • Works like magic. But the mystery with the `'\n'` is still unsolved ^_^ – Quaker Aug 18 '13 at 21:10
  • The thing is that it works like the original version with the `'\n'` in `scanf()` but I can't figure out how it chooses to jump over the newline. – Quaker Aug 18 '13 at 21:16
  • 1
    Read the manpage. It says that the scanf familly will ignore whitespaces and newlines. That's why you don't need to use fseek here. – Devolus Aug 18 '13 at 21:17