-3

This code reads a text file line by line. But I need to put those lines in an array but I wasn't able to do it. Now I am getting a array of numbers somehow. So how to read the file into a list. I tried using 2 dimensional list but this doesn't work as well.

I am new to C. I am mostly using Python but now I want to check if C is faster or not for a task.

#include <stdio.h>
#include <time.h>
#include <string.h>


void loadlist(char *ptext) {

   char filename[] = "Z://list.txt";
   char myline[200];
   FILE * pfile;
   pfile = fopen (filename, "r" );

   char larray[100000];
   int i = 0;
   while (!feof(pfile)) {
       fgets(myline,200,pfile);
       larray[i]= myline;
       //strcpy(larray[i],myline);
       i++;
       //printf(myline);

   }


   fclose(pfile);
   printf("%s \n %d \n  %d \n ","while doneqa",i,strlen(larray));   
   printf("First larray element is: %d \n",larray[0]);

    /* for loop execution */
   //for( i = 10; i < 20; i = i + 1 ){
   //   printf(larray[i]);
   //}


    }


int main ()
{
   time_t stime, etime;
   printf("Starting of the program...\n");
   time(&stime);

   char *ptext = "String";  
   loadlist(ptext);
   time(&etime);

   printf("time to load: %f \n", difftime(etime, stime));

   return(0);
}

This code reads a text file line by line. But I need to put those lines in an array but I wasn't able to do it. Now I am getting an array of numbers somehow.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Brana
  • 1,197
  • 3
  • 17
  • 38
  • 1
    Language is not js , please remove the run code snippet thing. – Suraj Jain Feb 03 '17 at 20:39
  • `while(feof())` is [wrong](http://stackoverflow.com/q/5431941/3185968) – EOF Feb 03 '17 at 20:40
  • 3
    You have multiple common mistakes like `while (!feof(pfile))` which will only be false after `fgets()` fails for the first time, and you don't check if `fopen()` succeeds. And the type of `larray` is wrong too, `char` is a single character it's an integral type and not a string. – Iharob Al Asimi Feb 03 '17 at 20:40
  • 1
    You are seeing numbers because you are using "%d" as the format specifier in printf. – David Choweller Feb 03 '17 at 20:41
  • 1
    You do not use the argument `ptext` passed to `void loadlist(char *ptext)` – Weather Vane Feb 03 '17 at 20:41
  • So how do I use while? – Brana Feb 03 '17 at 20:45
  • 1
    @Brana `while (fgets(myline, sizeof(myline), file) != NULL)`. – Iharob Al Asimi Feb 03 '17 at 20:45
  • 1
    `larray[i]= myline` (larray[i] is a character. Why are you assigning a string pointer to it?) – David Choweller Feb 03 '17 at 20:45
  • 1
    **note:** `sizeof(myline)` will only work if the *array* `myline` is declared within the local scope and not passed as a parameter to the function. – David C. Rankin Feb 03 '17 at 20:47
  • so what do I use for the size, when using while (fgets(myline, sizeof(myline), file) != NULL) I do not get the whole while just the half of it. – Brana Feb 03 '17 at 20:48
  • When I use larray[i]= myline and I declare char larray[10000][200] it doesn't work as well. And this should be a array long 100 000 with strings long 200 max, doesn't it. – Brana Feb 03 '17 at 20:49
  • 1
    @Brana it is much cleaner to simply declare **constants** for your array sizes using either `#define LNMAX 200` or by using a global `enum` to create them (e.g. `enum { LNMAX = 200, PTRMAX = 100000 };` Then `while (fgets (myline, LNMAX, file)) {...}` – David C. Rankin Feb 03 '17 at 20:51

2 Answers2

1

It's natural that you see numbers because you are printing a single character using the "%d" specifier. In fact, strings in c are pretty much that, arrays of numbers, those numbers are the ascii values of the corresponding characters. If you instead use "%c" you will see the character that represents each of those numbers.

Your code also, calls strlen() on something that is intended as a array of strings, strlen() is used to compute the length of a single string, a string being an array of char items with a non-zero value, ended with a 0. Thus, strlen() is surely causing undefined behavior.

Also, if you want to store each string, you need to copy the data like you tried in the commented line with strcpy() because the array you are using for reading lines is overwritten over and over in each iteration.

Your compiler must be throwing all kinds of warnings, if it's not then it's your fault, you should let the compiler know that you want it to do some diagnostics to help you find common problems like assigning a pointer to a char.

You should fix multiple problems in your code, here is a code that fixes most of them

void 
loadlist(const char *const filename) {

    char line[100];
    FILE *file;
    // We can only read 100 lines, of 
    // max 99 characters each
    char array[100][100];
    int size;

    size = 0;
    file = fopen (filename, "r" );
    if (file == NULL)
        return;
    while ((fgets(line, sizeof(line), file) != NULL) && (size < 100)) {
        strcpy(array[size++], line);
    }
    fclose(file);

    for (int i = 0 ; i < size ; ++i) {
        printf("array[%d] = %s", i + 1, array[i]);
    }
}

int 
main(void)
{
   time_t stime, etime;
   printf("Starting of the program...\n");
   time(&stime);

   loadlist("Z:\\list.txt");
   time(&etime);

   printf("Time to load: %f\n", difftime(etime, stime));

   return 0;
}

Just to prove how complicated it can be in c, check this out

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>

struct string_list {
    char **items;
    size_t size;
    size_t count;
};

void
string_list_print(struct string_list *list)
{
    // Simply iterate through the list and
    // print every item
    for (size_t i = 0 ; i < list->count ; ++i) {
        fprintf(stdout, "item[%zu] = %s\n", i + 1, list->items[i]);
    }
}

struct string_list *
string_list_create(size_t size)
{
    struct string_list *list;
    // Allocate space for the list object
    list = malloc(sizeof *list);
    if (list == NULL) // ALWAYS check this
        return NULL;
    // Allocate space for the items 
    // (starting with `size' items)
    list->items = malloc(size * sizeof *list->items);
    if (list->items != NULL) {
        // Update the list size because the allocation
        // succeeded
        list->size = size;
    } else {
        // Be optimistic, maybe realloc will work next time
        list->size = 0;
    }
    // Initialize the count to 0, because
    // the list is initially empty
    list->count = 0;

    return list;
}

int
string_list_append(struct string_list *list, const char *const string)
{
    // Check if there is room for the new item
    if (list->count + 1 >= list->size) {
        char **items;
        // Resize the array, there is no more room
        items = realloc(list->items, 2 * list->size * sizeof *list->items);
        if (items == NULL)
            return -1;
        // Now update the list
        list->items = items;
        list->size += list->size;
    }
    // Copy the string into the array we simultaneously
    // increase the `count' and copy the string
    list->items[list->count++] = strdup(string);
    return 0;
}

void
string_list_destroy(struct string_list *const list)
{
    // `free()' does work with a `NULL' argument
    // so perhaps as a principle we should too
    if (list == NULL)
        return;
    // If the `list->items' was initialized, attempt
    // to free every `strdup()'ed string
    if (list->items != NULL) {
        for (size_t i = 0 ; i < list->count ; ++i) {
            free(list->items[i]);
        }
        free(list->items);
    }
    free(list);
}

struct string_list *
loadlist(const char *const filename) {
    char line[100]; // A buffer for reading lines from the file
    FILE *file;
    struct string_list *list;
    // Create a new list, initially it has
    // room for 100 strings, but it grows
    // automatically if needed
    list = string_list_create(100);
    if (list == NULL)
        return NULL;
    // Attempt to open the file
    file = fopen (filename, "r");
    // On failure, we now have the responsibility
    // to cleanup the allocated space for the string
    // list
    if (file == NULL) {
        string_list_destroy(list);
        return NULL;
    }
    // Read lines from the file until there are no more
    while (fgets(line, sizeof(line), file) != NULL) {
        char *newline;
        // Remove the trainling '\n'
        newline = strchr(line, '\n');
        if (newline != NULL)
            *newline = '\0';
        // Append the string to the list
        string_list_append(list, line);
    }
    fclose(file);
    return list;
}

int 
main(void)
{
   time_t stime, etime;
   struct string_list *list;
   printf("Starting of the program...\n");
   time(&stime);

   list = loadlist("Z:\\list.txt");
   if (list != NULL) {
       string_list_print(list);
       string_list_destroy(list);
   }
   time(&etime);

   printf("Time to load: %f\n", difftime(etime, stime));

   return 0;
}

Now, this will work almost as the python code you say you wrote but it will certainly be faster, there is absolutely no doubt.

It is possible that an experimented python programmer can write a python program that runs faster than that of a non-experimented c programmer, learning c however is really good because you then understand how things work really, and you can then infer how a python feature is probably implemented, so understanding this can be very useful actually.

Although it's certainly way more complicated than doing the same in python, note that I wrote this in nearly 10min. So if you really know what you're doing and you really need it to be fast c is certainly an option, but you need to learn many concepts that are not clear to higher level languages programmers.

Iharob Al Asimi
  • 52,653
  • 6
  • 59
  • 97
  • The program prints the lines in text file, so could you let me know how to put these lines in an array. In python it would be larray.append(myline). It can't be this complicated, i spend 4 hours making this work. – Brana Feb 03 '17 at 20:54
  • 1
    @Brana They already are in the array `array`, you can't use it outside `loadlist()` though! For that you need to pass the array as a parameter. Oh, and it is this complicated. If you want your array to grow dynamically it's even worse. Any particular reason why you don't do it in python? – Iharob Al Asimi Feb 03 '17 at 20:58
  • I did of course in python and I use it, but I want to check if it is faster in C for the particular set of files. – Brana Feb 03 '17 at 21:10
  • Anyways here I am getting an error when I tried to read the whole file ... which has about 80 000 lines. Do you maybe know is the timing function correct as it always says the script is executed instantly but it first has to load a 10mb file so it cannot be done instantly? This is stime, etime. – Brana Feb 03 '17 at 21:13
  • 1
    If you really want the execution time in seconds, this is almost fine. For more granularity use [clock_gettime()](http://man7.org/linux/man-pages/man2/clock_gettime.2.html), I am assuming you are on linux or something similar. – Iharob Al Asimi Feb 03 '17 at 21:23
  • No i want it in miliseconds, now I am on windows :) So that is the problem. – Brana Feb 03 '17 at 21:25
  • 1
    I think there is no `clock_gettime()` on windows. You should google *function execution time in c Windows*. – Iharob Al Asimi Feb 03 '17 at 21:26
  • One more thing, when i use the first fuction and I put char array[2000][100]; and (size < 20000) it work ok. But when I put 30 000 more I get an error. Is there a size limit on arrays? – Brana Feb 03 '17 at 21:31
  • 1
    There is a size limit on the stack of the program, Solution: use `malloc()` or make the array global (*please don't do this unless you REALLY KNOW that you have to*). – Iharob Al Asimi Feb 03 '17 at 21:32
  • Actually I made it and while the python needed 1.66 sec for reading 100 files of 15k lines each, for C it took around 0.87 secs. It is fast, but I am not even sure it read all the lines properly. But since the whole script has quite a bit of calculation the difference could be even bigger. – Brana Feb 03 '17 at 22:45
  • Actually c is like 15X faster :) – Brana Feb 04 '17 at 17:26
1

There are many ways to do this correctly. To begin with, first sort out what it is you actually need/want to store, then figure out where that information will come from and finally decide how you will provide storage for the information. In your case loadlist is apparently intended load a list of lines (up to 10000) so that they are accessible through your statically declared array of pointers. (you can also allocate the pointers dynamically, but if you know you won't need more than X of them, statically declaring them is fine (up to the point you cause StackOverflow...)

Once you read the line in loadlist, then you need to provide adequate storage to hold the line (plus the nul-terminating character). Otherwise, you are just counting the number of lines. In your case, since you declare an array of pointers, you cannot simply copy the line you read because each of the pointers in your array does not yet point to any allocated block of memory. (you can't assign the address of the buffer you read the line into with fgets (buffer, size, FILE*) because (1) it is local to your loadlist function and it will go away when the function stack frame is destroyed on function return; and (2) obviously it gets overwritten with each call to fgets anyway.

So what to do? That's pretty simple too, just allocate storage for each line as it is read using the strlen of each line as @iharob says (+1 for the nul-byte) and then malloc to allocate a block of memory that size. You can then simply copy the read buffer to the block of memory created and assign the pointer to your list (e.g. larray[x] in your code). Now the gnu extensions provide a strdup function that both allocates and copies, but understand that is not part of the C99 standard so you can run into portability issues. (also note you can use memcpy if overlapping regions of memory are a concern, but we will ignore that for now since you are reading lines from a file)

What are the rules for allocating memory? Well, you allocate with malloc, calloc or realloc and then you VALIDATE that your call to those functions succeeded before proceeding or you have just entered the realm of undefined behavior by writing to areas of memory that are NOT in fact allocated for your use. What does that look like? If you have your array of pointers p and you want to store a string from your read buffer buf of length len at index idx, you could simply do:

    if ((p[idx] = malloc (len + 1)))    /* allocate storage */
        strcpy (p[idx], buf);           /* copy buf to storage */
    else
        return NULL;                    /* handle error condition */

Now you are free to allocate before you test as follows, but it is convenient to make the assignment as part of the test. The long form would be:

    p[idx] = malloc (len + 1);          /* allocate storage */

    if (p[idx] == NULL)  /* validate/handle error condition */
        return NULL;

    strcpy (p[idx], buf);           /* copy buf to storage */

How you want to do it is up to you.

Now you also need to protect against reading beyond the end of your pointer array. (you only have a fixed number since you declared the array statically). You can make that check part of your read loop very easily. If you have declared a constant for the number of pointers you have (e.g. PTRMAX), you can do:

int idx = 0;            /* index */

while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {
    ...
    idx++;
}

By checking the index against the number of pointers available, you insure you cannot attempt to assign address to more pointers than you have.

There is also the unaddressed issue of handling the '\n' that will be contained at the end of your read buffer. Recall, fgets read up to and including the '\n'. You do not want newline characters dangling off the ends of the strings you store, so you simply overwrite the '\n' with a nul-terminating character (e.g. simply decimal 0 or the equivalent nul-character '\0' -- your choice). You can make that a simple test after your strlen call, e.g.

while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {

    size_t len = strlen (buf);  /* get length */

    if (buf[len-1] == '\n') /* check for trailing '\n' */
        buf[--len] = 0;     /* overwrite '\n' with nul-byte */
    /* else { handle read of line longer than 200 chars }
     */
    ...

(note: that also brings up the issue of reading a line longer than the 200 characters you allocate for your read buffer. You check for whether a complete line has been read by checking whether fgets included the '\n' at the end, if it didn't, you know your next call to fgets will be reading again from the same line, unless EOF is encountered. In that case you would simply need to realloc your storage and append any additional characters to that same line -- that is left for future discussion)

If you put all the pieces together and choose a return type for loadlist that can indicate success/failure, you could do something similar to the following:

/** read up to PTRMAX lines from 'fp', allocate/save in 'p'.
 *  storage is allocated for each line read and pointer
 *  to allocated block is stored at 'p[x]'. (you should
 *  add handling of lines greater than LNMAX chars)
 */
char **loadlist (char **p, FILE *fp)
{
    int idx = 0;            /* index */
    char buf[LNMAX] = "";   /* read buf */

    while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {

        size_t len = strlen (buf);  /* get length */

        if (buf[len-1] == '\n') /* check for trailing '\n' */
            buf[--len] = 0;     /* overwrite '\n' with nul-byte */
        /* else { handle read of line longer than 200 chars }
         */

        if ((p[idx] = malloc (len + 1)))    /* allocate storage */
            strcpy (p[idx], buf);           /* copy buf to storage */
        else
            return NULL;    /* indicate error condition in return */

        idx++;
    }

    return p;   /* return pointer to list */
}

note: you could just as easily change the return type to int and return the number of lines read, or pass a pointer to int (or better yet size_t) as a parameter to make the number of lines stored available back in the calling function.

However, in this case, we have used the initialization of all pointers in your array of pointers to NULL, so back in the calling function we need only iterate over the pointer array until the first NULL is encountered in order to traverse our list of lines. Putting together a short example program that read/stores all lines (up to PTRMAX lines) from the filename given as the first argument to the program (or from stdin if no filename is given), you could do something similar to:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { LNMAX = 200, PTRMAX = 10000 };

char **loadlist (char **p, FILE *fp);

int main (int argc, char **argv) {

    time_t stime, etime;
    char *list[PTRMAX] = { NULL };  /* array of ptrs initialized NULL */
    size_t n = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    printf ("Starting of the program...\n");

    time (&stime);
    if (loadlist (list, fp)) {   /* read lines from fp into list */
        time (&etime);
        printf("time to load: %f\n\n", difftime (etime, stime));
    }
    else {
        fprintf (stderr, "error: loadlist failed.\n");
        return 1;
    }

    if (fp != stdin) fclose (fp);     /* close file if not stdin */

    while (list[n]) {   /* output stored lines and free allocated mem */
        printf ("line[%5zu]: %s\n", n, list[n]);
        free (list[n++]);
    }

    return(0);
}


/** read up to PTRMAX lines from 'fp', allocate/save in 'p'.
 *  storage is allocated for each line read and pointer
 *  to allocated block is stored at 'p[x]'. (you should
 *  add handling of lines greater than LNMAX chars)
 */
char **loadlist (char **p, FILE *fp)
{
    int idx = 0;            /* index */
    char buf[LNMAX] = "";   /* read buf */

    while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {

        size_t len = strlen (buf);  /* get length */

        if (buf[len-1] == '\n') /* check for trailing '\n' */
            buf[--len] = 0;     /* overwrite '\n' with nul-byte */
        /* else { handle read of line longer than 200 chars }
         */

        if ((p[idx] = malloc (len + 1)))    /* allocate storage */
            strcpy (p[idx], buf);           /* copy buf to storage */
        else
            return NULL;    /* indicate error condition in return */

        idx++;
    }

    return p;   /* return pointer to list */
}

Finally, in any code your write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

Use a memory error checking program to insure you haven't written beyond/outside your allocated block of memory, attempted to read or base a jump on an uninitialized value and finally to confirm that you have freed all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

Look things over, let me know if you have any further questions.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • Thanks very much - I actually wrote this snippet just to check if it is worth the time to translate some parts of my python code to c and it turned out is it, as for 300 iterations it takes around 0.7 sec while in python it takes around 10 seconds. So i need the otput which likely must be a pointer, which should be compiled as dll and loaded into python using ctypes. I am not really sure if I can do it but it should work somehow.\ – Brana Feb 04 '17 at 02:19
  • 1
    Yes, that is no problem. Basically, you will compile your read/loadlist routine into a 'shared library' which you could then link with your python code. There are a number of good tutorials on creating a '*share library*' (which is basically just compiling with a couple of additional options) See: [**Shared Libraries with gcc**](http://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html) and [**Static and Shared Object Libraries**](http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html) – David C. Rankin Feb 04 '17 at 02:59
  • Thanks for the links - I run my python software on a server but I develop it on windows. So i first tried to create simple dll in windows (and letter i will create .so files on linux) - but they failed to work. – Brana Feb 04 '17 at 03:27
  • No matter what i couldn't create a dll that works on windows - please check out the questio - http://stackoverflow.com/questions/42036589/using-c-in-python-i-cannot-load-c-dll-using-ctypes when you get a chance, – Brana Feb 04 '17 at 05:07
  • 1
    Well, I hate to run out of good advise, but I have run Linux as my desktop since 2001... There is no reason a dll should not work. (after all it's just windows version of a shared library). The tutorials I provided were Linux based, but there should be plenty on the windoze side of the house. I'll let others comment on your second question -- I'm sure there is a simple solution, I just can't help you with windows specific issues. – David C. Rankin Feb 04 '17 at 06:48
  • Never mind afer reading maybe 10 different answer about the problem i found that the problem was that I didn't have c installed properly. The most obvios answer but not so common. Will put your advice into action on linux. – Brana Feb 04 '17 at 17:25
  • Are you sure there isn't \r\n on the linux ... when using python I had to replace \r\n at the end of each line? And is there a simple way to increase max possible array size as I need it to be around 100 000 lines long or even longer. – Brana Feb 04 '17 at 17:25
  • 1
    When working with DOS `'\r\n'` line-endings, just replace the `if` with `while`, e.g. `while (buf[len-1] == '\n' || buf[len-1] == '\r') buf[--len] = 0;` See: [**carriage return by fgets**](http://stackoverflow.com/questions/12769289/carriage-return-by-fgets) – David C. Rankin Feb 05 '17 at 04:26