I am trying to input a text file, in a similar format to a CSV, into a multi-dimensional array, where every element of the array is an array of words for each line. Any help would be much appreciated!

For example, the file input.txt could contain:

Carrot, Potato, Beetroot, Courgette, Broccoli
Dad's oranges, Apple, Banana, Cherry
Pasta, Pizza, Bread, Butter

The structure of the outputted array I am hoping to get from that would be in the form:

[[Carrot, Potato, Beetroot, Courgette, Broccoli], [Dad's oranges, Apple, Banana, Cherry], [Pasta, Pizza, Bread, Butter]]

So the line:

printf("%s", inputArray[1][0]);

Would print:

Dad's oranges
hardanger
  • I would suggest the delimiter string for `strtok` should be `", \t\r\n"` but "Dad's oranges" contains a space. So you'll have to leave that space out of the delimiters and strip off any leading space after getting each token. – Weather Vane Feb 16 '17 at 19:59
  • What output you are observing? What output did you expect? – rootkea Feb 16 '17 at 20:58
  • @rootkea Currently I am observing odd output that I cannot explain. For ``printf("%s", inputArray[0]);`` I get back ``Pasta``. I would expect the first item, ``Carrot``. – hardanger Feb 16 '17 at 21:08
  • @hardanger Yup! That's because you have not allocated memory to store the previous tokens. Since `inputArray[0]` points to `line[0]`, it now contains `Pasta`, which overwrote the earlier `Carrot` and `Dad's oranges`. See my answer for more details. – rootkea Feb 16 '17 at 21:24
  • You used `==` instead of `=` between `ptr` and `malloc`. See updated code in my answer. – rootkea Feb 16 '17 at 23:31
  • @hardanger Also, since you want the words stored in a 2D array, what are the dimensions of the array, i.e. the number of comma-separated words in a single line? What if one line has 4 words and another has just 3? Or is it guaranteed that every line will have the same number of words? – rootkea Feb 16 '17 at 23:36
  • @rootkea - thank you for your reply. I did try just = but it doesn't work - surely it should be ==, as how can an assignment make sense in an if statement? – hardanger Feb 16 '17 at 23:39
  • @rootkea - yes, that would cause an issue as they are not necessarily the same length - have you thought of a better way than using a 2D array then? Many thanks – hardanger Feb 16 '17 at 23:40
  • @hardanger It should be `=`. And yes, we can use another representation, but from what you have posted it seems that you want to access those words using a 2D array, e.g. "`printf("%s", inputArray[1][0]);`" – rootkea Feb 16 '17 at 23:48
  • @rootkea - oh, as I just get exit code 11 when I use a single =. Please could you suggest an alternative to 2D arrays if this method is not appropriate please? – hardanger Feb 16 '17 at 23:50
  • @hardanger As I said, it's `=` and NOT `==`. You got segFault because of `printf("%s", inputArray[1][0]);` See my updated answer for working code. – rootkea Feb 17 '17 at 10:52
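
Two points from the comments above can be sketched together: splitting on commas only and then skipping any leading whitespace in each token, and using a single `=` inside the `if` so that the result of `malloc` is assigned first and then tested. This is a minimal sketch, not the code from either answer; the helper name `copy_trimmed` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of token with leading
 * spaces and tabs skipped, or NULL if malloc fails. The caller frees it. */
char *copy_trimmed(const char *token)
{
    char *copy;

    /* "Dad's oranges" contains a space, so the space is not a strtok
     * delimiter; the leading space is skipped here instead. */
    while (*token == ' ' || *token == '\t')
        token++;

    /* Single '=': store malloc's result in copy, then compare that stored
     * value with NULL. Writing '==' would compare an uninitialized pointer
     * and never allocate anything. */
    if ((copy = malloc(strlen(token) + 1)) == NULL)
        return NULL;

    strcpy(copy, token);
    return copy;
}
```

Each token returned by `strtok(line, ",\n")` could be passed through such a helper before being stored, so the stored pointer no longer refers into the reused `line` buffer.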

2 Answers

I am not sure what the question here is. However, looking at your problem statement and code, I see a few issues (note that I did not run the code below; it is meant to give you an idea):

  1. Your varCount will start from 1, as you increment before you put the first word.
  2. You are storing inherently 2-dimensional data in a one-dimensional array. That is normally fine, but you need to encode where each line starts and ends, and that is missing. If everything else works, you will get an array of words with no knowledge of where lines start or end. One way to deal with this is to create a 2D array. Another is to insert a pointer to a known word between the lines. Below is a code snippet that shows inserting such a separator:

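    char *knownWord = "anyword";    /* separator; compared by pointer, not by content */
    ...
    while (fgets(line, maxLineLength, inputFile))
    {
        token = strtok(&line[0], ",");
        while (token) {
            /* token points into line, which the next fgets overwrites;
               copy it first if it must outlive this iteration */
            inputArray[varCount] = token;
            varCount++;
            token = strtok(NULL, ",");
        }
        inputArray[varCount] = knownWord;   /* mark the end of this line */
        varCount++;
    }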
    

For this the print will happen with something like

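    bool atKnownWord = 0;   /* bool requires <stdbool.h> */
    printf("[");
    for (i = 0; i < maxWords; i++) {
        if (inputArray[i] == NULL) {    /* unused slots are assumed to be NULL */
            break;
        }
        if (inputArray[i] == knownWord) {   /* separator reached: close the line */
            atKnownWord = 1;
            printf("]");
            continue;
        }
        if (atKnownWord) {  /* first word after a separator opens a new line */
            atKnownWord = 0;
            printf(", [");
        }
        printf("%s", inputArray[i]);
    }
    printf("]");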
Virtually Real
  • Many thanks @virtually-real I am closer after putting that in but still not quite right. (I am not trying to print out a representation of an array with brackets, but your code still helps). For the sake of argument, printing out as you suggest I get ``[PastaPizzareader`` ``], [PastaBreadButter Cherry`` ``], [Pasta Pizza Bread Butter]]``. Any thoughts please? – hardanger Feb 16 '17 at 21:12
  • @hardanger replace "printf("%s", inputArray[i]);" with "printf("%s ", inputArray[i]);". Note the space after %s – Virtually Real Feb 16 '17 at 21:46
  • many thanks, but I think the issue goes a little deeper. Weirdly, I am getting Cherry, for example, output in the wrong line. This is what I get now: ``[Pasta Pizza read er`` -NEWLINE- ``], [Pasta Bread Butter Cherry`` -NEWLINE- ``], [Pasta Pizza Bread Butter ]]`` – hardanger Feb 16 '17 at 22:02
  • @hardanger You have more than one issue. 1. You need to remove the '\n' character. Check http://stackoverflow.com/questions/2693776/removing-trailing-newline-character-from-fgets-input. 2. strtok() operates on the original buffer, modifying its input in place. I suggest you use either strcpy() or strdup() to duplicate the "line". Of course, you then need to free the newly created buffers, but that is a separate topic. – Virtually Real Feb 16 '17 at 22:35
  • Thanks @virtually-real - though surely that change won't affect the more wide-ranging issue that the output text is not right, for example as in my question, the output should begin `[[Carrot, Potato, Beetroot,...` and instead starts `[Pasta Pizza read er`. Any thoughts please? – hardanger Feb 16 '17 at 22:45
  • @hardanger I explained it above. The memory allocated for line is being used and reused over and over. The only way to avoid it is to duplicate your memory. Here is something for you to try: declare a new char *tmp. Then tmp = strdup(line), and then call strtok on tmp instead of line. This will not solve your newline issue, but will solve meaningless content issue. – Virtually Real Feb 16 '17 at 22:51
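
The suggestion in the comments above (duplicate the line, then tokenize the duplicate) can be sketched as follows. This is an illustration under assumptions, not the answerer's code: the function name `split_copy` and its parameters are hypothetical, and the copy is made with `malloc` plus `strcpy`, which does the same job as `strdup(line)` on systems that provide it. It also strips the trailing newline left by `fgets`; leading spaces in tokens are not trimmed here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: tokenizes a private copy of line on commas.
 * tokens[] receives pointers into that copy and *count their number.
 * Returns the copy (free it once the tokens are no longer needed),
 * or NULL if allocation fails. */
char *split_copy(const char *line, char *tokens[], size_t max_tokens,
                 size_t *count)
{
    char *copy = malloc(strlen(line) + 1);  /* same effect as strdup(line) */
    char *token;

    *count = 0;
    if (copy == NULL)
        return NULL;
    strcpy(copy, line);

    /* fgets keeps the newline; cut the copy at the first '\r' or '\n' */
    copy[strcspn(copy, "\r\n")] = '\0';

    /* strtok writes '\0' into copy, never into the caller's buffer */
    for (token = strtok(copy, ",");
         token != NULL && *count < max_tokens;
         token = strtok(NULL, ",")) {
        tokens[(*count)++] = token;
    }
    return copy;
}
```

Because every line gets its own copy, reusing the `line` buffer for the next `fgets` call no longer changes the tokens stored for earlier lines.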

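You have not allocated memory to store each token, so every pointer you save refers into the single `line` buffer, which the next `fgets` call overwrites.
Also, you should increment varCount after storing the token in the array.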

The code can be written as:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define maxLineLength 1000  //Maximum length of a line
#define wordsPerLine 200    //Maximum words in a line
#define maxLines 200        //Maximum lines in an input

int main(int argc, char *argv[])
{
    char line[maxLineLength] = {0};
    char *inputArray[maxLines][wordsPerLine] = {{0}};  //All entries start as NULL ("{}" is not valid before C23)
    char *ptr, *token;
    int i, j, lines, maxWords = 0;

    FILE *inputFile = fopen("input.txt", "r");

    if (inputFile)
    {
        i = j = 0;
        while (fgets(line, maxLineLength, inputFile))
        {
            token = strtok(&line[0], ",\n");
            while(token)
            {
                if((ptr = malloc(strlen(token) + 1))) //assignment, then test: non-NULL means malloc succeeded
                {
                    if(token[0] == ' ')
                        strcpy(ptr, token+1);
                    else
                        strcpy(ptr, token);

                    inputArray[i][j++] = ptr;
                    token = strtok(NULL, ",\n");
                }
                else
                {
                    printf("malloc failed!\n");
                    exit(1);
                }
            }
            if(maxWords < j)
                maxWords = j;
            i++;
            j = 0;          
        } 
        lines = i;
        fclose(inputFile);

        for(i = 0; i < lines; i++)
        {
            for(j = 0; (j < maxWords) && inputArray[i][j]; j++)
                printf("%s | ", inputArray[i][j]);
            printf("\n");
        }
    }
    return 0;
}  
rootkea
  • Hi @rootkea - many thanks for your answer. I have attempted to implement this but just get your fallback "malloc failed!" printout. Any thoughts on where I might have gone wrong please? – hardanger Feb 16 '17 at 22:40
  • @hardanger Can you please update the question with the code you tried? – rootkea Feb 16 '17 at 22:54
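
The fixed-size `inputArray[maxLines][wordsPerLine]` in the answer above reserves the same number of slots for every line, and the question's comments ask about an alternative when lines differ in length. One possibility, sketched here under assumptions, is a row that grows as words are added. The names `struct row`, `row_push`, and `row_free` are hypothetical and do not appear in either answer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable row: one per input line. */
struct row {
    char **words;     /* heap array of heap-allocated strings */
    size_t count;     /* words currently stored */
    size_t capacity;  /* slots allocated in words */
};

/* Appends a copy of word. Returns 0 on success, -1 if allocation fails. */
int row_push(struct row *r, const char *word)
{
    char *copy;

    if (r->count == r->capacity) {
        /* Double the capacity (starting at 4) when the row is full */
        size_t new_cap = r->capacity ? r->capacity * 2 : 4;
        char **grown = realloc(r->words, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        r->words = grown;
        r->capacity = new_cap;
    }
    if ((copy = malloc(strlen(word) + 1)) == NULL)
        return -1;
    strcpy(copy, word);
    r->words[r->count++] = copy;
    return 0;
}

/* Releases every word and the array itself, leaving an empty row. */
void row_free(struct row *r)
{
    size_t i;

    for (i = 0; i < r->count; i++)
        free(r->words[i]);
    free(r->words);
    r->words = NULL;
    r->count = r->capacity = 0;
}
```

With an array of such rows, initialized as `struct row rows[maxLines] = {{0}};`, the word from the question would be reached as `rows[1].words[0]`, and `rows[i].count` records how many words each line actually has.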