1

I am working on an assignment that deals with reading data from a text file, and parsing that data to various arrays. For example, a portion of my text file looks as follows:

arbrick  pts/26       141.219.210.189  Thu Mar 29 11:23 - 11:24  (00:00)    
rjmcnama pts/27       141.219.205.107  Thu Mar 29 11:02   still logged in   
ajhoekst pts/26       99.156.215.40    Thu Mar 29 10:59 - 11:08  (00:08)    
eacarter pts/31       141.219.162.145  Thu Mar 29 10:50 - 10:51  (00:00)    
kmcolema pts/31       141.219.214.128  Thu Mar 29 09:44 - 09:47  (00:03) 

I need to parse the data into the following arrays: user id, terminal, ip address, and event times. How can I do this, considering that there isn't a consistent amount of white space between the columns?

EDIT: I tried using the suggestion that Thiruvalluvar provided, but I just could not get it to work. However, I switched to sscanf, and that is almost working...

while(!feof(myfile)) {
        fgets(buffer, 256, myfile);
        sscanf(buffer, "%s %s %s %s", user_id[i], terminal_id[i], ip_addr[i], events[i]);
    } /*End while not EOF*/

The user_id, terminal_id, and ip_addr arrays are working. However, the events array isn't working correctly yet. Since each events entry is a string that contains white space, how can I use sscanf to store the remainder of the buffer in the events array?

Foad S. Farimani
kubiej21
  • What surprises me is that all the solutions seem to use the string functions to search the stream. While this works, in my experience the fastest parsers use `fgetc`, iterating one character at a time. This may seem counterintuitive at first, but keep in mind that libc still reads page-sized blocks, so your code ends up running very fast on the CPU (a tight loop as opposed to function calls, mallocs, memmoves, and so on). I'm still curious about the benefits, since you still end up copying everything into C arrays, so maybe I'll give it a try with timing comparisons. – Thomas Guyot-Sionnest Feb 09 '22 at 14:16

4 Answers

4

I think the real question is how to store them in only 4 arrays. E.g.:

arbrick  pts/26       141.219.210.189  Thu Mar 29 11:23 - 11:24  (00:00)    

Tokenizing this line on whitespace is going to give many strings. But we are only interested in splitting the entire line into 4 fields, no more than that.

Solution:

  1. Read the line using fgets().

  2. Tokenize it using strtok() (or strtok_r() for thread safety) with whitespace as the delimiter.

  3. Copy the first 3 tokens into the arrays user_id, terminal_id and ip_addr.

  4. Append all remaining tokens to the events array.

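    char line[256];
    int i;
    int line_index = 0;
    char *p;

    // myfile is the FILE * from the question; the four arrays must be 2D char arrays
    while (fgets(line, sizeof line, myfile) != NULL) //loop to read the file
    {
        events[line_index][0] = '\0'; //start with an empty events string
        p = strtok(line, " \t\n");
        i = 0;

        while (p != NULL)
        {
            if (i == 0)      strcpy(user_id[line_index], p);
            else if (i == 1) strcpy(terminal_id[line_index], p);
            else if (i == 2) strcpy(ip_addr[line_index], p);
            else //anything else goes into array events
            {
                if (i > 3) strcat(events[line_index], " "); //separate the tokens
                strcat(events[line_index], p);
            }

            i++;
            p = strtok(NULL, " \t\n"); //advance to the next token
        }

        line_index++;
    } //end of file-reading loop.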
    
Declan Cook
P.P
  • Alright, I see where you were going with this, and I gave it a shot. However, we just covered pointers, and I don't think I am using them quite right. When the program gets to the line that reads as follows: strcpy(user_id[line_index], p); I get a seg fault. Would it make more sense to use a 2d char array to store the array of strings? – kubiej21 Mar 29 '12 at 20:34
  • It must be a 2D array (an array of strings); that's the reason for the seg fault. char user_id[100][25]; //For up to 100 lines, and make sure none of your user names exceeds 24 chars, since one element is needed for the terminating null. Otherwise, change the sizes accordingly or allocate the storage dynamically using malloc(). The same goes for the other arrays. – P.P Mar 29 '12 at 20:41
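
The declarations described in that comment could look like the following sketch. The sizes `MAX_LINES` and `FIELD_LEN` are assumptions, and `store_field` is a hypothetical helper (not from the answer) that truncates an over-long token instead of overflowing the row.

```c
#include <string.h>

#define MAX_LINES 100   /* assumed upper bound on lines in the file */
#define FIELD_LEN 25    /* 24 characters plus the terminating '\0' */

/* Each row is a fixed-size buffer, so copies have real storage to
   land in (unlike an array of uninitialized char pointers). */
char user_id[MAX_LINES][FIELD_LEN];
char terminal_id[MAX_LINES][FIELD_LEN];

/* Hypothetical helper: copy src into a FIELD_LEN buffer, truncating
   if necessary and always null-terminating. */
void store_field(char dst[FIELD_LEN], const char *src)
{
    strncpy(dst, src, FIELD_LEN - 1);
    dst[FIELD_LEN - 1] = '\0';
}
```

With these in place, `strcpy(user_id[line_index], p)` from the answer could be replaced by `store_field(user_id[line_index], p)`.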
2

Use fgets to read one line at a time. Operate on the line using sscanf calls to store the information, since the data is not in a consistent form (e.g., "still logged in"). sscanf will read and discard any whitespace between the format specifiers.
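
A minimal sketch of this approach, assuming fixed-size fields. The function name `parse_login_line` and the buffer sizes are hypothetical. The `%[^\n]` scanset reads everything up to the newline, which is one way to capture the event text even though it contains white space.

```c
#include <stdio.h>

/* Field sizes are assumptions for illustration; adjust to your data. */
#define USER_LEN  32
#define TERM_LEN  32
#define IP_LEN    48
#define EVENT_LEN 128

/* Hypothetical helper: splits one line into three whitespace-delimited
   fields plus the remainder of the line. Returns the number of fields
   converted (4 on success). Each width is one less than its buffer
   size, so sscanf cannot write past the end of a buffer. */
int parse_login_line(const char *line,
                     char user[USER_LEN], char term[TERM_LEN],
                     char ip[IP_LEN], char event[EVENT_LEN])
{
    return sscanf(line, "%31s %31s %47s %127[^\n]", user, term, ip, event);
}
```

In the read loop, call it on each line returned by fgets and keep the result only when the return value is 4. The event text keeps any trailing spaces from the input line.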

Matt Eckert
0

Try this:

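#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split string in place on delim. Returns a NULL-terminated array of
   pointers into string (free the array, not the tokens), or NULL. */
char** split (char* string, const char* delim) {
    char* p;
    int i = 0;
    char** array;

    /* there can never be more tokens than characters; +1 for the NULL */
    array = malloc((strlen(string) + 1) * sizeof (char*));
    if (array == NULL)
        return NULL;
    p = strtok (string, delim);
    while (p != NULL) {
        array[i++] = p;   /* tokens point into string; no per-token malloc */
        p = strtok(NULL, delim);
    }
    array[i] = NULL;
    return array;
}

/* Returns 1 on success, 0 if the line has too few fields. */
int parseLine(char *line, char *user, char term[], char ip[], char event[]) {
    char **array = split(line, " \t\n");
    int n = 0, ok = 0;

    if (array == NULL)
        return 0;
    while (array[n] != NULL)
        n++;
    /* need user, term, ip, day, month, date, time, and one more token */
    if (n >= 8) {
        strcpy(user, array[0]);
        strcpy(term, array[1]);
        strcpy(ip, array[2]);
        strcpy(event, array[6]);        /* login time; skips "Thu Mar 29" */
        if (strcmp(array[7], "-") != 0) {
            strcat(event, " still logged in");
        } else if (n >= 9) {
            strcat(event, " - ");
            strcat(event, array[8]);    /* logout time */
        }
        ok = 1;
    }
    free(array);
    return ok;
}

int main(void) {
    char line[2048];
    char user[64], term[64], ip[64], event[64];

    while (fgets(line, sizeof line, stdin) != NULL) {
        if (!parseLine(line, user, term, ip, event))
            continue;   /* skip blank or malformed lines */
        printf("[%s][%s][%s][%s]\n", user, term, ip, event);
        /* use an array to save them ... */
    }
    return 0;
}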

and then: ./a.out < file.txt

Marcos
0

For what it is worth, here is my suggestion. Roll your own string-tokeniser as follows:

static char *string_tok(char **stringp, const char *delim)
{
    char *tok = *stringp + strspn(*stringp, delim);
    char *end = tok + strcspn(tok, delim);

    if (*end) {
        *end++ = '\0';
        end += strspn(end, delim);
    }
    *stringp = end;
    return tok;
}

Then just call it in sequence for each token. After the third call to string_tok, buf points to the start of the remainder of the string (the events). Note that the string buf points to must be writable, because string_tok overwrites delimiters with null characters.

static void parse(char * buf)
{
    char * user_id = string_tok(&buf, " \t");
    char * term    = string_tok(&buf, " \t");
    char * ip      = string_tok(&buf, " \t");
    printf("user_id:  %s\n", user_id);
    printf("terminal: %s\n", term);
    printf("ip addr:  %s\n", ip);
    printf("events:   %s\n\n", buf);
}
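
As a usage sketch, the tokeniser can also hand the fields back to the caller instead of printing them. The wrapper `split_line` is hypothetical, and `string_tok` is repeated from above only so that the block compiles on its own. The wrapper also drops the trailing newline that fgets leaves in the buffer.

```c
#include <string.h>

/* Repeated from the answer above so this block is self-contained. */
static char *string_tok(char **stringp, const char *delim)
{
    char *tok = *stringp + strspn(*stringp, delim);
    char *end = tok + strcspn(tok, delim);

    if (*end) {
        *end++ = '\0';
        end += strspn(end, delim);
    }
    *stringp = end;
    return tok;
}

/* Hypothetical wrapper: splits buf in place. The four output pointers
   point into buf, so buf must be writable and must outlive them. */
static void split_line(char *buf, char **user_id, char **term,
                       char **ip, char **events)
{
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline left by fgets */
    *user_id = string_tok(&buf, " \t");
    *term    = string_tok(&buf, " \t");
    *ip      = string_tok(&buf, " \t");
    *events  = buf;                   /* remainder of the line */
}
```

Because the outputs point into the line buffer, copy them into your own arrays before reading the next line into the same buffer.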
William Morris