I have input from console in this exact format:

|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00, ...........|
  • The number of whitespaces doesn't matter ( |999: Hello!...| is the same as |999 : Hello! ...| )
  • I don't know how many messages there are ("999: Hello! Jan 20 15:00" is one message)

I used the getline() function to read this whole line into a char array. The problem now is that I need to extract variables from this string, for example:

char USER_ID = 999;

char *MESSAGE = "Hello";

char MONTH[3] = "Jan";

int HOUR = 15;

int MIN = 0;

I want to add them to an array, which is why I'm asking about just one message.

I tried using sscanf, but it doesn't work. I even tried skipping getline() and using scanf() instead, but to no avail. How can I do this in C? The only option I can think of is using for loops (run until whitespace is encountered, assign, repeat), but that would be terribly slow.

Any ideas?

Etoile

2 Answers

Pass 2 — An arbitrary number of comma-separated messages per line

The question and some of the commentary are ambiguous about the meaning of 'message'. One of the fields in the input is designated 'message' (MESSAGE in the question, but there's no need to shout), but a sequence of 6 data fields (number, greeting, month, day, hour, minute) is also designated as a message. The Pass 2 rewrite of the Pass 1 code (below) calls the word such as Hi! or Hello! the greeting. Since the sample data has not changed, I am assuming that the greeting portion is a single word with no spaces in it. As noted in the Pass 1 commentary, if the greeting can contain blanks, then life gets very messy.

It requires a little care in the format string, but the line shown in the question can be parsed with sscanf() easily enough, as long as the greeting is a single 'word' — a sequence of non-blank characters.

This answer builds on the ideas in How to use sscanf() in loops.

This revised code assumes that a line of input consists of:

  • an initial | to mark the start of the line.
  • a series of one or more comma-separated messages, each of which contains 6 fields (number, greeting, month, day, hour, minute).
  • a final | to mark the end of the line.

The code uses structures, pointers, and POSIX getline().

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { MAX_INFO = 64 };

struct info
{
    int     user_id;
    char    greeting[32];
    char    month[4];
    int     day;
    int     hour;
    int     minute;
};

static size_t parse_line(const char *buffer, size_t num_left, struct info *data);
static void print_info(size_t i, const struct info *data);

int main(void)
{

    char *buffer = NULL;
    size_t buflen = 0;
    struct info data[MAX_INFO];
    size_t num_left = MAX_INFO;
    struct info *next_data = data;
    ssize_t length;     /* getline() returns ssize_t, not int */

    while (num_left > 0 && (length = getline(&buffer, &buflen, stdin)) != -1)
    {
        assert(length > 0);
        if (buffer[length - 1] == '\n')     /* the last line may lack a newline */
            buffer[length - 1] = '\0';
        printf("[%s]\n", buffer);
        if (buffer[0] != '|')
            break;
        size_t num_read = parse_line(buffer + 1, num_left, next_data);
        printf("Read: %zu\n", num_read);
        if (num_read == 0)
            break;
        next_data += num_read;
        num_left -= num_read;
    }
    free(buffer);

    size_t num_used = MAX_INFO - num_left;

    for (size_t i = 0; i < num_used; i++)
        print_info(i, &data[i]);

    return 0;
}

static size_t parse_line(const char *buffer, size_t num_left, struct info *data)
{
    char   marker = '\0';   /* initialized so the diagnostic below never reads garbage */
    int    offset = 0;
    size_t num_read = 0;
    int    rc = 0;

    while (num_read < num_left &&
           (rc = sscanf(buffer, "%d : %31s %3s %d %d : %d %c%n",
                        &data->user_id, data->greeting, data->month, &data->day,
                        &data->hour, &data->minute, &marker, &offset)) == 7 &&
           (marker == ',' || marker == '|'))
    {
        num_read++;
        if (marker == '|')
            break;
        buffer += offset;
        offset = 0;
        data++;
    }
    if (rc != 7 || (marker != ',' && marker != '|'))
        printf("rc = %d, marker = '%c'\n", rc, marker);
    return num_read;
}

static void print_info(size_t i, const struct info *data)
{
    printf("Info[%zu]: %d [%s] (%s %2d %.2d:%.2d)\n",
           i, data->user_id, data->greeting, data->month, data->day, data->hour, data->minute);
}

As noted in the commentary on Pass 1, not all the spaces in the format string are necessary — those before a %d or %s conversion specification could be omitted. The code in the main program is responsible for skipping the initial | — it would be possible and probably even sensible to move that checking into the parse_line() function.
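
The idea of moving the | check into the parsing function could be sketched roughly as follows. This is only an illustration under assumptions: parse_bar_line() is a hypothetical name (it is not part of the code above), it repeats the struct info definition so that it stands alone, and it uses the same sscanf() format string as parse_line().

```c
#include <assert.h>
#include <stdio.h>

struct info
{
    int     user_id;
    char    greeting[32];
    char    month[4];
    int     day;
    int     hour;
    int     minute;
};

/* Hypothetical variant of parse_line() that checks the leading '|' itself.
** Returns the number of messages stored in data[0..max-1], or 0 if the
** line does not start with '|' (after optional white space). */
static size_t parse_bar_line(const char *line, size_t max, struct info *data)
{
    int    start = 0;
    size_t count = 0;

    assert(line != NULL && data != NULL);

    /* " |%n" skips white space, matches '|', and records the position.
    ** %n is not counted in the return value, so success shows up only
    ** as a non-zero start. */
    if (sscanf(line, " |%n", &start) != 0 || start == 0)
        return 0;
    line += start;

    while (count < max)
    {
        char         marker = '\0';
        int          offset = 0;
        struct info *p = &data[count];

        if (sscanf(line, "%d : %31s %3s %d %d : %d %c%n",
                   &p->user_id, p->greeting, p->month, &p->day,
                   &p->hour, &p->minute, &marker, &offset) != 7 ||
            (marker != ',' && marker != '|'))
            break;
        count++;
        if (marker == '|')      /* closing bar: the line is complete */
            break;
        line += offset;         /* step past the ',' to the next message */
    }
    return count;
}
```

With this variant, the caller would pass the whole line (for example, parse_bar_line(buffer, num_left, next_data)) instead of testing buffer[0] and passing buffer + 1.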

Data file:

|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00|
| 666  :   Beast    Nov  20  13 :  17   ,   617 :  Squadron Jul  31  08 : 14 ,  314159 :   Pie Nov 26 00:00 |
|667: Beast Nov 21 14:18,618:Squadron Jul 22 9:15,314159:SweetiePie Nov 27 01:01|
|0:Short Dec 25 08:00|

Example output:

[|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00|]
Read: 2
[| 666  :   Beast    Nov  20  13 :  17   ,   617 :  Squadron Jul  31  08 : 14 ,  314159 :   Pie Nov 26 00:00 |]
Read: 3
[|667: Beast Nov 21 14:18,618:Squadron Jul 22 9:15,314159:SweetiePie Nov 27 01:01|]
Read: 3
[|0:Short Dec 25 08:00|]
Read: 1
Info[0]: 999 [Hello!] (Jan 20 15:00)
Info[1]: 875 [Hi!] (Jan 20 17:00)
Info[2]: 666 [Beast] (Nov 20 13:17)
Info[3]: 617 [Squadron] (Jul 31 08:14)
Info[4]: 314159 [Pie] (Nov 26 00:00)
Info[5]: 667 [Beast] (Nov 21 14:18)
Info[6]: 618 [Squadron] (Jul 22 09:15)
Info[7]: 314159 [SweetiePie] (Nov 27 01:01)
Info[8]: 0 [Short] (Dec 25 08:00)

Pass 1 — Two messages per line

It requires a little care in the format string, but the line shown in the question can be parsed with sscanf() easily enough as long as the 'message' never contains any blanks. (If the message can contain blanks, it is messy to parse regardless of whether you use sscanf() or any other technology.)

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char data[] = "|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00|";

    struct info
    {
        int     user_id;
        char    message[32];
        char    month[4];
        int     day;
        int     hour;
        int     minute;
    } info[2];
    int rc;

    if ((rc = sscanf(data, "|%d : %31s %3s %d %d : %d , %d : %31s %3s %d %d : %d |",
                &info[0].user_id, info[0].message, info[0].month, &info[0].day, &info[0].hour, &info[0].minute,
                &info[1].user_id, info[1].message, info[1].month, &info[1].day, &info[1].hour, &info[1].minute)) != 12)
    {
        fprintf(stderr, "Oops: sscanf() failed (rc = %d)!\n", rc);
        exit(1);
    }

    printf("Info[%d]: %d [%s] (%s %2d %.2d:%.2d)\n",
           0, info[0].user_id, info[0].message, info[0].month, info[0].day, info[0].hour, info[0].minute);
    printf("Info[%d]: %d [%s] (%s %2d %.2d:%.2d)\n",
           1, info[1].user_id, info[1].message, info[1].month, info[1].day, info[1].hour, info[1].minute);
    return 0;
}

Output:

Info[0]: 999 [Hello!] (Jan 20 15:00)
Info[1]: 875 [Hi!] (Jan 20 17:00)

Not all the spaces in the format string are necessary — those before %d or %s conversion specifications could be omitted. You could add a space before the first |, and the last space and last | are also superfluous — you'll never know if they were matched. If you needed to find out, you could do so by adding %n after the final | and adding an extra argument &offset (defined as int offset = 0;) as the final argument to the call of sscanf(). You'd need to check that offset was not zero after the call; if it is still zero, the scanning failed on the last characters (there wasn't a | after the last minute value).
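
A minimal sketch of that %n check, cut down to a single message to keep it short. The function name closing_bar_matched() is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 only if all six fields were converted
** and the closing '|' was matched as well; returns 0 otherwise. */
static int closing_bar_matched(const char *data)
{
    int  user_id, day, hour, minute;
    char message[32], month[4];
    int  offset = 0;    /* stays 0 unless scanning gets as far as the %n */

    assert(data != NULL);

    /* %n stores the number of characters consumed so far; it does not
    ** add to the count that sscanf() returns. */
    int rc = sscanf(data, "|%d : %31s %3s %d %d : %d |%n",
                    &user_id, message, month, &day, &hour, &minute, &offset);

    return rc == 6 && offset != 0;
}
```

Without the %n, both "|999: Hello! Jan 20 15:00|" and the same text with the final | missing would make sscanf() return 6; the offset test is what tells them apart.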

Jonathan Leffler
  • Yes, I know how to use structures :) your solution is one of those I tried, but the problem is that I don't know how many "messages" there will be in the input – Etoile Nov 27 '20 at 17:57
  • You should have shown what you tried and stated that your problem was that there could be multiple words in the message, and illustrated it in your sample data. – Jonathan Leffler Nov 27 '20 at 17:59
  • Sorry, I will clarify the question! – Etoile Nov 27 '20 at 18:02
  • Answer revised with 'Pass 2'. You didn't show any multi-word greeting in the data, so this code doesn't attempt to deal with multi-word greetings. – Jonathan Leffler Nov 27 '20 at 19:10

Assuming your input format is consistent and well-behaved, you could do something like this:

#include <stdio.h>
#include <stdlib.h>

#define MESSAGE_LENGTH 20 // maximum message length, however long that needs to be

#define EXPAND(x) #x
#define STRINGIFY(x) EXPAND(x) 
#define MSG_FMT STRINGIFY(MESSAGE_LENGTH) // I'll explain this below

int main( void )
{
  int user_id, day, hour, minute;
  char message[MESSAGE_LENGTH+1], month[4];
  char c = '\0', p;   // c stays '\0' if the first scanf() fails

  /**
   * If the next non-whitespace character is a '|', start processing the record
   */
  if ( scanf( " %c", &c ) == 1 && c == '|' )
  {
    /**
     * Process multiple records per line, if necessary
     */
    do 
    {
      /**
       * Read the user id, message, and date.  I'll explain the mechanics
       * of how the message is being read below.  We expect 8 items to
       * be read and assigned.
       */
      if ( scanf( "%d : %" MSG_FMT "[^.,?!:;]%c %3s %d %d :%d %c", &user_id, message, &p, month, &day, &hour, &minute, &c ) == 8 ) // space before ':' tolerates "999 : Hello!"
      {
        /**
         * Process good input - in this case, we just parrot it back.
         */
        printf( "user_id = %d\n", user_id );
        printf( "date = %s %02d %02d:%02d\n", month, day, hour, minute );
        printf( "message = %s%c\n", message, p );
      }
      else
      {
        /**
         * We didn't get 8 good inputs on the last line.  Handle it
         * by not handling it and exiting.
         */
        fprintf( stderr, "Bad input, bailing out immediately\n" );
        return EXIT_FAILURE;
      }
      /**
       * Keep processing records as long as the last character in the
       * record is a ','
       */
    } while ( c == ',' );
  }

  /**
   * At this point, the last character we should have read is a '|'; if
   * that's not the case, then the last input was bad.
   */
  if ( c == '|' )
  {
    puts( "All records read successfully" );
  }
  else
  {
    puts( "Saw bad delimiter at the end of the last record" );
  }

  return 0;
}

If your message can be more than one word, like

|888: This is a test. Nov 27 12:20 |

then you can't use %s to read it - it will stop on the first whitespace character. Instead, we use the %[ conversion specifier, but this introduces a new problem - we need a way to indicate the end of the message so that it doesn't try to read the date as well. We need some kind of delimiter between the message and the date, so we're going to use the set of punctuation characters .,?!:;. In this case we're going to store that punctuation character to p so we can reproduce the message actually read, but if you decide to use a different delimiter that you don't want to save (say :), you can consume it without assigning it as follows:

 %[^:]%*c

That * in %*c tells scanf to read the next character and discard it. You'll need to change the test from == 8 to == 7 in that case.
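
As a rough sketch of that variation: the record layout here ("id: message text: Mon DD HH:MM", with : ending the message) and the name parse_colon_record() are assumptions made for the example, and it uses sscanf() on a string rather than scanf() on stdin so it is easy to try in isolation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for records such as
**     "888: This is a test: Nov 27 12:20"
** msg must have room for 21 characters and month for 4.
** Returns the number of fields converted (6 on success). */
static int parse_colon_record(const char *s, int *id, char *msg, char *month,
                              int *day, int *hour, int *minute)
{
    assert(s != NULL && msg != NULL && month != NULL);

    /* %20[^:] stores up to 20 characters that are not ':'.
    ** %*c then reads the ':' delimiter and discards it; it takes no
    ** argument and is not counted in the return value. */
    return sscanf(s, "%d : %20[^:]%*c %3s %d %d :%d",
                  id, msg, month, day, hour, minute);
}
```

Only six values are stored here because the sketch parses a single record and does not read the trailing , or | that the full program also captures.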

We also read the first non-whitespace character following the record; if it's a ,, we read the next record, else we stop. If it's a | character, then we know we read everything correctly.

The %s and %[ conversion specifiers suffer from the same security flaw that gets had and that strcpy and strcat still have - they can't know how big the target buffer is based on the address alone. We have to specify a maximum field width so that they don't read more characters than their respective buffers are sized to hold.

Unfortunately, unlike printf, you can't specify the buffer size as an argument (in scanf, * means "suppress assignment", not "take the width from an argument") - it has to be hardcoded into the conversion specifier. With the month that's easy, we know it's sized to hold a string up to 3 characters long (plus the string terminator), so we just specify %3s.

Since the size of the message buffer is based on a preprocessor macro, we use some additional preprocessor magic to add it to the conversion specifier. MSG_FMT expands to STRINGIFY(MESSAGE_LENGTH), which expands to EXPAND(20), which expands to "20". That string is concatenated into the format, so it ultimately reads %20[^.,?!:;].
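
To see the macro expansion in isolation, here is a small sketch. The macros are the ones from the program above; message_format and read_message() are hypothetical names added only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MESSAGE_LENGTH 20

#define EXPAND(x) #x
#define STRINGIFY(x) EXPAND(x)
#define MSG_FMT STRINGIFY(MESSAGE_LENGTH)

/* Adjacent string literals are joined at compile time, so this array
** holds exactly the text "%20[^.,?!:;]". */
static const char message_format[] = "%" MSG_FMT "[^.,?!:;]";

/* Hypothetical helper: copy the message text from s into msg, which must
** have room for MESSAGE_LENGTH + 1 characters. Returns 1 on success. */
static int read_message(const char *s, char *msg)
{
    assert(s != NULL && msg != NULL);
    return sscanf(s, message_format, msg);
}
```

If MESSAGE_LENGTH is changed, the field width in the format changes with it, so the buffer declaration and the conversion specifier can't drift apart.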

Just know that if your message string is too long to store, then that will affect how the date, hour, and minute are read, and you'll still most likely have a bad input error, it just won't clobber other memory in the process.
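
The effect can be sketched with sscanf() on a string instead of scanf() on stdin. That substitution is for illustration only, as is the helper name count_fields(); the conversions are the same as in the program above, with the width written out as 20.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns how many of the 8 fields sscanf()
** managed to convert for a single record. */
static int count_fields(const char *record)
{
    int  user_id, day, hour, minute;
    char message[21], month[4];
    char p, c;

    assert(record != NULL);
    return sscanf(record, "%d : %20[^.,?!:;]%c %3s %d %d :%d %c",
                  &user_id, message, &p, month, &day, &hour, &minute, &c);
}
```

For "888: This is a test. Nov 27 12:20 |" all 8 fields convert. For a record whose message is the 26 letters of the alphabet followed by !, only 4 convert: the first 20 letters land in message, the 21st in p, the next three in month, and the day conversion then fails on a letter. The buffers are never overrun, but the record is rejected.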

John Bode