Pass 2 — An arbitrary number of comma-separated messages per line
The question and some of the commentary are ambiguous about the meaning of 'message'. One of the fields in the input is designated 'message' (MESSAGE in the question, but there's no need to shout), but a sequence of 6 data fields (number, greeting, month, day, hour, minute) is also designated as a message. The Pass 2 rewrite of the Pass 1 code (below) designates the word such as Hi! or Hello! as the greeting. Since the sample data has not changed, I am assuming that the greeting portion is a single word with no spaces in it. As noted in the Pass 1 commentary, if the greeting can contain blanks, then life gets very messy.
It requires a little care in the format string, but the line shown in the question can be parsed with sscanf() easily enough, as long as the greeting is a single 'word', that is, a sequence of non-blank characters.
This answer builds on the ideas in How to use sscanf() in loops.
This revised code assumes that a line of input consists of:
- an initial | to mark the start of the line.
- a series of one or more comma-separated messages, each of which contains 6 fields (number, greeting, month, day, hour, minute).
- a final | to mark the end of the line.
The code uses structures, pointers, and POSIX getline().
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { MAX_INFO = 64 };

struct info
{
    int user_id;
    char greeting[32];
    char month[4];
    int day;
    int hour;
    int minute;
};

static size_t parse_line(const char *buffer, size_t num_left, struct info *data);
static void print_info(size_t i, const struct info *data);

int main(void)
{
    char *buffer = NULL;
    size_t buflen = 0;
    struct info data[MAX_INFO];
    size_t num_left = MAX_INFO;
    struct info *next_data = data;
    ssize_t length;     /* getline() returns ssize_t, not int */

    while (num_left > 0 && (length = getline(&buffer, &buflen, stdin)) != -1)
    {
        assert(length > 0);
        /* Remove the newline only if present; the last line may lack one */
        if (buffer[length - 1] == '\n')
            buffer[length - 1] = '\0';
        printf("[%s]\n", buffer);
        if (buffer[0] != '|')
            break;
        size_t num_read = parse_line(buffer + 1, num_left, next_data);
        printf("Read: %zu\n", num_read);
        if (num_read == 0)
            break;
        next_data += num_read;
        num_left -= num_read;
    }
    free(buffer);

    size_t num_used = MAX_INFO - num_left;
    for (size_t i = 0; i < num_used; i++)
        print_info(i, &data[i]);
    return 0;
}

static size_t parse_line(const char *buffer, size_t num_left, struct info *data)
{
    char marker = '\0';     /* Initialized so it is never read while indeterminate */
    int offset = 0;
    size_t num_read = 0;
    int rc = 0;

    /* Each pass reads one message plus the ',' or '|' that follows it */
    while (num_read < num_left &&
           (rc = sscanf(buffer, "%d : %31s %3s %d %d : %d %c%n",
                        &data->user_id, data->greeting, data->month, &data->day,
                        &data->hour, &data->minute, &marker, &offset)) == 7 &&
           (marker == ',' || marker == '|'))
    {
        num_read++;
        if (marker == '|')
            break;
        buffer += offset;   /* Move past the message just read */
        offset = 0;
        data++;
    }
    /* Report the marker only when sscanf() actually converted it */
    if (rc != 7)
        printf("rc = %d\n", rc);
    else if (marker != ',' && marker != '|')
        printf("rc = %d, marker = '%c'\n", rc, marker);
    return num_read;
}

static void print_info(size_t i, const struct info *data)
{
    printf("Info[%zu]: %d [%s] (%s %2d %.2d:%.2d)\n",
           i, data->user_id, data->greeting, data->month, data->day, data->hour, data->minute);
}
As noted in the commentary on Pass 1, not all the spaces in the format string are necessary: those before a %d or %s conversion specification could be omitted. The code in the main program is responsible for skipping the initial |; it would be possible and probably even sensible to move that checking into the parse_line() function.
Data file:
|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00|
| 666 : Beast Nov 20 13 : 17 , 617 : Squadron Jul 31 08 : 14 , 314159 : Pie Nov 26 00:00 |
|667: Beast Nov 21 14:18,618:Squadron Jul 22 9:15,314159:SweetiePie Nov 27 01:01|
|0:Short Dec 25 08:00|
Example output:
[|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00|]
Read: 2
[| 666 : Beast Nov 20 13 : 17 , 617 : Squadron Jul 31 08 : 14 , 314159 : Pie Nov 26 00:00 |]
Read: 3
[|667: Beast Nov 21 14:18,618:Squadron Jul 22 9:15,314159:SweetiePie Nov 27 01:01|]
Read: 3
[|0:Short Dec 25 08:00|]
Read: 1
Info[0]: 999 [Hello!] (Jan 20 15:00)
Info[1]: 875 [Hi!] (Jan 20 17:00)
Info[2]: 666 [Beast] (Nov 20 13:17)
Info[3]: 617 [Squadron] (Jul 31 08:14)
Info[4]: 314159 [Pie] (Nov 26 00:00)
Info[5]: 667 [Beast] (Nov 21 14:18)
Info[6]: 618 [Squadron] (Jul 22 09:15)
Info[7]: 314159 [SweetiePie] (Nov 27 01:01)
Info[8]: 0 [Short] (Dec 25 08:00)
Pass 1 — Two messages per line
It requires a little care in the format string, but the line shown in the question can be parsed with sscanf() easily enough as long as the 'message' never contains any blanks. (If the message can contain blanks, it is messy to parse regardless of whether you use sscanf() or any other technology.)
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char data[] = "|999: Hello! Jan 20 15:00, 875: Hi! Jan 20 17:00|";
    struct info
    {
        int user_id;
        char message[32];
        char month[4];
        int day;
        int hour;
        int minute;
    } info[2];
    int rc;

    if ((rc = sscanf(data, "|%d : %31s %3s %d %d : %d , %d : %31s %3s %d %d : %d |",
                     &info[0].user_id, info[0].message, info[0].month, &info[0].day, &info[0].hour, &info[0].minute,
                     &info[1].user_id, info[1].message, info[1].month, &info[1].day, &info[1].hour, &info[1].minute)) != 12)
    {
        fprintf(stderr, "Oops: sscanf() failed (rc = %d)!\n", rc);
        exit(1);
    }
    printf("Info[%d]: %d [%s] (%s %2d %.2d:%.2d)\n",
           0, info[0].user_id, info[0].message, info[0].month, info[0].day, info[0].hour, info[0].minute);
    printf("Info[%d]: %d [%s] (%s %2d %.2d:%.2d)\n",
           1, info[1].user_id, info[1].message, info[1].month, info[1].day, info[1].hour, info[1].minute);
    return 0;
}
Output:
Info[0]: 999 [Hello!] (Jan 20 15:00)
Info[1]: 875 [Hi!] (Jan 20 17:00)
Not all the spaces in the format string are necessary: those before %d or %s conversion specifications could be omitted. You could add a space before the first |, and the last space and last | are also superfluous: you'll never know if they were matched. If you needed to find out, you could do so by adding %n after the final | and adding an extra argument &offset (defined as int offset = 0;) as the final argument to the call of sscanf(). You'd need to check that offset was not zero after the call; if it is still zero, the scanning failed on the last characters (there wasn't a | after the last minute value).