Use the 64-bit unsigned integer type uint64_t (declared in <inttypes.h>) to store the 48-bit value (HH:HH:HH:HH:HH:HH → 0xHHHHHHHHHHHH).
You could use sscanf(), but it does not detect overflow; with conversions like %hhx it would typically consider only the two rightmost hexadecimal characters in each part, so F11:E22:D33:C44:B55:A66 would yield the same result as 11:22:33:44:55:66.
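To see the problem concretely, here is a minimal sketch (the helper name scan_mac_loose is made up for this illustration). It uses plain %x conversions into unsigned int rather than %hhx, so that each part can be inspected before any truncation happens:

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only: scan six colon-separated
   hexadecimal parts with sscanf(). Returns the number of parts converted.
   Note that nothing here limits a part to two hexadecimal digits. */
static int scan_mac_loose(const char *src, unsigned int part[6])
{
    return sscanf(src, "%x:%x:%x:%x:%x:%x",
                  &part[0], &part[1], &part[2],
                  &part[3], &part[4], &part[5]);
}
```

Given "F11:E22:D33:C44:B55:A66", this returns 6 with part[0] == 0xF11; keeping only the low eight bits of each part (part[i] & 0xFF) then gives exactly the bytes of 11:22:33:44:55:66. Rejecting such input would need an explicit range check on every part.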
First, we need a function to convert a hexadecimal digit to its numerical value. Here is the simplest, easiest to read, and also most portable way to write it:
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <ctype.h>
#include <stdio.h>

static inline int hex_digit(const int c)
{
    switch (c) {
    case '0': return 0;
    case '1': return 1;
    case '2': return 2;
    case '3': return 3;
    case '4': return 4;
    case '5': return 5;
    case '6': return 6;
    case '7': return 7;
    case '8': return 8;
    case '9': return 9;
    case 'A': case 'a': return 10;
    case 'B': case 'b': return 11;
    case 'C': case 'c': return 12;
    case 'D': case 'd': return 13;
    case 'E': case 'e': return 14;
    case 'F': case 'f': return 15;
    default: return -1;
    }
}
The function will return a nonnegative (0 or positive) integer corresponding to the character, or -1 if the character is not a hexadecimal digit.
The static inline means that the function is only visible in this translation unit (file; or if put in a header file, each file that #includes that header file). It was standardized in C99 as a way for programmers to write functions that are as fast as (incur no runtime overhead compared to) preprocessor macros.
Next, we need a function to carefully parse the string. Here is one:
/* Parse a string "HH:HH:HH:HH:HH:HH" to 0x00HHHHHHHHHHHH,
   and return a pointer to the character following it,
   or NULL if an error occurs. */
static const char *parse_mac(const char *src, uint64_t *dst)
{
    uint64_t value = 0;
    int i, hi, lo;

    /* No string specified? */
    if (!src)
        return NULL;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)(*src)))
        src++;

    /* End of string? */
    if (!*src)
        return NULL;

    /* First pair of hex digits. */
    if ((hi = hex_digit(src[0])) < 0 ||
        (lo = hex_digit(src[1])) < 0)
        return NULL;
    value = 16*hi + lo;
    src += 2;

    /* The next five ":HH" */
    for (i = 0; i < 5; i++) {
        if (src[0] != ':' || (hi = hex_digit(src[1])) < 0 ||
                             (lo = hex_digit(src[2])) < 0 )
            return NULL;
        value = 256*value + 16*hi + lo;
        src += 3;
    }

    /* Successfully parsed. */
    if (dst)
        *dst = value;

    return src;
}
Above, we marked the function static, meaning it too is only visible in this compilation unit. It is not marked inline, because it is not a trivial function; it does proper work, so we do not suggest the compiler should inline it.
Note the cast to unsigned char in the isspace() call. This is because isspace() takes either an unsigned char, or EOF. If we supply it a char, and char type happens to be a signed type (it varies between architectures), some characters do get incorrectly classified. So, using the cast with the character-type functions (isspace(), isblank(), tolower(), toupper(), et cetera) is important, if you want your code to work right on all systems that support standard C.
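As a small sketch of the same cast used with other character-type functions (the helper names skip_space and upper_in_place are made up for this illustration):

```c
#include <ctype.h>

/* Hypothetical helper: return a pointer to the first non-whitespace
   character in src (or to the terminating '\0' if there is none). */
static const char *skip_space(const char *src)
{
    /* Cast to unsigned char first, so that a char with a negative
       value is never passed to isspace(). */
    while (isspace((unsigned char)(*src)))
        src++;
    return src;
}

/* Hypothetical helper: convert a string to upper case in place. */
static void upper_in_place(char *s)
{
    /* toupper() returns an int; cast back to char when storing. */
    for (; *s; s++)
        *s = (char)toupper((unsigned char)(*s));
}
```

With these, skip_space(" \t ab:cd") points to "ab:cd", and upper_in_place() turns a buffer containing "ab:cd" into "AB:CD".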
You might not be familiar with the idiom if ((variable = subexpression) < 0). For each (variable = subexpression) < 0, the subexpression gets evaluated, then assigned to the variable. If the value is less than zero, the entire expression is true; otherwise it is false. The variable will retain its new value afterwards.
In C, logical AND (&&) and OR (||) are short-circuiting. This means that if you have A && B, and A is false, then B is not evaluated at all. If you have A || B, and A is true, then B is not evaluated at all. So, in the above code,
if ((hi = hex_digit(src[0])) < 0 ||
    (lo = hex_digit(src[1])) < 0)
    return NULL;
is exactly equivalent to
hi = hex_digit(src[0]);
if (hi < 0)
    return NULL;
lo = hex_digit(src[1]);
if (lo < 0)
    return NULL;
Here, we could have written those two complicated if statements more verbosely, but I wanted to include the idiom in this example, to make this answer into something you must "chew" a bit in your mind, before you can use it in e.g. homework.
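If you want to convince yourself of the short-circuit behaviour, here is a small sketch that counts the calls. The names calls, check, and pair_value are made up for this illustration; check() stands in for hex_digit().

```c
/* Counts how many times check() has been called. */
static int calls = 0;

/* Hypothetical stand-in for hex_digit(): returns its argument
   unchanged, and records that it was called. */
static int check(int value)
{
    calls++;
    return value;
}

/* Returns 16*first + second if both are nonnegative, -1 otherwise.
   If first is negative, the second check() call is never made. */
static int pair_value(int first, int second)
{
    int hi, lo;

    if ((hi = check(first)) < 0 ||
        (lo = check(second)) < 0)
        return -1;

    return 16*hi + lo;
}
```

Calling pair_value(-1, 5) increments calls only once, because the right side of || is skipped; pair_value(1, 2) increments it twice and returns 18.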
The main "trick" in the function is that we build value by shifting its digits leftward. If we are parsing 12:34:56:78:9A:BC, the first assignment to value is equivalent to value = 0x12;. Multiplying value by 256 shifts the hexadecimal digits by two places (because 256 = 0x100), so in the first iteration of the for loop, the assignment to value is equivalent to value = 0x1200 + 0x30 + 0x4; i.e. value = 0x1234;. This goes on for four more assignments, so that the final value is 0x123456789ABC. This "shifting digits via multiplication" is very common, and works in all numerical bases (for decimal numbers, the multiplier is a power of 10; for octal numbers, a power of 8; for hexadecimal numbers, a power of 16; always a power of the base).
You can, for example, use this approach to reverse the digits in a number (so that one function converts 0x123456 to 0x654321, and another converts 8040201 to 1020408).
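One possible sketch of such a digit reversal, with the base as a parameter so the same function covers both cases (reverse_digits is a name made up for this illustration):

```c
#include <inttypes.h>

/* Hypothetical helper: reverse the digits of value in the given base
   (base must be at least 2). Leading zeros of the result are lost,
   so 1200 in base 10 becomes 21. If the reversed number does not fit
   in 64 bits, the result silently wraps around. */
static uint64_t reverse_digits(uint64_t value, unsigned int base)
{
    uint64_t result = 0;

    while (value > 0) {
        /* Shift the digits collected so far one place to the left,
           then append the lowest digit of value. */
        result = base * result + value % base;
        /* Drop the lowest digit of value. */
        value /= base;
    }

    return result;
}
```

Here, reverse_digits(0x123456, 16) yields 0x654321, and reverse_digits(8040201, 10) yields 1020408.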
To test the above, we need a main(), of course. I like my example programs to tell me what they do if I run them without arguments. When they work on strings or numbers, I like to provide them on the command line, rather than having the program ask for input:
int main(int argc, char *argv[])
{
    const char *end;
    uint64_t mac;
    int arg;

    if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, " %s HH:HH:HH:HH:HH:HH ...\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program parses the hexadecimal string(s),\n");
        fprintf(stderr, "and outputs them in both hexadecimal and decimal.\n");
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    for (arg = 1; arg < argc; arg++) {
        end = parse_mac(argv[arg], &mac);
        if (!end) {
            fprintf(stderr, "Cannot parse '%s'.\n", argv[arg]);
            return EXIT_FAILURE;
        }
        if (*end)
            printf("%s: 0x%012" PRIx64 " = %" PRIu64 " in decimal; '%s' unparsed.\n",
                   argv[arg], mac, mac, end);
        else
            printf("%s: 0x%012" PRIx64 " = %" PRIu64 " in decimal.\n",
                   argv[arg], mac, mac);
        fflush(stdout);
    }

    return EXIT_SUCCESS;
}
The first if clause checks if there are any command-line parameters. (argv[0] is the program name itself, and is included in argc, the number of strings in the argv[] array. In other words, argc == 1 means only the program name was supplied on the command line, argc == 2 means the program name and one parameter (in argv[1]) were supplied, and so on.)
Because it is often nice to supply more than one item to work on, we have a for loop over all command-line parameters; from argv[1] to argv[argc-1], inclusive. (Remember, because argc is the number of strings in the argv[] array, and numbering starts from 0, the last is argc-1. This is important to remember in C, in all array use!)
Within the for loop, we use our parse function. Because it returns a pointer to the string following the part we parsed, and we store that to end, (*end == '\0') (which is equivalent to the shorter form (!*end)) is true if the string ended there. If (*end) (equivalent to (*end != '\0')) is true, then there are additional characters in the string following the parsed part.
To output any of the integer types specified in <inttypes.h>, we must use preprocessor macros. For uint64_t, we can use "%" PRIu64 to print one in decimal; or "%" PRIx64 to print one in hexadecimal. "%012" PRIx64 means "Print a uint64_t in hexadecimal, zero-padded (on the left) to 12 digits".
Remember that in C, adjacent string literals are concatenated; "a b", "a " "b", "a" " " "b" are all equivalent. (So, the PRI?## macros all expand to strings that specify the exact conversion type. They are macros, because they vary between systems. In 64-bit Windows PRIu64 is usually "llu", but in 64-bit Linux it is "lu".)
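The same macros work with snprintf(), if you want the text in a buffer instead of on standard output. A minimal sketch (format_mac_hex is a name made up for this illustration):

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: write "0xHHHHHHHHHHHH" (14 characters plus the
   terminating '\0') to out, which has room for size chars.
   Returns 0 if successful, -1 if the result did not fit. */
static int format_mac_hex(uint64_t mac, char *out, size_t size)
{
    /* After literal concatenation, the format is a single string,
       for example "0x%012lx" or "0x%012llx" depending on the system. */
    int len = snprintf(out, size, "0x%012" PRIx64, mac);

    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

With a 15-char buffer, format_mac_hex(0x123456070809, buffer, sizeof buffer) stores "0x123456070809" in it, and a value of 9 is stored as "0x000000000009".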
The fflush(stdout); at the end should do nothing, because standard output is by default line buffered when it is directed to a terminal. However, because I explicitly want the C library to ensure the output is written to standard output before the next loop iteration, I added it. It would matter if one changed standard output to fully buffered. As it is, it is "insurance" (against oddly behaving C library implementations), and a reminder to us human programmers that the intent is to have the output flushed, not cached by the C library, at that point.
(Why do we want that? Because if an error occurs during the next iteration, and we print errors to standard error, and standard output and error are both usually directed to the terminal, we want the standard output to be visible before the standard error is, to avoid user confusion.)
If you compile the above to say example (I use Linux, so I run it as ./example; in Windows, you probably run it as example.exe), you can expect the following outputs:
./example 12:34:56:07:08:09 00:00:00:00:00:00foo bad
12:34:56:07:08:09: 0x123456070809 = 20015990900745 in decimal.
00:00:00:00:00:00foo: 0x000000000000 = 0 in decimal; 'foo' unparsed.
Cannot parse 'bad'.
If you run it without parameters, or with just -h or --help, you should see
Usage: ./example [ -h | --help ]
 ./example HH:HH:HH:HH:HH:HH ...
This program parses the hexadecimal string(s),
and outputs them in both hexadecimal and decimal.
Obviously, there are other ways to achieve the same. If you are only interested in the string representation, you could use e.g.
#include <stdlib.h>
#include <ctype.h>

char *mac_to_hex(const char *src)
{
    char *dst, *end;
    int i;

    if (!src)
        return NULL;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)(*src)))
        src++;

    /* The next two characters must be hex digits. */
    if (!isxdigit((unsigned char)(src[0])) ||
        !isxdigit((unsigned char)(src[1])))
        return NULL;

    /* Dynamically allocate memory for the result string.
       "0x112233445566" + '\0' = 15 chars total. */
    dst = malloc(15);
    if (!dst)
        return NULL;

    /* Let end signify the position of the next char. */
    end = dst;

    /* Prefix, and the first two hex digits. */
    *(end++) = '0';
    *(end++) = 'x';
    *(end++) = *(src++);
    *(end++) = *(src++);

    /* Loop over the five ":HH" parts left. */
    for (i = 0; i < 5; i++) {
        if (src[0] == ':' &&
            isxdigit((unsigned char)(src[1])) &&
            isxdigit((unsigned char)(src[2])) ) {
            *(end++) = src[1];
            *(end++) = src[2];
            src += 3;
        } else {
            free(dst);
            return NULL;
        }
    }

    /* All strings need a terminating '\0' at end.
       We allocated enough room for it too. */
    *end = '\0';

    /* Ignore trailing whitespace in source string. */
    while (isspace((unsigned char)(*src)))
        src++;

    /* All of source string processed? */
    if (*src) {
        /* The source string contains more stuff; fail. */
        free(dst);
        return NULL;
    }

    /* Success! */
    return dst;
}
I consider this approach much less useful, because the source string must contain exactly HH:HH:HH:HH:HH:HH (although leading and trailing whitespace is allowed). Parsing it to an unsigned integer lets you e.g. read a line, and parse all such patterns on it, with a simple loop.
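As a rough sketch of such a loop: the block below is self-contained, so it restates the parser in a compact form. The names hex_value, mac_at, and find_macs are made up for this illustration, and mac_at() is a variant of parse_mac() that does not skip whitespace.

```c
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

/* Compact stand-in for hex_digit(): position in a digit table. */
static int hex_value(int c)
{
    static const char digits[] = "0123456789abcdef0123456789ABCDEF";
    const char *p = (c != '\0') ? strchr(digits, c) : NULL;
    return p ? (int)((p - digits) % 16) : -1;
}

/* Parse "HH:HH:HH:HH:HH:HH" at src. Returns a pointer to the character
   following it, or NULL if src does not start with the pattern. */
static const char *mac_at(const char *src, uint64_t *dst)
{
    uint64_t value = 0;
    int i, hi, lo;

    for (i = 0; i < 6; i++) {
        /* All but the first pair are preceded by a colon. */
        if (i > 0 && *(src++) != ':')
            return NULL;
        if ((hi = hex_value(src[0])) < 0 ||
            (lo = hex_value(src[1])) < 0)
            return NULL;
        value = 256*value + 16*hi + lo;
        src += 2;
    }

    *dst = value;
    return src;
}

/* Store up to max addresses found on line into found[],
   and return how many were stored. */
static size_t find_macs(const char *line, uint64_t *found, size_t max)
{
    size_t count = 0;

    while (*line && count < max) {
        const char *end = mac_at(line, &found[count]);
        if (end) {
            count++;
            line = end;     /* Continue right after the match. */
        } else {
            line++;         /* No match here; try the next position. */
        }
    }

    return count;
}
```

For the line "eth0 12:34:56:07:08:09 and 00:00:00:00:00:00foo", find_macs() stores 0x123456070809 and 0 and returns 2. Note that this sketch accepts a match starting anywhere, even in the middle of a longer run of hexadecimal digits; a stricter version would also check the characters surrounding each match.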
If you find any bugs or issues in the above, let me know in a comment so I can verify and fix if necessary.