How to convert a string say of type AA:BB:CC:DD:EE:FF to 0xaabbccddeeff in C?

Question

Input: AA:BB:CC:DD:EE:FF. Expected output: 0xaabbccddeeff.
Input: AA:BB:65:F0:E4:D4. Expected output: 0xaabb65f0e4d4.

      char arr[20]="AA:BB:CC:DD:EE:FF";
      char t[20]="0x";   
      char *token=strtok(arr[i], ":");
      while(token !=NULL){
      printf("%s\n", token);
      token = strtok(NULL, ":");
      strcat(t, token);
        }
printf("The modified string is %s\n", t);

I am seeing a segmentation fault.

Please **edit your question** and include all the code you're executing - for example, `arr` is not defined in the code above so it's obvious there is more code - , the input data, the output from your program, and the text of whatever errors you're seeing. Thanks. — Bob Jarvis - Слава Україні, Jul 27 '18 at 23:44
[What does your step debugger tell you?](http://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems). Your question can be answered very quickly and easily with your step-debugger. You should always try and solve your problems with a step debugger before coming to StackOverflow. — , Jul 30 '18 at 17:39

ggorlen · Answer 1 · 2018-07-30T17:35:21.643

1

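Your loop calls strtok() and then strcat() without checking the result in between, so the final strcat() receives a null token. (Note also that strtok() expects the array arr itself, not a single element such as arr[i].) Try moving your conditional so that it checks the token before making the strcat call: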

#include <ctype.h>
#include <stdio.h>
#include <string.h>

void lower(char *c) {
    /* Cast to unsigned char so tolower() never receives a negative value. */
    for (; *c; c++)
        *c = (char)tolower((unsigned char)*c);
}

int main() {
    char s[] = "AA:BB:CC:DD:EE:FF";
    char t[15] = "0x";
    char *token = strtok(s, ":");

    if (token) {
        lower(token);
        strcat(t, token);

        while ((token = strtok(NULL, ":")) != NULL) {
            lower(token);
            strcat(t, token);
        }
    }

    printf("The modified string is %s\n", t);
}

Output:

The modified string is 0xaabbccddeeff

edited Jul 30 '18 at 17:35

answered Jul 28 '18 at 00:17

ggorlen


Not sure about this one: will t[] automatically grow? – cup Jul 28 '18 at 04:17
@cup you could maybe use `char t[strlen(s)];` to ensure that `t` has more than enough memory space. or alternatively `char * t; t = (char) * malloc(strlen(s));`. – christopher westburry Jul 28 '18 at 04:37
Ran fine on gcc but feel free to malloc or declare a larger buffer. Let me know if you can find a definite example of it failing and I'll update. – ggorlen Jul 28 '18 at 04:40
That would have to be strlen(s) + 3 - for the 0x and the \0. That is what I would have done but the code in the answer is not doing that so I'm wondering if it is a new C thing I'm not aware of that the arrays grow dynamically. In the past, you'd get a segv, stack or framing error – cup Jul 28 '18 at 04:43
I'm updating anyway, missed a token, so I'll include a large enough buffer. Thanks. – ggorlen Jul 28 '18 at 04:45
Hi, Thank you for the code. I actually expect 0xaabbccddeeff as the output. Thanks a lot – user3800888 Jul 30 '18 at 14:35
You can use `tolower()` for lowercasing. Let me know if you want an update. – ggorlen Jul 30 '18 at 16:39

score 0 · Answer 2 · answered Jul 28 '18 at 03:51

Use the 64-bit unsigned integer type uint64_t (declared in <stdint.h>, which <inttypes.h> includes) to store the 48-bit value (HH:HH:HH:HH:HH:HH → 0xHHHHHHHHHHHH).

You could use sscanf(), but it does not detect overflow. With a conversion such as %hhx, for example, it would typically keep only the two rightmost hexadecimal characters in each part, so F11:E22:D33:C44:B55:A66 would yield the same result as 11:22:33:44:55:66.
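If sscanf() is still preferred, one way to keep each part to two digits is a field width. The sketch below is only an illustration: the helper name mac_from_sscanf() and the "%2x" conversions are assumptions, not part of this answer. It still accepts leading whitespace or a sign in each part and does not notice trailing characters, which is one reason the hand-written parser further down is stricter.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper, not part of the answer's code.
   Returns 1 and stores the 48-bit value on success, 0 otherwise. */
static int mac_from_sscanf(const char *src, uint64_t *dst)
{
    unsigned int  part[6];
    uint64_t      value = 0;
    int           i;

    /* "%2x" reads at most two hexadecimal characters per part, so a
       third digit collides with the literal ':' and stops the scan. */
    if (!src || sscanf(src, "%2x:%2x:%2x:%2x:%2x:%2x",
                       &part[0], &part[1], &part[2],
                       &part[3], &part[4], &part[5]) != 6)
        return 0;

    /* Combine the six parts, most significant first. */
    for (i = 0; i < 6; i++)
        value = 256 * value + part[i];

    if (dst)
        *dst = value;

    return 1;
}
```

With this sketch, AA:BB:CC:DD:EE:FF is accepted, while F11:E22:D33:C44:B55:A66 is rejected because only one conversion succeeds.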

First, we need a function to convert a hexadecimal digit to its numerical value. Here is a simple, easy to read, and portable way to write it:

#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <ctype.h>
#include <stdio.h>

static inline int  hex_digit(const int  c)
{
    switch (c) {
    case '0':           return  0;
    case '1':           return  1;
    case '2':           return  2;
    case '3':           return  3;
    case '4':           return  4;
    case '5':           return  5;
    case '6':           return  6;
    case '7':           return  7;
    case '8':           return  8;
    case '9':           return  9;
    case 'A': case 'a': return 10;
    case 'B': case 'b': return 11;
    case 'C': case 'c': return 12;
    case 'D': case 'd': return 13;
    case 'E': case 'e': return 14;
    case 'F': case 'f': return 15;
    default:            return -1;
    }
}

The function will return a nonnegative (0 or positive) integer corresponding to the character, or -1 if the character is not a hexadecimal digit.

The static inline means that the function is only visible in this translation unit (file; or if put in a header file, each file that #includes that header file). The inline keyword was standardized in C99; together with static, it lets programmers write small functions that compilers can typically make as fast as preprocessor macros (inline is a hint to the compiler, not a guarantee).

Next, we need a function to carefully parse the string. Here is one:

/* Parse a string "HH:HH:HH:HH:HH:HH" to 0x00HHHHHHHHHHHH,
   and return a pointer to the character following it,
   or NULL if an error occurs. */
static const char *parse_mac(const char *src, uint64_t *dst)
{
    uint64_t  value = 0;
    int       i, hi, lo;

    /* No string specified? */
    if (!src)
        return NULL;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)(*src)))
        src++;

    /* End of string? */
    if (!*src)
        return NULL;

    /* First pair of hex digits. */
    if ((hi = hex_digit(src[0])) < 0 ||
        (lo = hex_digit(src[1])) < 0)
        return NULL;

    value = 16*hi + lo;
    src += 2;

    /* The next five ":HH" */
    for (i = 0; i < 5; i++) {
        if (src[0] != ':' || (hi = hex_digit(src[1])) < 0 ||
                             (lo = hex_digit(src[2])) < 0 )
            return NULL;

        value = 256*value + 16*hi + lo;
        src += 3;
    }

    /* Successfully parsed. */
    if (dst)
        *dst = value;

    return src;
}

Above, we marked the function static, meaning it too is only visible in this compilation unit. It is not marked inline, because it is not a trivial function; it does proper work, so we do not suggest the compiler should inline it.

Note the cast to unsigned char in the isspace() call. This is because isspace() takes an int whose value must be either representable as an unsigned char, or EOF. If we supply it a char, and the char type happens to be signed (it varies between architectures), a negative character value leads to undefined behavior, and in practice some characters get incorrectly classified. So, using the cast with the character-type functions (isspace(), isblank(), tolower(), toupper(), et cetera) is important if you want your code to work right on all systems that support standard C.
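The same cast applies when lowercasing the hex digits, as the question's expected output requires. A minimal sketch, assuming a hypothetical helper named lowercase_in_place() that is not part of the code above:

```c
#include <ctype.h>

/* Hypothetical helper: lowercase a string in place. */
static void lowercase_in_place(char *s)
{
    if (!s)
        return;

    for (; *s; s++) {
        /* Convert to unsigned char first, so that a negative char
           value is never passed to tolower(). */
        *s = (char)tolower((unsigned char)*s);
    }
}
```

Non-letter characters such as digits and ':' are returned unchanged by tolower(), so the helper can be applied to the whole string.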

You might not be familiar with the idiom if ((variable = subexpression) < 0). For each (variable = subexpression) < 0, the subexpression gets evaluated, then assigned to the variable. If the value is less than zero, the entire expression is true; otherwise it is false. The variable will retain its new value afterwards.

In C, logical AND (&&) and OR (||) are short-circuiting. This means that if you have A && B, and A is false, then B is not evaluated at all. If you have A || B, and A is true, then B is not evaluated at all. So, in the above code,

if ((hi = hex_digit(src[0])) < 0 ||
    (lo = hex_digit(src[1])) < 0)
    return NULL;

is exactly equivalent to

hi = hex_digit(src[0]);
if (hi < 0)
    return NULL;
lo = hex_digit(src[1]);
if (lo < 0)
    return NULL;

We could have written those two compound if statements more verbosely, but I wanted to include the compact idiom in this example, to make this answer into something you must "chew" a bit in your mind before you can use it in e.g. homework.

The main "trick" in the function is that we build value by shifting its digits leftward. If we are parsing 12:34:56:78:9A:BC, the first assignment to value is equivalent to value = 0x12;. Multiplying value by 256 shifts the hexadecimal digits by two places (because 256 = 0x100), so in the first iteration of the for loop, the assignment to value is equivalent to value = 0x1200 + 0x30 + 0x4;, i.e. value = 0x1234;. This goes on for four more assignments, so that the final value is 0x123456789ABC. This "shifting digits via multiplication" is very common, and works in all numerical bases (for decimal numbers, the multiplier is a power of 10; for octal numbers, a power of 8; for hexadecimal numbers, a power of 16; always a power of the base).

You can, for example, use this approach to reverse the digits in a number (so that one function converts 0x123456 to 0x654321, and another converts 8040201 to 1020408).
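A minimal sketch of such a digit reversal, assuming a hypothetical function name reverse_digits() and a base parameter that are not part of this answer. It drops trailing zeros of the input (120 becomes 21 in base 10) and does not guard against overflow for very large values:

```c
#include <inttypes.h>

/* Hypothetical helper: reverse the digits of value in the given base.
   Returns 0 if base is smaller than 2. */
static uint64_t reverse_digits(uint64_t value, unsigned int base)
{
    uint64_t  result = 0;

    if (base < 2)
        return 0;

    while (value > 0) {
        /* Shift the result one digit leftward, then append the
           lowest digit of value. */
        result = result * base + value % base;

        /* Drop the lowest digit of value. */
        value /= base;
    }

    return result;
}
```

Calling reverse_digits(0x123456, 16) yields 0x654321, and reverse_digits(8040201, 10) yields 1020408.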

To test the above, we need a main(), of course. I like my example programs to tell me what they do if I run them without arguments. When they work on strings or numbers, I like to provide them on the command line, rather than having the program ask for input:

int main(int argc, char *argv[])
{
    const char *end;
    uint64_t    mac;
    int         arg;

    if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s HH:HH:HH:HH:HH:HH ...\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program parses the hexadecimal string(s),\n");
        fprintf(stderr, "and outputs them in both hexadecimal and decimal.\n");
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    for (arg = 1; arg < argc; arg++) {
        end = parse_mac(argv[arg], &mac);
        if (!end) {
            fprintf(stderr, "Cannot parse '%s'.\n", argv[arg]);
            return EXIT_FAILURE;
        }

        if (*end)
            printf("%s: 0x%012" PRIx64 " = %" PRIu64 " in decimal; '%s' unparsed.\n",
                   argv[arg], mac, mac, end);
        else
            printf("%s: 0x%012" PRIx64 " = %" PRIu64 " in decimal.\n",
                   argv[arg], mac, mac);

        fflush(stdout);
    }

    return EXIT_SUCCESS;
}

The first if clause checks if there are any command-line parameters. (argv[0] is the program name itself, and is included in argc, the number of strings in the argv[] array. In other words, argc == 1 means only the program name was supplied on the command line, argc == 2 means the program name and one parameter (in argv[1]) were supplied, and so on.)

Because it is often nice to supply more than one item to work on, we have a for loop over all command-line parameters; from argv[1] to argv[argc-1], inclusive. (Remember, because argc is the number of strings in the argv[] array, and numbering starts from 0, the last is argc-1. This is important to remember in C, in all array use!)

Within the for loop, we use our parse function. Because it returns a pointer to the character following the part we parsed, and we store that in end, (*end == '\0') (which is equivalent to the shorter form (!*end)) is true if the string ended there. If (*end) (equivalent to (*end != '\0')) is true, then there are additional characters in the string following the parsed part.

To output any of the fixed-width integer types, we must use the preprocessor macros defined in <inttypes.h>. For uint64_t, we can use "%" PRIu64 to print one in decimal, or "%" PRIx64 to print one in hexadecimal. "%012" PRIx64 means "print a uint64_t in hexadecimal, zero-padded on the left to at least 12 digits".

Remember that in C, string literals are concatenated; "a b", "a " "b", "a" " " "b" are all equivalent. (So, the PRI?## macros all expand to strings that specify the exact conversion type. They are macros, because they vary between systems. In 64-bit Windows PRIu64 is usually "llu", but in 64-bit Linux it is "lu".)
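If the result is needed as a string rather than printed, the same concatenated format works with snprintf(). This is a sketch under assumptions: the helper name mac_to_string() and its return convention are hypothetical, not part of the program above.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: write "0x" plus 12 lowercase hex digits to buf.
   Returns 1 on success, 0 if buf is too small or value exceeds 48 bits. */
static int mac_to_string(uint64_t value, char *buf, size_t size)
{
    int  len;

    if (!buf || value > UINT64_C(0xFFFFFFFFFFFF))
        return 0;

    /* "0x%012" PRIx64 is a single format string after concatenation. */
    len = snprintf(buf, size, "0x%012" PRIx64, value);

    /* 14 characters are expected, plus room for the terminating '\0'. */
    return len == 14 && (size_t)len < size;
}
```

A 15-byte buffer is the minimum that succeeds; for the value 0xAABB65F0E4D4 the buffer then holds 0xaabb65f0e4d4.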

The fflush(stdout); at the end usually does nothing when the output goes to a terminal, because standard output is then line buffered by default. However, because I explicitly want the C library to ensure the output is written to standard output before the next loop iteration, I added it. It matters when standard output is fully buffered, which is typically the case when it is redirected to a file or a pipe. As it is, it is "insurance" (against oddly behaving C library implementations), and a reminder to us human programmers that the intent is to have the output flushed, not cached by the C library, at that point.

(Why do we want that? Because if an error occurs during the next iteration, and we print errors to standard error, and standard output and error are both usually directed to the terminal, we want the standard output to be visible before the standard error is, to avoid user confusion.)

If you compile the above to say example (I use Linux, so I run it as ./example; in Windows, you probably run it as example.exe), you can expect the following outputs:

./example 12:34:56:07:08:09 00:00:00:00:00:00foo bad
12:34:56:07:08:09: 0x123456070809 = 20015990900745 in decimal.
00:00:00:00:00:00foo: 0x000000000000 = 0 in decimal; 'foo' unparsed.
Cannot parse 'bad'.

If you run it without parameters, or with just -h or --help, you should see

Usage: ./example [ -h | --help ]
       ./example HH:HH:HH:HH:HH:HH ...

This program parses the hexadecimal string(s),
and outputs them in both hexadecimal and decimal.

Obviously, there are other ways to achieve the same. If you are only interested in the string representation, you could use e.g.

#include <stdlib.h>
#include <ctype.h>

char *mac_to_hex(const char *src)
{
    char         *dst, *end;
    int           i;

    if (!src)
        return NULL;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)(*src)))
        src++;

    /* The next two characters must be hex digits. */
    if (!isxdigit((unsigned char)(src[0])) ||
        !isxdigit((unsigned char)(src[1])))
        return NULL;

    /* Dynamically allocate memory for the result string.
       "0x112233445566" + '\0' = 15 chars total. */
    dst = malloc(15);
    if (!dst)
        return NULL;

    /* Let end signify the position of the next char. */
    end = dst;

    /* Prefix, and the first two hex digits. */
    *(end++) = '0';
    *(end++) = 'x';
    *(end++) = *(src++);
    *(end++) = *(src++);

    /* Loop over the five ":HH" parts left. */
    for (i = 0; i < 5; i++) {
        if (src[0] == ':' &&
            isxdigit((unsigned char)(src[1])) &&
            isxdigit((unsigned char)(src[2])) ) {
            *(end++) = src[1];
            *(end++) = src[2];
            src += 3;
        } else {
            free(dst);
            return NULL;
        }
    }

    /* All strings need a terminating '\0' at end.
       We allocated enough room for it too. */
    *end = '\0';

    /* Ignore trailing whitespace in source string. */
    while (isspace((unsigned char)(*src)))
        src++;

    /* All of source string processed? */
    if (*src) {
        /* The source string contains more stuff; fail. */
        free(dst);
        return NULL;
    }

    /* Success! */
    return dst;
}

I consider this approach much less useful, because the source string must contain exactly HH:HH:HH:HH:HH:HH (although leading and trailing whitespace is allowed). Parsing it to an unsigned integer lets you e.g. read a line, and parse all such patterns on it, with a simple loop.
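The "simple loop" over a line could look roughly like the sketch below. To keep it self-contained it uses a compact stand-in for the parser, so hex_value(), mac_at(), and find_macs() are hypothetical names, not the functions defined earlier. It assumes the letters 'a' to 'f' have consecutive character codes, and it tries a match at every position, so it would also find an address embedded in a longer run of hex digits.

```c
#include <ctype.h>
#include <inttypes.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse "HH:HH:HH:HH:HH:HH" at src. Returns a pointer to the
   character following it, or NULL if there is no match. */
static const char *mac_at(const char *src, uint64_t *dst)
{
    uint64_t  value = 0;
    int       i, hi, lo;

    for (i = 0; i < 6; i++) {
        /* Every pair except the first is preceded by ':'. */
        if (i > 0 && *(src++) != ':')
            return NULL;
        if ((hi = hex_value(src[0])) < 0 ||
            (lo = hex_value(src[1])) < 0)
            return NULL;
        value = 256 * value + 16 * hi + lo;
        src += 2;
    }

    *dst = value;
    return src;
}

/* Store up to max values found on line into out[]; return the count. */
static size_t find_macs(const char *line, uint64_t *out, size_t max)
{
    size_t  count = 0;

    while (*line && count < max) {
        const char *end = mac_at(line, &out[count]);
        if (end) {
            count++;
            line = end;     /* Continue after the parsed address. */
        } else {
            line++;         /* No address here; try the next character. */
        }
    }

    return count;
}
```

Because the parser returns the position where it stopped, the loop can resume right after each match instead of splitting the line into tokens first.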

If you find any bugs or issues in the above, let me know in a comment so I can verify and fix if necessary.
