
I am trying to extract byte values from a string containing hexadecimal byte representations. The string may also contain unknown non-hexadecimal characters, which need to be ignored (delimiters, whitespace formatting).

Given the input string "f5 df 45:f8 a 8 f53", the result should be the array [245, 223, 69, 248, 168, 245]. Note that a byte value is only output for each complete pair of hexadecimal digits, so the trailing 3 is ignored.
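As a rough illustration of the pairing rule, here is how the example input breaks down step by step. The variable names are only for illustration, and the sketch sticks to ES3 constructs.

```javascript
// Illustration only: trace how the example input is reduced to digit pairs.
var input = 'f5 df 45:f8 a 8 f53';

// Step 1: drop everything that is not a hexadecimal digit.
var digits = input.replace(/[^0-9a-fA-F]/g, ''); // 'f5df45f8a8f53'

// Step 2: take the digits two at a time; a trailing single digit is dropped.
var pairs = [];
for (var i = 0; i + 1 < digits.length; i += 2) {
    pairs.push(digits.substring(i, i + 2));
}
// pairs is ['f5', 'df', '45', 'f8', 'a8', 'f5']; the final '3' has no partner.
```

Note that 'a' and '8' end up in the same pair even though they are separated by a space in the input.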

As an additional constraint, the code needs to work in ECMAScript 3 environments.

So far, I have used this approach:

function parseHex(hex){
    hex = hex.replace(/[^0-9a-fA-F]/g, '');
    var i, 
        len = hex.length, 
        bin = [];
    for(i = 0; i < len - 1; i += 2){
        bin.push(+('0x' + hex.substring(i, i + 2)));
    }
    return bin;
}

However, I feel that it would be possible to find a more elegant solution to this, so the question is:

Is there a better solution to this problem (that would perform better or solve the problem with less code)?

Tomas Langkaas
  • Probably a better fit for http://codereview.stackexchange.com/ – Liam Sep 08 '16 at 14:41
  • Can not find any question in your post. – ceving Sep 08 '16 at 14:41
  • @ceving, tried to clarify the question. – Tomas Langkaas Sep 08 '16 at 17:00
  • @ceving, tried to research whether this question should be posted to codereview instead, but I am not yet convinced that it is off-topic for SO. This [guide to code review for SO users](http://meta.codereview.stackexchange.com/questions/5777/a-guide-to-code-review-for-stack-overflow-users) specifically advises: "Please do not vote to close with a custom reason that 'it belongs on Code Review'" – Tomas Langkaas Sep 10 '16 at 19:33

2 Answers


Updated answer (ES3)

Since you mentioned in a comment on my original answer that you're limited to ES3, you should be able to do this instead:

function parseHex(string) {
  // remove all non-hex characters, then split the rest into groups of 2 characters;
  // match() returns null when there is no complete pair, so fall back to an empty array
  var arr = string.replace(/[^0-9a-fA-F]/g, '').match(/[0-9a-fA-F]{2}/g) || [];

  // overwrite each hex pair in place with its numeric value
  for(var i = 0; i<arr.length; i++) {
    arr[i] = parseInt(arr[i], 16);
  }

  return arr;
}

parseHex('f5 df 45:f8 a 8 f53'); // => [245, 223, 69, 248, 168, 245]

It essentially does what map does, but it overwrites the array in place instead of allocating a second one, so it needs less memory. See the updated jsfiddle.

Previous answer (ES5)

You can do this (here's a jsbin example):

'f5 df 45:f8 a 8 f53'.replace(/[^0-9a-fA-F]/g, '').match(/[0-9a-fA-F]{2}/g).map(function(hex) {
  return parseInt(hex, 16);
});

// => [245, 223, 69, 248, 168, 245]

You can make it a function like this:

function parseHex(string) {
  // match() returns null when no complete pair is found, hence the || [] fallback
  return (string.replace(/[^0-9a-fA-F]/g, '').match(/[0-9a-fA-F]{2}/g) || []).map(function(hex) {
    return parseInt(hex, 16);
  });
}

parseHex('f5 df 45:f8 a 8 f53');

Essentially you remove the non-hex characters from the string, then match groups of two hex characters (as per your requirements). This answer describes the parseInt(hex, 16) portion (the reverse, from a number back to a hex string, would be number.toString(16)).
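A minimal sketch of the two conversions, in case it helps. Note that toHexByte is a hypothetical helper name used only for this illustration; it is not part of the answer's code.

```javascript
// Hex string to number: the radix argument 16 tells parseInt to read base 16.
var value = parseInt('f5', 16); // 245

// Number to hex string: toString(16) is called on the number, not on the string.
var hexText = value.toString(16); // 'f5'

// toString(16) does not pad, so values below 16 yield a single digit.
// toHexByte (hypothetical) left-pads such values to two digits.
function toHexByte(n) {
    var s = n.toString(16);
    return s.length < 2 ? '0' + s : s;
}

var padded = toHexByte(10); // '0a'
```

Always passing the radix explicitly avoids parseInt guessing the base from the string's prefix.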

Josh Beam

TL;DR

Using regex methods leads to less code but worse performance. A non-regex solution performs better, at the cost of slightly more code.

Regex approaches

After some more research/googling (and seeing Josh Beam's answer use .match()), I figured that there are several possible regex approaches that could improve on the original approach.

Using .match() directly (without .replace()), inspired by Josh Beam's answer:

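function parseHex(hex){
    // match() returns null when the input contains no hex digits at all
    hex = hex.match(/[\da-f]/gi) || [];
    for(var i = 0; i < hex.length - 1; i += 2){
        // store each byte at index i / 2, reusing the digit array
        hex[i >> 1] = +('0x' + hex[i] + hex[i + 1]);
    }
    // truncate the array to the number of complete bytes written
    hex.length = i >> 1;
    return hex;
}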

Using .replace() for iteration (inspired by this):

function parseHex(hex){
    var bin = [];
    hex.replace(/([\da-f])[^\da-f]*([\da-f])/gi,
        function(m, digit1, digit2){
            bin.push(+('0x' + digit1 + digit2));
        }
    );
    return bin;
}

Looping with .exec() (also inspired by this):

function parseHex(hex){
    var bin = [],
        regex = /([\da-f])[^\da-f]*([\da-f])/gi,
        result;
    while(result = regex.exec(hex)){
        bin.push(+('0x' + result[1] + result[2]));
    }
    return bin;
}

Performance and a non-regex solution

After running performance tests here, none of the regex approaches seems to perform significantly better than the original approach. Out of curiosity, I attempted a non-regex solution, which significantly outperforms the other approaches (at the cost of slightly more code):

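function parseHex(hex){
    var bin = [], i, c, isEmpty = 1, buffer = 0;
    for(i = 0; i < hex.length; i++){
        c = hex.charCodeAt(i);
        // accept only '0'-'9', 'A'-'F' and 'a'-'f'
        if(c > 47 && c < 58 || c > 64 && c < 71 || c > 96 && c < 103){
            // shift in the digit's 4-bit value (letters: add 9, then keep the low nibble)
            buffer = buffer << 4 ^ (c > 64 ? c + 9 : c) & 15;
            // isEmpty toggles on every digit; it is 1 again after each second digit
            if(isEmpty ^= 1){
                bin.push(buffer & 0xff);
            }
        }
    }
    return bin;
}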
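The expression (c > 64 ? c + 9 : c) & 15 in the function above is the least obvious part, so here is a small sketch of what it computes. hexDigitValue is a hypothetical name introduced for this illustration, and it assumes the character code has already passed the range check.

```javascript
// hexDigitValue (hypothetical name) isolates the expression used inside the loop.
// It assumes c is the character code of a valid hexadecimal digit.
function hexDigitValue(c) {
    // Digits '0'-'9' are codes 48-57 (0x30-0x39): the low nibble is the value.
    // Letters 'A'-'F' (65-70) and 'a'-'f' (97-102) have low nibbles 1-6,
    // so adding 9 before masking moves them to 10-15.
    return (c > 64 ? c + 9 : c) & 15;
}

var zero = hexDigitValue('0'.charCodeAt(0));   // 0
var nine = hexDigitValue('9'.charCodeAt(0));   // 9
var lowerA = hexDigitValue('a'.charCodeAt(0)); // 10
var upperF = hexDigitValue('F'.charCodeAt(0)); // 15

// Two digits combine into one byte: the first digit becomes the high nibble.
var byteValue = hexDigitValue('f'.charCodeAt(0)) << 4 | hexDigitValue('5'.charCodeAt(0)); // 245
```

This works for both upper and lower case because the two letter ranges share the same low four bits.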

I will probably go for the non-regex approach.
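For anyone who wants to repeat the comparison locally, a rough timing loop along these lines could be used. This is only a sketch: timeParser, sample and candidate are hypothetical names, the stand-in candidate should be swapped for the parseHex variant under test, and Date-based timing is coarse, so the numbers are only indicative.

```javascript
// timeParser (hypothetical helper) runs fn(input) repeatedly and
// returns the elapsed wall-clock time in milliseconds.
function timeParser(fn, input, iterations) {
    var start = new Date().getTime(), i;
    for (i = 0; i < iterations; i++) {
        fn(input);
    }
    return new Date().getTime() - start;
}

// Build a larger input so that each call does a measurable amount of work.
var sample = '', n;
for (n = 0; n < 1000; n++) {
    sample += 'f5 df 45:f8 a 8 f53 ';
}

// Stand-in candidate; substitute any of the parseHex variants here.
function candidate(hex) {
    return hex.replace(/[^0-9a-fA-F]/g, '').length;
}

var elapsed = timeParser(candidate, sample, 100);
```

Running each variant several times and comparing the elapsed values gives a rough ranking; results will vary between engines.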

Tomas Langkaas