
Is there any function in the STL that converts ASCII data to the integer form of its hex representation? For example: "abc" -> 0x616263.

I have the most basic way I can think of:

#include <cstdint>
#include <string>

uint64_t tointeger(const std::string& str){
    uint64_t value = 0;                  // allow max of 8 chars
    for(std::size_t x = 0; x < str.size(); x++)
        value = (value << 8) + (unsigned char)str[x];  // unsigned avoids sign extension for bytes >= 0x80
    return value;
}

As stated above, tointeger("abc") returns the value 0x616263.

But this is too slow. Because I have to call this function hundreds of thousands of times, it has slowed down my program significantly. There are 4 or 5 functions that rely on this one, and each of those is called thousands of times, in addition to this function being called thousands of times directly.

What is a faster way to do this?

calccrypto
  • I find it funny that he thinks that *that* has slowed his program "significantly"... – Blindy Jul 28 '11 at 19:16
  • Fine... how should I word it? – calccrypto Jul 28 '11 at 19:20
  • You should say exactly what you want (how is the hex input formed), what you've tried (besides this "basic" way, which certainly isn't the most basic I can think of), *why* it's slow and how slow, etc. Form a proper question. – ssube Jul 28 '11 at 19:21
  • Is this better? And @Blindy: I know that I need to optimize a lot, so I'm going through each function one at a time. – calccrypto Jul 28 '11 at 19:25
  • Are you running a debug build? Have you tried caching the value of `str.size()`? How slow is "significantly slow"? – Pablo Jul 28 '11 at 19:28

5 Answers


You want to pack ASCII characters from a string into a 64-bit integer.

Since std::string is not an intrinsic type, for safety, copy the data into a buffer:

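#include <algorithm>
#include <cstdint>
#include <string>

uint64_t values[100] = {};  // Allocate memory on a 64-bit boundary.
char * p = (char *) values; // Point to the memory as characters.
std::string example("beethoven");
std::copy(example.begin(), example.end(), p);  // Copy the characters into the aligned buffer.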

The copy is safer as far as alignment goes. To be faster, but more dangerous, just avoid the copy:

  uint64_t danger;
  danger = *((uint64_t *) example.c_str());  // 8-byte read straight from the string's storage

The std::string::c_str method returns a pointer to a C-style string representation of the text, but that pointer is only valid until the string is modified or destroyed, hence the copy. Also, the pointer is only guaranteed to be aligned for char. Thus if it happens to reside at address 0x1003, the processor may generate an alignment fault (or slow down because it has to fetch from an unaligned address).
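A minimal sketch of the copy-based idea, assuming C++11: copying with `std::memcpy` into a local `uint64_t` is the well-defined way to reinterpret the bytes, avoiding both the alignment problem described above and the strict-aliasing question raised in the comments. The helper name `load_native` is hypothetical, and like the cast, the result is in the platform's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies the first min(size, 8) bytes of s into a
// zero-initialized uint64_t. The bytes keep their memory order, so the
// numeric result depends on the platform's endianness.
std::uint64_t load_native(const std::string& s) {
    std::uint64_t value = 0;
    const std::size_t n = s.size() < sizeof value ? s.size() : sizeof value;
    std::memcpy(&value, s.data(), n);  // no alignment or aliasing concerns
    return value;
}
```

On a little-endian x86 machine, `load_native("abc")` gives 0x636261, which is the byte-order issue the answer's edit mentions.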

Edit 1:

This method does not take endianness into consideration; it uses the byte order of the platform. Converting to a different byte order adds work and will slow it down.

Thomas Matthews
  • Does writing to `p` and then reading from `values` violate strict-aliasing? – Mark B Jul 28 '11 at 19:33
  • @Mark: Please explain *strict-aliasing*. – Thomas Matthews Jul 28 '11 at 19:40
  • http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule and http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html have examples. Again, I'm not sure if it applies here because p is a `char*` and theoretically allowed to point at any type. – Mark B Jul 28 '11 at 19:48
  • Darn it. I'm looking for this: if I read it left to right as "abc", the value I expect is 0x616263. I don't want it to become 0x636261 on any machine. – calccrypto Jul 28 '11 at 20:04
  • If you want correct byte order, then your solution is a good idea. – Thomas Matthews Jul 28 '11 at 23:00

Have you tried multi-character constants? i.e.:

int value = 'abc';  // implementation-defined value; GCC gives 0x616263
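Multi-character constants have an implementation-defined value (GCC, for example, shifts each character in from the right, giving 0x616263 here) and only work for literals written in the source. For strings known at compile time, a portable alternative is a `constexpr` function; this sketch assumes C++14 for the loop inside `constexpr`, and `pack` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: packs up to 8 characters left to right, so the first
// character ends up in the most significant used byte ("abc" -> 0x616263).
constexpr std::uint64_t pack(const char* s, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n && i < 8; ++i)
        value = (value << 8) | static_cast<unsigned char>(s[i]);
    return value;
}

// Evaluated at compile time; the result is the same on every platform.
static_assert(pack("abc", 3) == 0x616263, "left-to-right packing");
```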
epatel

EDIT: Rereading the question, it looks like the intention is a BCD-esque conversion for strings of up to 8 characters, except using 8 bits instead of 4 for each character.

Your approach looks reasonable. Alternatively, you could use memcpy (the string can be copied as-is on big-endian machines; on little-endian machines you'd have to reverse it).
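The memcpy route can be sketched as follows, assuming C++11; `is_little_endian` and `pack_memcpy` are hypothetical names (C++20's `std::endian` could replace the runtime check). One detail worth spelling out: on a big-endian machine the bytes must be right-aligned in the 8-byte buffer, otherwise a short string such as "abc" lands in the high-order bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical runtime check: true if the least significant byte is stored first.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: packs up to 8 characters so that "abc" -> 0x616263
// on either byte order.
std::uint64_t pack_memcpy(const std::string& s) {
    const std::size_t n = std::min<std::size_t>(s.size(), 8);
    unsigned char buf[8] = {0};
    if (is_little_endian()) {
        // Reverse into the low-order bytes: "abc" becomes 'c','b','a',0,...
        std::reverse_copy(s.data(), s.data() + n, buf);
    } else {
        // Big-endian: copy as-is, right-aligned so the last char is least significant.
        std::memcpy(buf + (8 - n), s.data(), n);
    }
    std::uint64_t value;
    std::memcpy(&value, buf, sizeof value);
    return value;
}
```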

However, if this is a performance bottleneck for you, I think you may wish to reconsider why you need to do this hundreds of thousands of times. Perhaps a fundamental change to the algorithm would yield a far greater performance increase than trying to micro-optimize a conversion. For example, store the values internally as uint64_t and only convert to string form when needed for display/interface. Alternatively, just store it permanently as a string and eliminate the need to convert it into the pseudo-BCD format.
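Storing the values as uint64_t means converting back only at the edges, for display or I/O. A minimal sketch of that inverse conversion, assuming the left-to-right packing from the question and strings without embedded '\0' characters (leading zero bytes are treated as padding); `unpack` is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: turns a value packed left to right (0x616263) back
// into its string form ("abc"), skipping the unused high-order zero bytes.
std::string unpack(std::uint64_t value) {
    std::string out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        const char c = static_cast<char>((value >> shift) & 0xFF);
        if (!out.empty() || c != 0)  // drop leading padding bytes
            out.push_back(c);
    }
    return out;
}
```

For example, `unpack(0x616263)` returns "abc".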

Mark B
  • That is exactly the answer I did not ask for. – calccrypto Jul 28 '11 at 19:04
  • It's difficult to see where you actually asked for any real answer, and certainly never specified what exactly was to be avoided or set any real parameters on the question. – ssube Jul 28 '11 at 19:17
  • I did show an example that did exactly what your code did not do: `a` should become 0x61, not 0xa. If I wanted `10`, I would input the string `\x0a`. – calccrypto Jul 28 '11 at 19:21
  • That's just taking the char value, then, not "the integer form of its hex representation". Your question asks for the integer form of a hex string, where x0A becomes 10. If that's not what you want, you may want to rephrase the question. (also, this isn't my code, or even my answer) – ssube Jul 28 '11 at 19:27

The fastest way to do something is not to do it at all.

Maybe you can store your data as integers, and only convert it to strings when you have to? Would you still need to convert the data hundreds of thousands of times?

If you really must, I'd probably use a simple fixed-size array (not a string) and unroll the loop. But this is a micro-optimisation; in most cases it's better just to find a different way to do what you're trying to do.
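A minimal sketch of the fixed-size, unrolled version, assuming the keys can be stored as exactly 8 bytes with shorter strings right-aligned (zero-padded at the front) so the result matches the question's loop; `Key` and `pack_unrolled` are hypothetical names. With no loop counter and no size check, compilers can often reduce this to a load plus a byte swap, though that depends on the compiler and flags.

```cpp
#include <array>
#include <cstdint>

// Hypothetical fixed-size key: exactly 8 bytes, shorter strings padded
// with zeros at the front so the packed value matches the question's loop.
using Key = std::array<unsigned char, 8>;

// Fully unrolled: no loop counter, no size() call, no branches.
inline std::uint64_t pack_unrolled(const Key& k) {
    return (std::uint64_t(k[0]) << 56) | (std::uint64_t(k[1]) << 48) |
           (std::uint64_t(k[2]) << 40) | (std::uint64_t(k[3]) << 32) |
           (std::uint64_t(k[4]) << 24) | (std::uint64_t(k[5]) << 16) |
           (std::uint64_t(k[6]) << 8)  |  std::uint64_t(k[7]);
}
```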

Omri Barel

If you had constraints on how your string was stored, you could cast the data directly to an int or long. If you knew your strings were padded at the end with NUL (0) bytes to at least an 8-byte boundary, then the following would work (the result is in the machine's native byte order, so on a little-endian machine the characters come out reversed):

uint64_t value = *(const uint64_t *)str.c_str();

There is nothing inherently inefficient about your current code snippet; the operations are not slow. Since the maximum number of characters you allow is 8, you can use a switch statement and loop unrolling:

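const char *p = str.c_str();  // c_str() guarantees a terminating '\0'
uint64_t value = 0;
switch(str.size()) {
    case 0:
        value = 0;
        break;
    case 1: // the 2nd char is the null terminator anyway
    case 2:
        value = *(const uint16_t *)p;
        break;
    case 3: // the 4th char would be the null terminator
    case 4:
        value = *(const uint32_t *)p;
        break;
    case 5: // the 6th char would be the null terminator
    case 6:
        // low 4 bytes, then bytes 4-5 shifted above them (little-endian layout)
        value = *(const uint32_t *)p | ((uint64_t)*(const uint16_t *)(p + 4) << 32);
        break;
    case 7: // the 8th char would be the null terminator
    case 8:
    default: // 8 or more: use the first 8
        value = *(const uint64_t *)p;
        break;
}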

Because we use a switch statement, the compiler can generate a jump table instead of a loop (where each iteration would require a comparison). Also, because we cast the memory to a wider type, we don't need to loop through each character/byte separately. For example:

MEMORY   VALUE
0x8000   0x61,0x62,0x63,0x00 -> "abc",0

The size is 3, but the null terminator makes it 4 bytes long, so we can cast the memory value directly to a uint32_t.

I don't code in C++, so hopefully the casting semantics are correct.
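The pointer casts in this answer read memory in the machine's native byte order, so on a little-endian machine "abc" yields 0x636261 rather than the 0x616263 the question asks for, and they rely on reads that may be misaligned. A sketch of the same switch-based unrolling that instead shifts in one character per case, keeping the question's left-to-right order on any platform; it assumes C++17 for `[[fallthrough]]`, and `pack_switch` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: the question's loop unrolled with a fallthrough switch.
// Each case shifts in one character, so the value is built left to right
// ("abc" -> 0x616263) whatever the machine's byte order, with no wide loads.
std::uint64_t pack_switch(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size() < 8 ? s.size() : 8;  // use at most 8 chars
    std::uint64_t v = 0;
    switch (n) {
        case 8: v = (v << 8) | *p++; [[fallthrough]];
        case 7: v = (v << 8) | *p++; [[fallthrough]];
        case 6: v = (v << 8) | *p++; [[fallthrough]];
        case 5: v = (v << 8) | *p++; [[fallthrough]];
        case 4: v = (v << 8) | *p++; [[fallthrough]];
        case 3: v = (v << 8) | *p++; [[fallthrough]];
        case 2: v = (v << 8) | *p++; [[fallthrough]];
        case 1: v = (v << 8) | *p++; break;
        default: break;  // empty string
    }
    return v;
}
```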

LastCoder