
Is there a C pre-processor string manipulation that could be used to extract a substring from a given string?

I want to split a hexadecimal string representing a __uint128 number into two 64-bit hexadecimal chunks in order to produce a 128-bit number of the given type.

As in pseudocode:

#include <inttypes.h>
#include <ctype.h>

#define UINT128_C(X)   // extract hi (0x == 2) + (ffffffffffffffff == 16) == 18
                       // extract lo (ffffffffffffffff == 16) 
                       // prepend lo with string "0x"                     == 18
                       // (((uint128_t)(hi) << 64) | (uint128_t)(lo))

typedef __uint128_t     uint128_t;

uint128_t x;

x = UINT128_C( 0xffffffffffffffffffffffffffffffff );
Dawid Szymański
  • I doubt this is possible. Why is using two integer constants not acceptable? Oh, and Clang actually does support 128-bit integer constants with `-fms-extensions`. – cremno Jul 02 '15 at 17:41
  • @cremno It is acceptable, and I have implemented that solution, but the aesthetic factor plays a role here, as in the pseudocode. I can imagine that it can be done, as [link](http://libh.sourceforge.net/) has UINT128_C and somehow uses it for bit manipulation, ... but the source code contains no definition of UINT128_C, and I can't compile it successfully. – Dawid Szymański Jul 02 '15 at 17:49
  • In my [answer](http://stackoverflow.com/questions/31089069/operations-on-hexadecimal-strings-in-context-of-uint128-t-integers/31089630#31089630) to your similar question, I already stated there is no standard support for _int128 and there is no standard type __uint128. Just read the [standard](http://port70.net/~nsz/c/c11/n1570.html). Or check gcc documentation for related extensions. – too honest for this site Jul 02 '15 at 18:01
  • Note: Using the preprocessor to split an integer value into two halves is the wrong approach. Why not just mask and shift for both halves? If the compiler supports a 128 bit integer, it should also provide the basic operators. – too honest for this site Jul 02 '15 at 18:04
  • @Olaf. This question is further than similar. An agglomeration of similar words does not imply semantic similarity. Standardization of __uint128 or unsigned __int128 is not the issue in this question. The question is about string manipulation in pre-processing. – Dawid Szymański Jul 02 '15 at 18:13
  • @Olaf Why would that be the wrong approach? Basically there is no difference between two strings and one in preprocessing, but one simply looks better and has more aesthetic value. Wouldn't you agree? – Dawid Szymański Jul 02 '15 at 18:16
  • @Olaf As of gcc 4.9.2, the compiler does support the __uint128 type, but it does not support constants of that type in pre-processing. Hence the question and the issue. – Dawid Szymański Jul 02 '15 at 18:19
  • Ok. Now I understand. Too bad: no chance. You have to combine two `uint64_t` constants, but not with cpp.. – too honest for this site Jul 02 '15 at 18:24
  • @Olaf, already done this ... or switch to solution presented by cremno - have to check it. – Dawid Szymański Jul 02 '15 at 18:26
  • @Olaf In C it is feasible (combining two 64-bit unsigned integers represented as hexadecimal strings into one 128-bit value in pre-processing). As both languages share a common origin, I assume that it would also be feasible with cpp. – Dawid Szymański Jul 02 '15 at 18:37
  • @DawidSzymański, the C preprocessor was developed *alongside* C, and in that sense they do have a common origin, but that in no way implies that either should be able to do any particular thing that the other can do. On the contrary, if they were capable of all the same things, then they would not both be needed. – John Bollinger Jul 02 '15 at 18:44
  • @John Bollinger so right. I agree, and tentatively assume. – Dawid Szymański Jul 02 '15 at 18:54
  • @DawidSzymański: I'm talking about the C pre-processor, not C++ (g++ here)! Hmm.. after reading John's answer, I'm confused myself whether you really _do_ refer to the C++ compiler. However, I would be very careful here using a C approach for C++. There are quite a lot of subtle differences, particularly regarding constants (no further discussion here). – too honest for this site Jul 02 '15 at 18:57
  • You might use a different pre-processor, e.g. m4; Afaik (which has only a very small basis, though) this might be able to do true text-processing, not just replacement. – too honest for this site Jul 02 '15 at 19:04
  • Yes, `m4` can do this job. If you happen to be configuring your project with GNU Autoconf, then you're using `m4` already, and you have a convenient framework already in place for performing the preprocessing. – John Bollinger Jul 02 '15 at 19:06

1 Answer

The C preprocessor cannot decompose tokens into smaller tokens, though it can replace them altogether in the special case that they are macro names. Thus, you cannot use it to physically split hexadecimal digit strings that you do not predict in advance.
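Since the token cannot be split by the preprocessor, the workaround mentioned in the comments is to write the two halves yourself and let a macro combine them. The following is only a minimal sketch of that idea, assuming the GCC/Clang `__uint128_t` extension; the names `UINT128_C2`, `uint128_hi`, `uint128_lo`, and `all_ones` are hypothetical and do not come from the question.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uint64_t, UINT64_C */

/* Assumption: the compiler provides __uint128_t (a GCC/Clang extension). */
typedef __uint128_t uint128_t;

static_assert(sizeof(uint128_t) == 16, "uint128_t must be 128 bits wide");

/* Hypothetical two-argument variant: the caller supplies both 64-bit halves,
 * so the preprocessor never has to split a token. */
#define UINT128_C2(hi, lo) \
    (((uint128_t)UINT64_C(hi) << 64) | (uint128_t)UINT64_C(lo))

/* With GCC and Clang the expansion is accepted as a constant expression,
 * so it can initialize an object with static storage duration. */
static const uint128_t all_ones =
    UINT128_C2(0xffffffffffffffff, 0xffffffffffffffff);

/* Small helpers to read the halves back. */
static inline uint64_t uint128_hi(uint128_t v) { return (uint64_t)(v >> 64); }
static inline uint64_t uint128_lo(uint128_t v) { return (uint64_t)v; }
```

The cost is purely cosmetic: the call site reads `UINT128_C2(0x..., 0x...)` instead of a single 32-digit constant.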

You can use the preprocessor to convert the hexadecimal digit string into a C string, and perhaps then wrap that in a conversion function such as strtoull() (if that happened to be appropriate). If that function in particular were suitable, however, then you could also just use the hex string as-is, or paste a ULL suffix onto it.
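The stringify-and-convert idea can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `strtoull()` only covers 64 bits, so a hypothetical hand-written helper, `uint128_from_hex`, stands in for the conversion function, and `__uint128_t` is assumed to be available as a GCC/Clang extension.

```c
#include <assert.h>
#include <ctype.h>    /* isxdigit, isdigit, tolower */
#include <stddef.h>   /* NULL */
#include <stdint.h>

/* Assumption: the compiler provides __uint128_t (a GCC/Clang extension). */
typedef __uint128_t uint128_t;

/* Hypothetical helper: parse an optionally "0x"-prefixed run of hex digits
 * into a 128-bit value at run time. Parsing stops at the first non-hex
 * character; digits beyond the 32nd silently shift out of the result. */
static uint128_t uint128_from_hex(const char *s)
{
    uint128_t v = 0;

    assert(s != NULL);
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        s += 2;                                   /* skip the prefix */
    for (; isxdigit((unsigned char)*s); ++s) {
        int c = tolower((unsigned char)*s);
        unsigned digit = (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
        v = (v << 4) | digit;                     /* append one nibble */
    }
    return v;
}

/* #X turns the whole argument token into a string literal; the token itself
 * is never split or evaluated by the preprocessor. */
#define UINT128_C(X) uint128_from_hex(#X)
```

With this sketch, `x = UINT128_C( 0xffffffffffffffffffffffffffffffff );` from the question compiles, but the value is computed at run time, so the macro cannot be used where a constant expression is required.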

John Bollinger
  • Using a function will not yield an _integer constant_, however. – too honest for this site Jul 02 '15 at 18:29
  • @Olaf, you are quite right that if the macro relies on a function to perform the conversion then the result cannot be used where a constant is required. That's not the case presented in the question, but it is an important consideration for using such an approach in a more general context. I don't think there is a solution to the problem, as presented, that affords a compile-time constant. – John Bollinger Jul 02 '15 at 18:38
  • @John Bollinger Thank you for your answer. I do predict them in advance, as they are my constants. String constants can be concatenated in pre-processing, but cannot be chunked? Am I right? strtoull or even a hand-coded strtoulll wouldn't be a solution, as it (in my opinion) unnecessarily complicates the code. – Dawid Szymański Jul 02 '15 at 18:51
  • As I understand the question, the OP actually _does_ want to generate constants: split the _hexadecimal integer constant_ into two halves, cast each to `uint64_t` and then combine these values into a `__uint128` with bitops. @DawidSzymański: Is that correct? – too honest for this site Jul 02 '15 at 18:54
  • @Olaf, I'm just saying that in the OP's code snippet, the macro is used to produce the right-hand side of an assignment statement. The correctness of the resulting preprocessed statement does not rely on the macro yielding a constant. – John Bollinger Jul 02 '15 at 18:57
  • @JohnBollinger: Fair enough. It can be understood like this. – too honest for this site Jul 02 '15 at 19:02
  • @DawidSzymański, correct, the preprocessor cannot break up tokens. As for predicting in advance, the point of prediction would be so that you could write a macro with the predicted name. That's not an option for you because of the form of your strings, and it anyway would probably make your life harder, rather than easier. – John Bollinger Jul 02 '15 at 19:02
  • @Olaf I might be wrong, but you might be confusing const as pointer with literal constant in pre-processing. In both cases the output result would be the same, but the level of abstraction is different. – Dawid Szymański Jul 02 '15 at 19:02
  • @DawidSzymański: No, I do not. And much less about pointers (that has gotten confusing enough already; no need to add pointers to the stew;-). I am talking about [_integer constants_](http://port70.net/~nsz/c/c11/n1570.html#6.4.4.1), although I would actually prefer to call them _integer literals_, in line with other similar constructs. `const` integers are very different, but would require a [_constant expression_](http://port70.net/~nsz/c/c11/n1570.html#6.6) for initialization (which could be an _integer constant_). - PHEW! – too honest for this site Jul 02 '15 at 19:12
  • @DawidSzymański: Note: there is no term "literal constant" in the standard. It's _string literals_, _integer constants_, _compound literals_. – too honest for this site Jul 02 '15 at 19:14
  • @Olaf The phraseological compound "literal constant" is semantically correct. – Dawid Szymański Jul 02 '15 at 20:01
  • For now it seems to be impossible to chunk literals during the pre-processing phase of compilation under gcc on Linux. But according to @Olaf, GNU m4 has these capabilities. – Dawid Szymański Jul 02 '15 at 20:07
  • Code like [@Lundin](http://stackoverflow.com/a/31051026/2410359) did something like "chunking" a literal. Quite an inventive approach. – chux - Reinstate Monica Jul 02 '15 at 20:15
  • @DawidSzymański: "literal constant" Would that not be a tautology? For m4, please check yourself; it was a long time ago that I used it, and mostly with ready-made stuff. – too honest for this site Jul 02 '15 at 20:16
  • @chux: That will not work for _integer constants_, unless you want to parse them at run-time (which would be John's approach, and splitting would make no sense at all). Note that this only works because the _string literal_ is already an array of the same type as the target type. – too honest for this site Jul 02 '15 at 20:18