What is a WordArray?

Question

I've been looking at crypto-js and its encoder converts to and from a WordArray. I looked up the documentation and couldn't find any explanation of what a WordArray might be.

To the best of my knowledge, there isn't even a typed array in JavaScript named WordArray, and neither is there a DataView on any of the typed arrays by that name.

I know what a WORD is in the Visual C++ parlance, but I am not sure what it means here.

Strange, all the threads (here, here and here) I found on crypto-js are using the word WordArray without anyone really asking what it is.

Could someone really tell me? Is it a Uint16Array? Or just another fancy word for a regular byte array (Uint8Array or an untyped Array of integral number values)?

The first occurrence of the string `WordArray` (in your link labelled "documentation") states: _"(The hash algorithms accept either strings or instances of CryptoJS.lib.WordArray.) *A WordArray object represents an array of 32-bit words.*"_. The same is repeated in the gist you linked below. Looks straightforward enough. — mbojko, Oct 23 '19 at 13:44
@mbojko Thank you. I missed that. Still, questions remain. So, if I have the string "he", in UTF-8 that becomes two adjacent, 16-bit unsigned integers. If I converted them into a `WordArray`, would that result in an array with just a single element of 32-bits composed of the 16-bits of the alphabet "h" as the `HIWORD` and alphabet "e" as the `LOWORD`? — Water Cooler v2, Oct 23 '19 at 13:49
I just posted a separate question for this. https://stackoverflow.com/q/58525372/303685 — Water Cooler v2, Oct 23 '19 at 14:42
Good question, but I think it was a bit much to split the question in two, so above question has been closed as a dupe. — Maarten Bodewes, Oct 23 '19 at 15:16
Yeah, sure, I saw that. No problem. In fact, thanks very much. :-) — Water Cooler v2, Oct 23 '19 at 15:17
What you were specifying in the other question would be UTF-16, or more precisely UTF-16BE without byte order mark, by the way. — Maarten Bodewes, Oct 23 '19 at 15:22
@MaartenBodewes Oh, the BOM character. I remember bumping into the BOM character when my website wouldn't load on Edge. :-) Where did you read about all this in easy language? — Water Cooler v2, Oct 23 '19 at 15:53

Maarten Bodewes · Accepted Answer · 2019-10-23T15:21:15.957

The class is defined in core.js within the CryptoJS library:

```js
/**
 * An array of 32-bit words.
 *
 * @property {Array} words The array of 32-bit words.
 * @property {number} sigBytes The number of significant bytes in this word array.
 */
var WordArray = C_lib.WordArray = Base.extend({
```

The (byte) values are packed into the words starting at the most significant bits (I've checked this against the source code).

For instance, if you put the value "he" into it as UTF-8 (or Latin1 or ASCII), you would get a one-element array with the hexadecimal value 68_65_00_00 in it, and `sigBytes` set to the value 2. This is because UTF-8 encodes these characters to 8-bit bytes, and those two bytes are placed in the topmost 16 bits of the word.
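
As a rough illustration of that layout, here is a minimal sketch in plain JavaScript. The helpers `bytesToWords` and `wordsToBytes` are hypothetical names made up for this example and are not CryptoJS functions; in the library itself, an encoder such as `CryptoJS.enc.Utf8.parse` does the conversion. The sketch assumes an environment where `TextEncoder` is available as a global.

```javascript
// Hypothetical helpers for illustration; these are not CryptoJS functions.

// Pack bytes into 32-bit words, most significant byte first.
function bytesToWords(bytes) {
  const words = new Array(Math.ceil(bytes.length / 4)).fill(0);
  for (let i = 0; i < bytes.length; i++) {
    // i >>> 2 selects the word; the shift places the byte within that word.
    words[i >>> 2] |= bytes[i] << (24 - (i % 4) * 8);
  }
  return { words, sigBytes: bytes.length };
}

// Read back only the significant bytes, ignoring the zero padding.
function wordsToBytes({ words, sigBytes }) {
  const bytes = new Uint8Array(sigBytes);
  for (let i = 0; i < sigBytes; i++) {
    bytes[i] = (words[i >>> 2] >>> (24 - (i % 4) * 8)) & 0xff;
  }
  return bytes;
}

const utf8 = new TextEncoder().encode("he"); // bytes 0x68, 0x65
const wordArray = bytesToWords(utf8);
console.log(wordArray.words[0].toString(16), wordArray.sigBytes); // 68650000 2
```

The `sigBytes` count is what lets the reader of the array tell the two real bytes apart from the two bytes of zero padding in the last word.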

Generally, (symmetric) cryptographic algorithms are specified to operate on bits. However, implementations are usually optimized to work on 32- or 64-bit words, because those are the most efficient units on 32- or 64-bit machines such as x86 or x64. So a library will typically convert to words internally before the operations are performed.

Usually libraries define their operations to use bytes rather than words, though. CryptoJS is a bit special in the sense that it operates on a buffer of words. That's fairly logical, since JavaScript historically had no native byte array type (typed arrays such as `Uint8Array` arrived later). It also skips a step, as you would otherwise have to convert the text to UTF-8 bytes first, and then to words again within the algorithm implementation.

CryptoJS also has a 64 bit word array present, undoubtedly for algorithms such as SHA-512 that are optimized for 64 bit operation.

edited Oct 23 '19 at 15:21

answered Oct 23 '19 at 15:00

Maarten Bodewes

And yes, the missing direct support for bit-oriented types such as byte and word will make native JavaScript crypto libraries *dog slow* compared to about any other language, let alone C or assembler. Here an interpreter, even with JIT, is next to useless. You'll have to use lower level machine code to speed things up. – Maarten Bodewes Oct 23 '19 at 15:33
Just one more thing. I am noticing that some of the values in the `WordArray` I converted from a byte array are signed. I didn't account for two's complement. I just left-shifted quadruples of bytes into one element of the `WordArray` each. So, the question: the elements of a `WordArray` can be signed, right? I mean, they're not supposed to be an unsigned type, correct? – Water Cooler v2 Oct 24 '19 at 05:06
Ok, I tried it and it works like a charm. Both on the client side and with the same key (as a byte array and not a `WordArray`) on an ASP.NET server. It would make sense then that the elements of the `WordArray` be whatever. That is, it has signed 32-bit integers. – Water Cooler v2 Oct 24 '19 at 06:18
Yes, that's no problem. For example *all* Java's base types (except `char`, which is solely used for, well, characters) are signed. The bits in `int`, the 32-bit word equivalent, are also signed. But in the end, for operations like AES, it doesn't matter. It does make some calculations harder, especially for types other than `int` because you have to take e.g. *sign extension* into account. That's more of a problem for the *programmer* rather than the computer though. So other languages also operate on signed integers without issue. – Maarten Bodewes Oct 24 '19 at 12:13
Note that it is not just the bit operations that remain the same for two's complement values. Addition and subtraction, for example, are also the same mod 2^32. You may have to interpret the result as unsigned when e.g. comparing it as an integer, but that's about it. And AES doesn't require comparison; all operations are performed without looking at the actual content (if you based them on the content, you might have a side channel). – Maarten Bodewes Oct 24 '19 at 12:23
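
The signedness discussed in these comments can be seen in a few lines of plain JavaScript. This is only a sketch: `add32` is a hypothetical helper named for this example, not something taken from CryptoJS. JavaScript's bitwise operators work on signed 32-bit integers, so a word whose top byte is 0x80 or higher shows up as negative, while `>>> 0` reads the same bits as an unsigned value.

```javascript
// Pack four bytes into one 32-bit word; a top byte >= 0x80 sets the sign bit.
const word = (0xff << 24) | (0x00 << 16) | (0x00 << 8) | 0x01;
console.log(word);       // -16777215
console.log(word >>> 0); // 4278190081, the same bits read as unsigned

// Hypothetical helper: 32-bit addition that wraps modulo 2^32.
function add32(a, b) {
  return (a + b) | 0;
}

// The wrapped result has the same bits whether the inputs are
// viewed as signed or as unsigned.
const sum = add32(word, 16777216);
console.log(sum >>> 0); // 1
```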
