20

I used crc32 to calculate checksums from strings a long time ago, but I cannot remember how I did it.

echo -n "LongString" | crc32    # no output

I found a solution [1] to calculate them with Python, but is there not a direct way to calculate that from a string?

# signed
python -c 'import binascii; print binascii.crc32("LongString")'
python -c 'import zlib; print zlib.crc32("LongString")'
# unsigned
python -c 'import binascii; print binascii.crc32("LongString") % (1<<32)'
python -c 'import zlib; print zlib.crc32("LongString") % (1<<32)'

[1] How to calculate CRC32 with Python to match online results?
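For reference, the signed and unsigned variants above are the same 32 bits read two different ways. The following is a minimal sketch, assuming Python 3, where `zlib.crc32` and `binascii.crc32` take bytes and always return an unsigned value. The helper name `to_signed32` is made up for illustration.

```python
import binascii
import zlib


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed (hypothetical helper)."""
    return value - (1 << 32) if value & 0x80000000 else value


data = b"LongString"
unsigned = zlib.crc32(data)              # Python 3: already in 0 .. 2**32 - 1
assert unsigned == binascii.crc32(data)  # both modules compute the same CRC-32
signed = to_signed32(unsigned)
assert signed % (1 << 32) == unsigned    # "% (1<<32)" maps signed back to unsigned
print(unsigned, signed)
```

The modulo in the question's unsigned one-liners does the reverse mapping, which is why it is harmless when the value is already unsigned.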

Matthias Braun
oxidworks

7 Answers

31

I ran into this problem myself and didn't want to go to the "hassle" of installing crc32. I came up with this; it's a little nasty, but it should work on most platforms, or at least on most modern Linux systems:

echo -n "LongString" | gzip -1 -c | tail -c8 | hexdump -n4 -e '"%u"'

Some technical details: gzip stores the CRC-32 of the uncompressed data in the first 4 of its final 8 bytes (the last 4 hold the uncompressed length), the -c option makes it write to standard output, and tail -c8 keeps only those last 8 bytes. (-1 was suggested by @MarkAdler so we don't waste time on the compression itself.)

hexdump was a little trickier and I had to futz about with it for a while before I came up with something satisfactory, but the format here seems to correctly parse the gzip crc32 as a single 32-bit number:

  • -n4 takes only the relevant first 4 bytes of the gzip footer.
  • '"%u"' is your standard fprintf format string that formats the bytes as a single unsigned 32-bit integer. Notice that there are double quotes nested within single quotes here.

If you want a hexadecimal checksum you can change the format string to '"%08x"' (or '"%08X"' for upper case hex) which will format the checksum as 8 character (0 padded) hexadecimal.

As I said, this is not the most elegant solution, and perhaps not one you'd want in a performance-sensitive scenario, but it might appeal given the near universality of the commands used.

The weak point here for cross-platform usability is probably the hexdump configuration, since I have seen variations on it from platform to platform and it's a bit fiddly. I'd suggest if you're using this you should try some test values and compare with the results of an online tool.

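EDIT: As suggested by @PedroGimeno in the comments, you can pipe the output into od instead of hexdump and get the same result without the fiddly options: ... | od -t x4 -N 4 -A n for hex, or ... | od -t u4 -N 4 -A n for unsigned decimal. (-t d4 prints a signed decimal, so it only matches when the top bit of the checksum is clear.)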
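The trailer layout that the pipeline relies on can be checked from Python. This is a minimal sketch, assuming Python 3's standard `gzip`, `struct`, and `zlib` modules; the function name `crc32_from_gzip_trailer` is made up for illustration. It reads the trailer explicitly as little-endian, whereas hexdump and od use the host byte order, so the shell pipeline gives the expected value only on little-endian machines.

```python
import gzip
import struct
import zlib


def crc32_from_gzip_trailer(data: bytes) -> int:
    """Compress data and read the CRC-32 back out of the gzip trailer."""
    compressed = gzip.compress(data, compresslevel=1)
    # Last 8 bytes: CRC-32, then uncompressed size, both little-endian uint32.
    crc, size = struct.unpack("<II", compressed[-8:])
    assert size == len(data) % (1 << 32)
    return crc


value = crc32_from_gzip_trailer(b"LongString")
assert value == zlib.crc32(b"LongString")  # same CRC-32 that zlib computes
print("%08x" % value)
```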

robert
  • 4
    A more portable solution for hexadecimal is to use [od](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/od.html) instead of hexdump: `... | od -t x4 -N 4 -A n` – Pedro Gimeno Sep 03 '18 at 21:28
  • can confirm this works a treat! `-t x4` for hexadecimal output and `-t d4` for decimal. – robert Sep 06 '18 at 10:49
  • 2
    Use `gzip -1 -c` to make the compression faster, since you're throwing that away anyway. – Mark Adler Aug 19 '21 at 23:48
30

Or just use process substitution:

crc32 <(echo -n "LongString")

(EDIT: thx @tor-klingberg)

C Würtz
  • 1
    I was looking for this to be able to use pv also. Which outputs a file as a string while producing a progress bar. `crc32 <(pv /some/file)` worked perfectly. – George Jul 18 '19 at 00:05
  • 5
    If you want your pipes going left to right you can do `echo -n "LongString" | crc32 /dev/stdin`. /dev/stdin is a special file that contains the input of the process. – Tor Klingberg May 28 '20 at 14:07
  • Just a suggestion, but probably makes more sense to do `crc32 <(printf "LongString")` so you don't get a `\n` appended – Peter Frost Oct 31 '22 at 15:34
8

Your question already has most of the answer.

echo -n 123456789 | python3 -c 'import sys;import zlib;print(zlib.crc32(sys.stdin.buffer.read())%(1<<32))'

correctly gives 3421780262

I prefer hex:

echo -n 123456789 | python3 -c 'import sys;import zlib;print("%08x"%(zlib.crc32(sys.stdin.buffer.read())%(1<<32)))'
cbf43926

Be aware that there are several CRC-32 algorithms: http://reveng.sourceforge.net/crc-catalogue/all.htm#crc.cat-bits.32
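The one-liners above read all of standard input at once. For larger inputs, the CRC can be fed in chunks instead, because `zlib.crc32` accepts a running value as its second argument. This is a minimal sketch, assuming Python 3; the function name `crc32_stream` and the chunk size are illustrative choices, not part of any tool mentioned here.

```python
import io
import zlib


def crc32_stream(stream, chunk_size=1 << 16):
    """Compute the CRC-32 of a binary stream without loading it all at once."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)  # continue from the running CRC
    return crc & 0xFFFFFFFF


# Demo on an in-memory stream; a pipe would pass sys.stdin.buffer instead.
print("%08x" % crc32_stream(io.BytesIO(b"123456789")))  # cbf43926
```

Passing `sys.stdin.buffer` as the stream should give the same behavior as the one-liners, with memory use bounded by the chunk size.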

Android Control
Mark Adler
  • Interesting that none of those listed there employs the "ZIP" poly of EDB88320 – silverdr Oct 08 '19 at 17:35
  • @silverdr All of the ones with `poly=0x04c11db7` and `refin=true` do. CRC-32/ISO-HDLC listed there is the PKZIP CRC. – Mark Adler Oct 09 '19 at 04:09
  • I must be missing something obvious here but how does `poly=0x04c11db7` mean employing `edb88320`? I guess it has something to do with the `refin=true`? Honest question as I was looking for definitions needed to adapt a checksumming routine and found conflicting (to me) information. Eventually ended up using `edb88320` with starting seed `ffffffff` and final `ffffffff` EOR to get results compatible with what the mentioned `crc32` script outputs. – silverdr Oct 09 '19 at 06:57
  • @silverdr `0xedb88320` is the bit reversal of `0x04c11db7`. `refin=true` means that the input bits are reflected. In practice, that is never done, since you would have to do it to every input byte. Instead the polynomial is reflected, once. – Mark Adler Oct 09 '19 at 16:53
  • Python 3: `| python3 -c 'import sys;import zlib;print("{:x}".format(zlib.crc32(sys.stdin.buffer.read())%(1<<32)))' ` – Jari Turkia Aug 19 '21 at 19:26
  • This implementation reads the entire file into the memory, this can be troublesome with big files. – legolegs Jan 09 '22 at 19:43
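The relationship between the two polynomial constants discussed in these comments can be checked directly. This is a minimal sketch, assuming Python 3; `reflect` and `crc32_bitwise` are illustrative names. It reverses the bits of `0x04c11db7`, then runs a bit-at-a-time CRC with the reflected polynomial, initial value `ffffffff`, and final XOR `ffffffff`, and compares the result with `zlib.crc32`.

```python
import zlib


def reflect(value: int, width: int) -> int:
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


POLY_NORMAL = 0x04C11DB7
POLY_REFLECTED = reflect(POLY_NORMAL, 32)
assert POLY_REFLECTED == 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 using the reflected polynomial."""
    crc = 0xFFFFFFFF                      # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                # process the byte LSB first
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF               # final XOR


assert crc32_bitwise(b"123456789") == zlib.crc32(b"123456789")
```

Reflecting the polynomial once avoids having to reflect every input byte, which is the point made in the last reply above.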
8

I use cksum and convert to hex using the shell builtin printf:

$ echo -n "LongString"  | cksum | cut -d\  -f1 | xargs echo printf '%0X\\n' | sh
5751BDB2

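The cksum command first appeared in 4.4BSD and should be present on all modern systems. Note that it computes the POSIX CRC, which is a different CRC-32 variant from the one used by crc32, zlib, and gzip, so its values will not match those tools.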
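To see how cksum's checksum differs from the zlib-style CRC-32, here is a minimal sketch of the POSIX cksum algorithm as I understand it, assuming Python 3. It uses polynomial `0x04C11DB7` processed MSB first with a zero initial value, appends the input length (least significant byte first), and complements the result. The names `posix_cksum` and `_feed` are made up for illustration.

```python
import zlib

POLY = 0x04C11DB7  # same polynomial as zlib's CRC-32, but used MSB first


def _feed(crc: int, byte: int) -> int:
    """Shift one byte into the CRC register, most significant bit first."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """Sketch of the checksum printed by the POSIX cksum utility."""
    crc = 0
    for byte in data:
        crc = _feed(crc, byte)
    length = len(data)
    while length:                 # append the length, low byte first
        crc = _feed(crc, length & 0xFF)
        length >>= 8
    return crc ^ 0xFFFFFFFF


data = b"LongString"
print("%08X" % posix_cksum(data), "%08X" % zlib.crc32(data))
```

The two printed values differ, so the cksum pipeline is only a drop-in replacement if the consumer also expects the POSIX variant.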

jimis
  • I had to use `cut -d" " -f1` instead of `cut -d\ -f1` (SO trims one of the two spaces here) or it would only give an error. – Bowi May 25 '20 at 09:21
  • Similar, but using argument substitution instead of piping to xargs/echo/sh: `printf '%X\n' "$(echo -n "LongString" | cksum | cut -d' ' -f1)"` – Paul Donohue May 25 '23 at 23:17
7

On Ubuntu, at least, /usr/bin/crc32 is a short Perl script, and you can see quite clearly from its source that all it can do is open files. It has no facility to read from stdin -- it doesn't have special handling for - as a filename, or a -c parameter or anything like that.

So your easiest approach is to live with it, and make a temporary file.

tmpfile=$(mktemp)
echo -n "LongString" > "$tmpfile"
crc32 "$tmpfile"
rm -f "$tmpfile"

If you really don't want to write a file (e.g. it's more data than your filesystem can take; unlikely if it's really a "long string", but for the sake of argument...) you could use a named pipe. To a simple non-random-access reader this is indistinguishable from a file:

fifo=$(mktemp -u)
mkfifo "$fifo"
echo -n "LongString" > "$fifo" &
crc32 "$fifo"
rm -f "$fifo"

Note the & to background the process which writes to fifo, because it will block until the next command reads it.

To be more fastidious about temporary file creation, see: https://unix.stackexchange.com/questions/181937/how-create-a-temporary-file-in-shell-script


Alternatively, use what's in the script as an example from which to write your own Perl one-liner (the presence of crc32 on your system indicates that Perl and the necessary module are installed), or use the Python one-liner you've already found.

slim
6

Here is a pure Bash implementation:

#!/usr/bin/env bash

declare -i -a CRC32_LOOKUP_TABLE

__generate_crc_lookup_table() {
  local -i -r LSB_CRC32_POLY=0xEDB88320 # The CRC32 polynomial in LSB-first order
  local -i index byte lsb
  for index in {0..255}; do
    ((byte = 255 - index))
    for _ in {0..7}; do # 8-bit lsb shift
      ((lsb = byte & 0x01, byte = ((byte >> 1) & 0x7FFFFFFF) ^ (lsb == 0 ? LSB_CRC32_POLY : 0)))
    done
    ((CRC32_LOOKUP_TABLE[index] = byte))
  done
}
__generate_crc_lookup_table
typeset -r CRC32_LOOKUP_TABLE

crc32_string() {
  [[ ${#} -eq 1 ]] || return
  local -i i byte crc=0xFFFFFFFF index
  for ((i = 0; i < ${#1}; i++)); do
    byte=$(printf '%d' "'${1:i:1}") # Get byte value of character at i
    ((index = (crc ^ byte) & 0xFF, crc = (CRC32_LOOKUP_TABLE[index] ^ (crc >> 8)) & 0xFFFFFFFF))
  done
  echo $((crc ^ 0xFFFFFFFF))
}

printf 'The CRC32 of: %s\nis: %08x\n' "${1}" "$(crc32_string "${1}")"

# crc32_string "The quick brown fox jumps over the lazy dog"
# yields 414fa339

Testing:

bash ./crc32.sh "The quick brown fox jumps over the lazy dog"
The CRC32 of: The quick brown fox jumps over the lazy dog
is: 414fa339
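For comparison, the same table-driven approach can be sketched in Python 3 and checked against `zlib.crc32`. This is an illustrative sketch, not a translation of the script: it builds the lookup table in the conventional form (starting from the index itself) rather than the complemented form used above, and it works on bytes, so results for non-ASCII input may differ from the Bash version, which looks up character codes. The names `make_table` and `crc32_table` are made up for illustration.

```python
import zlib


def make_table(poly: int = 0xEDB88320) -> list:
    """Build the 256-entry lookup table for the reflected CRC-32 polynomial."""
    table = []
    for index in range(256):
        value = index
        for _ in range(8):  # one shift per bit of the index byte
            value = (value >> 1) ^ poly if value & 1 else value >> 1
        table.append(value)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


text = b"The quick brown fox jumps over the lazy dog"
assert crc32_table(text) == zlib.crc32(text)
print("%08x" % crc32_table(text))  # 414fa339
```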
Léa Gris
2

You can try rhash.

Testing:

## install 'rhash'...
$ sudo apt-get install rhash
## test CRC32...
$ echo -n 123456789 | rhash --simple -
cbf43926  (stdin)
Woosung