1

Is there anyway to quickly ( O(1), no higher) check if a chunk of data (bits) is corrupted, and the only information is the format it should be in? (i.e. utf8, and you know the range of chars it's allowed to be)

  • (sidenote, impossible to checksum beforehand) – CoolGuyBananarama Sep 02 '15 at 12:46
  • 1
    **No**, to check whether *n* bits are not corrupted you need to look at *n* bits. – Artjom B. Sep 02 '15 at 12:47
  • Dam. I suppose that is to check if each byte is a char... – CoolGuyBananarama Sep 02 '15 at 12:50
  • If you're trying to verify UTF-8, you can't just "check if each byte is a char." UTF-8 is a variable-length encoding that can use up to 4 bytes to encode a single character. http://stackoverflow.com/questions/9533258/what-is-the-maximum-number-of-bytes-for-a-utf-8-encoded-character – Jim Mischel Sep 02 '15 at 14:34
  • ..so how do you check if they're chars? – CoolGuyBananarama Sep 02 '15 at 15:41
  • You verify UTF8 by decoding the stream. UTF8 is self-synchronizing. There are easily recognized bit patterns that you can search for (forward or backward) to find character boundaries. You'd have to look at the UTF-8 spec for details. A search for "UTF-8 self synchronizing" will get you started. – Jim Mischel Sep 03 '15 at 08:07

1 Answers1

3

Its not possible theoretically. Information can be corrupted from large chunk to even only in a single bit(an arbitrary bit can be flipped to 0/1). So, you need to check N bits of your stream to make sure the remote data is not corrupted. It will take atleast O(# of bits in stream).

Kaidul
  • 15,409
  • 15
  • 81
  • 150