Is there anyway to quickly ( O(1), no higher) check if a chunk of data (bits) is corrupted, and the only information is the format it should be in? (i.e. utf8, and you know the range of chars it's allowed to be)
Asked
Active
Viewed 79 times
1
-
(sidenote, impossible to checksum beforehand) – CoolGuyBananarama Sep 02 '15 at 12:46
-
1**No**, to check whether *n* bits are not corrupted you need to look at *n* bits. – Artjom B. Sep 02 '15 at 12:47
-
Dam. I suppose that is to check if each byte is a char... – CoolGuyBananarama Sep 02 '15 at 12:50
-
If you're trying to verify UTF-8, you can't just "check if each byte is a char." UTF-8 is a variable-length encoding that can use up to 4 bytes to encode a single character. http://stackoverflow.com/questions/9533258/what-is-the-maximum-number-of-bytes-for-a-utf-8-encoded-character – Jim Mischel Sep 02 '15 at 14:34
-
..so how do you check if they're chars? – CoolGuyBananarama Sep 02 '15 at 15:41
-
You verify UTF8 by decoding the stream. UTF8 is self-synchronizing. There are easily recognized bit patterns that you can search for (forward or backward) to find character boundaries. You'd have to look at the UTF-8 spec for details. A search for "UTF-8 self synchronizing" will get you started. – Jim Mischel Sep 03 '15 at 08:07
1 Answers
3
Its not possible theoretically. Information can be corrupted from large chunk to even only in a single bit(an arbitrary bit can be flipped to 0/1). So, you need to check N
bits of your stream to make sure the remote data is not corrupted. It will take atleast O(# of bits in stream)
.

Kaidul
- 15,409
- 15
- 81
- 150
-
thanks for being patient with what was probably a silly question! – CoolGuyBananarama Sep 02 '15 at 14:30