
I have inherited a PHP project that could really benefit from some form of forward error correction (FEC), as it (potentially) involves users typing in a base64-encoded string of arbitrary length. The string is currently split into groups of 6 characters for ease of transcription and joined back together before processing, but with humans in the loop errors can and do still occur.

Short strings, and those that are copied and pasted, are generally fine. The real benefit would be seen in the outliers: lengthy strings typed in by hand.

I've settled on Reed-Solomon (RS) being the most likely candidate for achieving this (but I'm happy to be pointed towards a more appropriate FEC by those with more practical experience).

Does anyone know of an open source RS encoder and decoder I can use within this PHP application? I have found several encoders in QR code libraries that I can probably hack at, but a decoder seems to be mythical. I do of course have the option of taking one of several C implementations and rewriting it (I'm a programmer, not a math expert, so writing one from scratch is probably beyond me).

The data at present looks like this (representation only, probably not valid base64!):

fD48Sa 483CDf 18ACDx UYh5jS PQXNT

I'd either like to apply RS encoding to each block after the string has been split (lengthening each block, which would be acceptable from a project point of view), OR apply it before the string is split into blocks of 6. From what I understand of RS, however, the latter would be the more complex option, as the string is not of a fixed length at that point.

What I was hoping to be able to find after the 'eureka' moment of coming up with the FEC idea was something that would allow me to do this:

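// messy encode pseudocode for demonstration purposes
// rsEncode() is the hypothetical function I am looking for
$data    = "Once upon a time in a land far far away";
$encoded = base64_encode($data);
$chunks  = str_split($encoded, 6); // chunk_split() would return a string, not an array

$rsEncoded = "";
foreach ($chunks as $chunk) {
    $rsEncoded .= rsEncode($chunk) . " ";
}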

and then a similar rsDecode() when it is input.

Any hints appreciated...

Lee S
  • I don't quite understand. You have a computer source of base64 (with or without RS), a computer destination, and a *human* to transfer the data? Seriously, why not change the human part? ... And, essentially, are you asking about some lib or how to code RS yourself? Lib-searching is off-topic here, and coding it yourself is ... well, not too hard, but certainly non-trivial and more than a few simple loops. – deviantfan Nov 14 '15 at 08:16
  • About the splitting: the most efficient approach memory/length-wise would be to take the raw binary data, split it into 255-byte blocks (exactly 255), apply RS, repartition the data into 6-byte blocks, and *then* apply base64. – deviantfan Nov 14 '15 at 08:18
  • And another note: if it's important that no error is overlooked, use a hash like SHA2 too (this part is available in PHP already). RS detects and corrects errors only up to a certain number of errors; with an additional hash you can detect many more errors (but not correct them; if there are too many errors, the human will have to do everything again). – deviantfan Nov 14 '15 at 08:23
  • Thanks for the comments thus far. The program is an oddity (I can't really go into too much detail yet) but the human is a key part of the overall strategy, on occasion. Most of the time it will be either directly read, or copy and pasted. It's the outlier example usage scenarios that require manual input. The inner data already has an HMAC so actual error detection isn't a problem, I was looking to make a few transcription errors not a reason to have to start over again from a user point of view. After the hours of searching I think I've decided my only option is to learn RS, and 'DIY' ! – Lee S Nov 14 '15 at 15:36

1 Answer


I may be misunderstanding your problem, but it doesn't sound like a good use case for forward error correction.

Forward error correction is useful when the message is known to be correct at the point when the encoding takes place. Errors introduced after this point can be overcome during the decoding process. However, from the workflow you're describing, it sounds as though you want to encode the message after the user has entered it, which is too late to detect any errors in their transcription.

It's possible that you're actually describing receiving a Reed-Solomon code block, having the user enter their data, and using the check symbols from the received code block to analyze their data. If the number of symbols that differ between the entered data and the received code block is at most half the number of check symbols (an RS code with n - k check symbols has minimum distance n - k + 1, meeting the Singleton bound, and so corrects up to ⌊(n - k)/2⌋ symbol errors), this will indeed allow you to 'correct' the user's input to the message that you received in the first place. I'm not sure why this would be useful, though.


That said, I believe the easiest way to get RS up and running in PHP would be to call a Python implementation from the PHP code.

Wikiversity has a very well documented Python encoder/decoder here, and a further generalized version of the same implementation here. I found both of those resources to be immensely useful when writing my own C++ implementations recently.
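
To make the per-block idea from the question concrete, here is a minimal Python sketch, not a production codec. It assumes each base64 character is treated as one symbol of GF(64) (built here from the primitive polynomial x^6 + x + 1), that blocks contain no `=` padding, and that two check characters are prefixed to each block, which is enough to repair one mistyped character per block. The names `rs_encode_block` and `rs_decode_block` are made up for illustration and do not come from any library.

```python
import string

# Standard base64 alphabet; each character is one 6-bit symbol of GF(64).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Log/antilog tables for GF(64), built from the primitive polynomial x^6 + x + 1.
EXP, LOG = [0] * 126, [0] * 64
value = 1
for power in range(63):
    EXP[power] = EXP[power + 63] = value
    LOG[value] = power
    value <<= 1
    if value & 0x40:
        value ^= 0x43


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    # b is assumed to be non-zero
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 63]


def syndromes(symbols):
    """Evaluate the word as a polynomial at alpha^0 and alpha^1."""
    s0 = s1 = 0
    for position, symbol in enumerate(symbols):
        s0 ^= symbol
        s1 ^= gf_mul(symbol, EXP[position])
    return s0, s1


def rs_encode_block(block):
    """Prefix two check characters so that both syndromes become zero."""
    data = [ALPHABET.index(ch) for ch in block]
    a, b = syndromes([0, 0] + data)
    q = gf_div(a ^ b, 1 ^ EXP[1])  # solves p + q = a and p + alpha*q = b
    p = a ^ q
    return ALPHABET[p] + ALPHABET[q] + block


def rs_decode_block(coded):
    """Return the data characters, repairing at most one wrong character."""
    symbols = [ALPHABET.index(ch) for ch in coded]
    s0, s1 = syndromes(symbols)
    if s0 or s1:
        if not (s0 and s1):
            raise ValueError("more than one character is wrong")
        position = (LOG[s1] - LOG[s0]) % 63  # error location from s1 / s0
        if position >= len(symbols):
            raise ValueError("more than one character is wrong")
        symbols[position] ^= s0  # s0 equals the error value
    return "".join(ALPHABET[s] for s in symbols[2:])
```

With this sketch, `rs_encode_block("fD48Sa")` yields an 8-character block, and `rs_decode_block` recovers `fD48Sa` even if any one of those 8 characters is replaced. Two or more wrong characters in a block may be reported as uncorrectable or may be silently miscorrected, so an outer check such as the HMAC mentioned in the comments is still needed.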

Bear