I need to byte-shift a text file. I know absolutely nothing about Perl, but I found a working Perl script called moz-byteshift.pl (documentation). It does exactly what I want, but I need to do it in C#.

Here's the source code of the Perl script:

#!/usr/bin/perl

# To perform a byteshift of 7
#   To decode: moz-byteshift.pl -s -7 <infile >outfile
#   To encode: moz-byteshift.pl -s  7 <infile >outfile

# To perform a byteshift of 13
#   To decode: moz-byteshift.pl -s -13 <infile >outfile
#   To encode: moz-byteshift.pl -s  13 <infile >outfile

use encoding 'latin1';
use strict;
use Getopt::Std;

use vars qw/$opt_s/;

getopts("s:");
if(!defined $opt_s) {
  die "Missing shift\n";
}

my $buffer;
while(1) {
  binmode(STDIN, ":raw");
  my $n=sysread STDIN, $buffer, 1;
  if($n == 0) {
    last;
  }
  my $byte = unpack("c", $buffer);
  $byte += 512 + $opt_s;
  $buffer = pack("c", $byte);
  binmode(STDOUT, ":raw");
  syswrite STDOUT, $buffer, 1;
}

If someone could at least explain how the Perl script works, that would be great. Sample code for the equivalent in C# would be even better. =)

Thanks for the help.

Andrew Ensley
  • I don't get it. If, as you say in one comment, you don't actually know what the perl script does, how do you know it's what you want to do? – ysth May 15 '09 at 06:41
  • This script is being used by a co-worker to perform a function that I now have to implement. That's how. – Andrew Ensley May 15 '09 at 15:05

3 Answers

There's not much to tell. It reads a file one byte at a time, adjusts the value of each byte by an arbitrary value (specified via the -s flag), and writes out the adjusted bytes. It's the binary equivalent of ROT-13 encryption of a text file.

The rest of the details are specific to how Perl does those things. getopts() is a function (from the Getopt::Std module) that processes command-line switches. binmode() puts the filehandles in raw mode to bypass any of the magic that Perl normally does during I/O. The sysread() and syswrite() functions are used for low-level stream access. The pack() and unpack() functions are used to read and write binary data; Perl doesn't do native types.

This would be trivial to re-implement in C. I'd recommend doing that (and binding to it from C# if need be) rather than porting to C# directly.
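
As a rough illustration of how small the C version could be, here is a minimal sketch. The names `shift_byte`, `shift_stream`, and `check_shift_byte` are made up for this example and do not come from the Perl script. It also works on unsigned bytes, whereas the script unpacks signed ones; the assumption is that both produce the same bit patterns once the result is truncated to 8 bits.

```c
#include <assert.h>
#include <stdio.h>

/* Add `shift` to one byte value, wrapping modulo 256.
   Converting an out-of-range int to unsigned char is defined to wrap in C,
   so negative shifts (decoding) work too. Reducing the shift first keeps
   the intermediate sum small. */
unsigned char shift_byte(unsigned char value, int shift)
{
    return (unsigned char)(value + shift % 256);
}

/* Copy `in` to `out`, shifting every byte on the way.
   Returns 0 on success, -1 on an I/O error.
   Files should be opened in binary mode ("rb" / "wb"). */
int shift_stream(FILE *in, FILE *out, int shift)
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (fputc(shift_byte((unsigned char)c, shift), out) == EOF) {
            return -1;
        }
    }
    return ferror(in) ? -1 : 0;
}

/* Spot checks based on the behaviour described for the Perl script. */
void check_shift_byte(void)
{
    assert(shift_byte(1, 5) == 6);                        /* plain offset */
    assert(shift_byte(0xFE, 4) == 0x02);                  /* wraps around */
    assert(shift_byte(shift_byte(0x30, 7), -7) == 0x30);  /* reversible */
}
```

A caller would open the input and output files in binary mode and call `shift_stream(in, out, 7)` to encode or `shift_stream(in, out, -7)` to decode.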

Michael Carman
  • Thanks. That is helpful. I guess the part I don't understand is what type of shifting it does. Does it take a byte array like this: byte[] {1,2,3,4,5} and (shifted by one) produce this: byte[] {5,1,2,3,4}? Or does it shift the bits of each byte, turning: byte[]{00000001,00000010,00000011} into (shifting by one): byte[] {10000000,00000001,10000001}? – Andrew Ensley May 15 '09 at 04:12
  • Calling this a "shift" is kind of a misnomer. It doesn't move bits or bytes. It applies an offset to the value of each byte. If your original data had byte values of 1, 2, 3 and you specified "-s 5" the result would be 6, 7, 8. – Michael Carman May 15 '09 at 04:24
  • So it adds to the byte value? So with a shift of 1, 00000001 becomes 00000010, 00001000 becomes 00001001, and so on? – Andrew Ensley May 15 '09 at 05:37
  • @Andrew: That's right. Note also that the values wrap around, i.e. 0xFE + 0x04 = 0x02. This makes the transformation reversible. – Michael Carman May 15 '09 at 14:32

What the code does is this: Read each byte from standard input one by one (after switching it into raw mode so no translation occurs). The unpack gets the byte value of the character just read so that a '0' read turns into 0x30. The latin1 encoding is selected so that this conversion is consistent (e.g. see http://www.cs.tut.fi/~jkorpela/latin9.html).

Then the value specified on the command line with the -s option is added to this byte, along with 512, to simulate a modulus operation. This way, -s 0, -s 256, etc. are equivalent. I am not sure why the 512 is needed, because I would have assumed the following pack took care of that, but I think they must have had a good reason to put it in there.

Then, write the raw byte out to standard output.

Here is what happens when you run it on a file containing the characters 012345 (I put the data in the DATA section):

E:\Test> byteshift.pl -s 1 | xxd
0000000: 3132 3334 3536 0b                        123456.

Each byte value is incremented by one.

E:\Test> byteshift.pl -s 257 | xxd
0000000: 3132 3334 3536 0b                        123456.

Remember 257 % 256 = 1. That is:

$byte += $opt_s;
$byte %= 256;

is equivalent to the single step used in the code.
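
To make that equivalence concrete, here is a small sketch in C rather than Perl. The function names are invented for this example, and it assumes unsigned bytes and shifts between -512 and 512. It compares the "add 512, then truncate to a byte" step with an explicit modulus.

```c
#include <assert.h>

/* Mirrors the script's single step: add the shift plus a 512 bias,
   then keep only the low 8 bits (what the cast does in C). */
unsigned char shift_with_bias(unsigned char value, int shift)
{
    return (unsigned char)(value + 512 + shift);
}

/* Explicit two-step version: add the shift, then reduce into 0..255. */
unsigned char shift_with_modulus(unsigned char value, int shift)
{
    int result = (value + shift) % 256;
    if (result < 0) {
        result += 256;  /* C's % keeps the sign of the left operand */
    }
    return (unsigned char)result;
}

/* Checks that both versions agree for every byte value and
   every shift in the range -512..512. */
void check_versions_agree(void)
{
    for (int shift = -512; shift <= 512; shift++) {
        for (int value = 0; value < 256; value++) {
            assert(shift_with_bias((unsigned char)value, shift)
                   == shift_with_modulus((unsigned char)value, shift));
        }
    }
}
```

In C the bias changes nothing, because the conversion to `unsigned char` already wraps. That matches the suspicion that the 512 may not be strictly necessary.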

Much later: OK, I do not know C# but here is what I was able to piece together using online documentation. Someone who knows C# should fix this:

using System;
using System.IO;

class BinaryRW {
    static void Main(string[] args) {
        BinaryWriter binWriter = new BinaryWriter(
                Console.OpenStandardOutput()
                );
        BinaryReader binReader = new BinaryReader(
                Console.OpenStandardInput()
                );

        int delta;

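        if ( args.Length < 1 
                || ! int.TryParse( args[0], out delta ) )
        {
            // Errors go to stderr because stdout carries the shifted bytes.
            Console.Error.WriteLine(
                    "Provide an integer delta on the command line"
                    );
        } 
        else {       
            // Normalize delta to 0..255 so that negative shifts
            // (used for decoding) wrap around correctly.
            delta = ( ( delta % 256 ) + 256 ) % 256;
            try  {
                // BinaryReader.ReadByte throws EndOfStreamException
                // at end of input, which ends the loop.
                while ( true ) {
                    int bin = binReader.ReadByte();
                    byte bout = (byte) ( ( bin + delta ) % 256 );
                    binWriter.Write( bout );
                }
            }

            catch(EndOfStreamException) { }

            catch(ObjectDisposedException) { }

            catch(IOException e) {
                Console.Error.WriteLine( e );        
            }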

            finally {
                binWriter.Close();
                binReader.Close();

            }
        }
    }
}

E:\Test> xxd bin
0000000: 3031 3233 3435 0d0a 0d0a                 012345....

E:\Test> b 0 < bin | xxd
0000000: 3031 3233 3435 0d0a 0d0a                 012345....

E:\Test> b 32 < bin | xxd
0000000: 5051 5253 5455 2d2a 2d2a                 PQRSTU-*-*

E:\Test> b 257 < bin | xxd
0000000: 3132 3334 3536 0e0b 0e0b                 123456....
Sinan Ünür
  • I think the 512 is supposed to be a bias to force the value to wrap instead of saturating. I don't think it's necessary, though (at least not in Perl). – Michael Carman May 15 '09 at 14:47
  • Thank you! That works perfectly. I'm not going to be using this from the command line, but for others that find this question, there is one bug in your code: You should add `args.Length < 1 || ` to the beginning of your if condition to avoid an "index out of bounds" exception when nothing is entered. – Andrew Ensley May 15 '09 at 15:40
  • Why are you trapping delta < 0? That makes the transformation not (easily) reversible. It can be negative in the original code. – Michael Carman May 17 '09 at 14:35
  • Just mental error, I guess. I was focused on getting the syntax right so the program would compile. – Sinan Ünür May 18 '09 at 10:50

Judging by the other answers the equivalent in C# would look something like this:

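// Fragment: assumes "using System.IO;" and that inPath and outPath
// are defined by the caller.
using(Stream sIn = new FileStream(inPath, FileMode.Open, FileAccess.Read))
using(Stream sOut = new FileStream(outPath, FileMode.Create, FileAccess.Write))
{
  // Stream.ReadByte returns the byte as an int, or -1 at end of stream.
  int b = sIn.ReadByte();
  while(b >= 0)
  {
    // unchecked makes the cast wrap modulo 256 regardless of compiler settings.
    sOut.WriteByte(unchecked((byte)(b + 1))); // or some other value
    b = sIn.ReadByte();
  }
  // The using statements close both streams; no explicit Close() is needed.
}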
samjudson
  • ReadByte returns the value of the byte, or -1 if the end of the stream is reached, so your comment makes no sense. – samjudson May 16 '09 at 16:55
  • According to http://msdn.microsoft.com/en-us/library/system.io.binaryreader.readbyte.aspx the return value of ReadByte is of type System.Byte. According to http://msdn.microsoft.com/en-us/library/system.byte.aspx System.Byte "Represents an 8-bit unsigned integer." There is no mention of ReadByte returning -1 if the end of stream is reached. In fact, a simple test program based on what you wrote above crashed with System.IO.EndOfStreamException. – Sinan Ünür May 20 '09 at 11:52
  • Well, I'm not calling BinaryReader.ReadByte, am I? I'm calling Stream.ReadByte. Check the docs: http://msdn.microsoft.com/en-us/library/system.io.stream.readbyte.aspx – samjudson May 20 '09 at 12:31