How can I convert a Cobol COMP field output to readable decimal in C#?

Question

While converting a COBOL program to C#, I encountered COMP fields:

03  Var1                     PIC X(4).
03  Var2                     PIC X(3).
03  Var3                     PIC X(3).
03  Var4                     PIC X(4).
03  Var5                     PIC X(16).
03  Var6                     PIC X(4).
03  Var7                     PIC X(2).
03  Var8                     PIC X.
03  Var9                     PIC X(4).
03  Var10                    PIC X(16).
03  Var11                    PIC S9(7)V9(2) COMP.
03  Var12                    PIC S9(7)V9(2) COMP.
03  Var13                    PIC S9(7)V9(2) COMP.
03  Var14                    PIC S9(7)V9(2) COMP.
03  Var15                    PIC S9(7)V9(2) COMP.
03  Var16                    PIC S9(7)V9(2) COMP.
03  Var17                    PIC S9(7)V9(2) COMP.
03  Var18                    PIC S9(7)V9(2) COMP.
03  Var19                    PIC S9(7)V9(2) COMP.
03  Var20                    PIC S9(7)V9(2) COMP.
03  Var21                    PIC S9(7)V9(2) COMP.
03  Var22                    PIC S9(7)V9(2) COMP.
03  Var23                    PIC S9(7)V9(2) COMP.
03  Var24                    PIC S9(7)V9(2) COMP.

I've spent several hours looking into COMP. Most searches turn up something about COMP-3, or mention that COMP is a binary format. However, the COBOL program's output shows the non-COMP fields followed by this (between the parentheses):

( F ” " )

while the actual values are all 0.00, except that Var13 is 64.70.

NOTE: these are the values as copied from Notepad++. Also, note that I know very little about COBOL.

How can I convert from COMP to decimal? Ideally, I could also convert decimal back to COMP, since I need to write things back in the same format.

I have tried reading the data in as binary with:

using System;
using System.IO;
using System.Text;

public static void ReadBinaryFile(string directoryString)
{
    using (BinaryReader reader = new BinaryReader(File.Open(directoryString, FileMode.Open)))
    {
        string myString = Encoding.ASCII.GetString(reader.ReadBytes(113));
        Console.WriteLine(myString);
    }
}
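The 113 in `ReadBytes(113)` (and the 57 used later) follow from the copybook: the ten `PIC X` fields total 57 bytes, and each of the fourteen `PIC S9(7)V9(2) COMP` fields holds nine digits, which fit in a 4-byte binary integer (IBM compilers use a fullword for 5 to 9 digits). The sketch below only tallies that layout; `RecordLayout` and its members are hypothetical names, and the 4-byte size is an assumption to confirm for your compiler.

```csharp
using System.Linq;

// Hypothetical helper that tallies the record layout from the copybook.
public static class RecordLayout
{
    // Lengths of Var1..Var10 (all PIC X(n): one byte per character).
    public static readonly int[] TextFieldLengths = { 4, 3, 3, 4, 16, 4, 2, 1, 4, 16 };

    // Var11..Var24: PIC S9(7)V9(2) COMP, assumed to be 4-byte binary integers.
    public const int CompFieldCount = 14;
    public const int CompFieldSize = 4;

    // Bytes of text before the first COMP field.
    public static int TextLength => TextFieldLengths.Sum();

    // Full record length, excluding any line terminator.
    public static int RecordLength => TextLength + CompFieldCount * CompFieldSize;
}
```

`RecordLayout.TextLength` comes out as 57 and `RecordLayout.RecordLength` as 113, matching the byte counts used in the code in this question.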

EDIT: On the right track

Thanks to @piet.t and @jdweng for the help.

While there is still an issue with this test code, this should help anyone in my position with their solution:

using System;
using System.IO;
using System.Text;

public static void ReadBinaryFile(string directoryString)
{
    using (BinaryReader reader = new BinaryReader(File.Open(directoryString, FileMode.Open)))
    {
        string asciiPortion = Encoding.ASCII.GetString(reader.ReadBytes(57)); // Read the non-COMP values

        Console.Write(asciiPortion); // Test the ASCII portion

        // Var11..Var24: each PIC S9(7)V9(2) COMP field is read as a 4-byte integer
        // (ReadInt32 is little-endian); the printed number is the value * 100.
        for (int i = 11; i <= 24; i++)
        {
            Console.WriteLine("var" + i + ": " + reader.ReadInt32());
        }
    }
}

EDIT 2: Trying to find the issue

Issue: every non-zero value appears to be followed by a garbage value, which is printed as the next Int32.

Actual values:

var11 = var12 = 0.00
var13 = 58.90
var14 = 0.00
var15 = -0.14
var16 = 0.00
var17 = var18 = var19 = var20 = 0.00
var21 = var22 = var23 = var24 = 0.00

Output (with padding):

Var11:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var12:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var13:  5890  HEX: 00001702  BIN: 00000000000000000001011100000010
Var14:   368  HEX: 00000170  BIN: 00000000000000000000000101110000
Var15:   -14  HEX: FFFFFFF2  BIN: 11111111111111111111111111110010
Var16:    -1  HEX: FFFFFFFF  BIN: 11111111111111111111111111111111
Var17:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var18:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var19:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var20:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var21:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var22:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var23:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
Var24:     0  HEX: 00000000  BIN: 00000000000000000000000000000000
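Output in this format can be produced with a helper along these lines. This is a sketch: the code that printed the dump above isn't shown, and `IntDump.Format` is a hypothetical name. Note that HEX and BIN show the integer after `ReadInt32` has already interpreted its bytes, so they do not reveal the byte order in the file.

```csharp
using System;

// Hypothetical helper that reproduces the "Output (with padding)" lines.
public static class IntDump
{
    public static string Format(string name, int value)
    {
        // Convert.ToString(int, 2) already yields all 32 two's-complement bits for
        // negative values; non-negative values need left padding to 32 digits.
        string bin = Convert.ToString(value, 2).PadLeft(32, '0');

        // {value,5}: right-aligned in 5 columns; {value:X8}: 8 uppercase hex digits.
        return $"{name}: {value,5}  HEX: {value:X8}  BIN: {bin}";
    }
}
```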

Notepad++ (Copied) Representation:

          p  òÿÿÿÿÿÿÿ

Notepad++ (Visual) Representation:

[NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][STX][ETB][NUL][NUL]p[SOH]
[NUL][NUL]òÿÿÿÿÿÿÿ[NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL]
[NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL]
[NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][NUL][LF]
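Read byte by byte, the visual representation shows Var11 and Var12 as eight [NUL]s, then Var13 as 0x02 0x17 0x00 0x00 ([STX][ETB][NUL][NUL]) and Var14 as 0x70 0x01 0x00 0x00 (`p`[SOH][NUL][NUL]). Interpreted as little-endian, Var13's bytes give 5890, matching what `ReadInt32` printed (`BinaryReader` always reads little-endian); interpreted as big-endian they would give 35061760. A minimal sketch using `System.Buffers.Binary.BinaryPrimitives` (.NET Core 2.1 and later); `CompBytes` is a hypothetical name:

```csharp
using System.Buffers.Binary;

// Decodes one 4-byte COMP field in either byte order.
public static class CompBytes
{
    // Least significant byte first (what BinaryReader.ReadInt32 assumes).
    public static int Little(byte[] field) => BinaryPrimitives.ReadInt32LittleEndian(field);

    // Most significant byte first (the usual IBM mainframe order).
    public static int Big(byte[] field) => BinaryPrimitives.ReadInt32BigEndian(field);
}
```

Trying both orders on a field whose value is known is a quick way to settle which one a file uses.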

EDIT 3: Solution!

@piet.t had it all right. Thanks for a useful answer to my first question! The issue was something specific to the COBOL program. I was led to believe Var14 was always 0, but:

Var14 = SomeCalculationIHadNoIdeaAbout(Var13, SomeOtherNumber);
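For this one record, the EDIT 2 numbers fit piet.t's half-byte observation in the comments: Var14 (368) equals Var13 (5890) shifted right by four bits, and Var16 (-1) equals Var15 (-14) shifted the same way. This is only a consistency check on the dumped data, not the program's actual calculation, which remains unknown; `HalfByteCheck` is a hypothetical name.

```csharp
// Compares a "garbage" value from the EDIT 2 dump with the preceding field >> 4.
public static class HalfByteCheck
{
    // C#'s >> on int is an arithmetic shift, so the sign is kept (-14 >> 4 == -1).
    public static bool LooksShifted(int previous, int next) => (previous >> 4) == next;
}
```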

I used RecordEdit to tweak the data more easily (Warning: the program is a little strange in places) and noticed an odd trend in the "garbage" values.

The real solution to my problem is the code in the first EDIT, which I made days ago :/.

NOTE: I also had to consume the line feed character at the end of each record, which I did not put in that code. To do that, add another `reader.ReadBytes(1);` after reading Var24.

NOTE 2: You may need to look into EBCDIC and/or endianness, which may make your solution a bit more difficult than mine.
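Putting the first EDIT's code and the line-feed note together, a reader for a whole file might look like the sketch below. It assumes what held for this file: ASCII text, little-endian `COMP` fields (what `BinaryReader.ReadInt32` expects), and a single LF byte after each record. It also divides by `100m` to turn the raw integers into `decimal` values. `CompRecordReader` and its tuple shape are hypothetical, and EBCDIC or big-endian data would need different decoding.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Minimal sketch of a record reader for the layout in the question. Assumptions:
// 57 bytes of ASCII text, then 14 little-endian Int32 COMP fields with two
// implied decimal places, then a single line-feed byte ending each record.
public static class CompRecordReader
{
    const int TextLength = 57; // Var1..Var10 (PIC X fields)
    const int CompCount = 14;  // Var11..Var24 (PIC S9(7)V9(2) COMP)

    public static IEnumerable<(string Text, decimal[] Amounts)> ReadRecords(Stream input)
    {
        using (var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                string text = Encoding.ASCII.GetString(reader.ReadBytes(TextLength));
                var amounts = new decimal[CompCount];
                for (int i = 0; i < CompCount; i++)
                {
                    // ReadInt32 is little-endian; dividing by 100m restores the implied V9(2).
                    amounts[i] = reader.ReadInt32() / 100m;
                }
                reader.ReadByte(); // consume the trailing line feed
                yield return (text, amounts);
            }
        }
    }
}
```

Passing it a seekable stream (for example from `File.OpenRead`) and iterating with `foreach` yields one `(Text, Amounts)` pair per record, where `Amounts[2]` is Var13. If a file used CR LF line endings, two bytes would need to be consumed instead of one.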

Use BitConverter. You may need to use big- or little-endian order, since IBM is usually backwards from Microsoft. It looks like COBOL uses the standard 4/8-byte IEEE format for floating-point numbers. See IBM: https://www.ibm.com/support/knowledgecenter/en/SS6SG3_3.4.0/com.ibm.entcobol.doc_3.4/cpari09.htm — jdweng, Jun 12 '17 at 16:47
If I'm understanding correctly, you're saying that I want to read my data 4 (or 8) bytes at a time, convert from Little Endian to Big Endian, and then convert to decimal? — JinC, Jun 12 '17 at 17:57
I assume you have 4 bytes, which is your binary value. Since it is in a file, the ASCII has to be parsed to bytes. Then the bytes get converted to float/double using BitConverter. — jdweng, Jun 12 '17 at 19:33
@jdweng note that `COMP`-fields are always fixed point/integer, float/double would be `COMP-1` or `COMP-2`. — piet.t, Jun 13 '17 at 09:22
COMP-3 is really old. It looks like BCD (binary-coded decimal). PACKED-DECIMAL and COMP-3 are synonyms. Packed-decimal items occupy 1 byte of storage for every two decimal digits you code in the PICTURE description, except that the rightmost byte contains only one digit and the sign. This format is most efficient when you code an odd number of digits in the PICTURE description, so that the leftmost byte is fully used. Packed-decimal items are handled as fixed-point numbers for arithmetic purposes. See Wiki: https://en.wikipedia.org/wiki/Binary-coded_decimal — jdweng, Jun 13 '17 at 10:04
@jdweng I don't believe COBOL standards before 2014 specified the internal format of floating-point numbers; it is up to the compiler writer. IBM mainframe COBOL compilers still use hexadecimal floating point (HFP, the original System/360 format), even at 6.1. The reserved words `STANDARD-BINARY` (IEEE binary floating point, BFP) and `STANDARD-DECIMAL` (decimal floating point, DFP, IEEE 754-2008) were added at the 2010 revision of the 2002 standard, but IBM has not implemented them for the mainframe compilers. It would be easy for them to do so, as all now use the same underlying code generator. — zarchasmpgmr, Jun 13 '17 at 16:13
I never really used IBM mainframes. The last time was in 1974, with PL/I. — jdweng, Jun 13 '17 at 16:24
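The packed-decimal layout jdweng quotes (two digits per byte, with the last byte holding one digit and the sign) can be decoded with a short loop. This only applies to `COMP-3` fields; the fields in the question are plain binary `COMP`. A sketch, assuming the common sign nibbles (0xC or 0xF for positive, 0xD for negative); `PackedDecimal` and the `scale` parameter are hypothetical names.

```csharp
using System;

// Sketch of a packed-decimal (COMP-3) decoder: two digits per byte, except the
// last byte, whose high nibble is a digit and whose low nibble is the sign.
public static class PackedDecimal
{
    public static decimal Decode(byte[] field, int scale)
    {
        decimal value = 0m;
        for (int i = 0; i < field.Length; i++)
        {
            int high = field[i] >> 4;
            int low = field[i] & 0x0F;
            value = value * 10 + Digit(high);

            if (i < field.Length - 1)
                value = value * 10 + Digit(low);   // ordinary byte: second digit
            else if (low == 0x0D)
                value = -value;                    // sign nibble 0xD: negative
            else if (low != 0x0C && low != 0x0F)
                throw new FormatException($"Unexpected sign nibble 0x{low:X}.");
        }

        // Apply the implied decimal places, e.g. scale 2 for V9(2).
        for (int s = 0; s < scale; s++)
            value /= 10m;
        return value;
    }

    static int Digit(int nibble) =>
        nibble <= 9 ? nibble : throw new FormatException($"Invalid digit nibble 0x{nibble:X}.");
}
```

For example, a `PIC S9(7)V9(2) COMP-3` field holding 58.90 would take five bytes, 0x00 0x00 0x05 0x89 0x0C, and `PackedDecimal.Decode(bytes, 2)` returns 58.90.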

score 7 · Accepted Answer · answered Jun 13 '17 at 06:19

Things will get a little complicated here, since the COBOL program is using fixed-point variables that I think C# doesn't have.

For conversion, treat each PIC S9(7)V9(2) COMP field as an Int32 (nine digits fit in four bytes; it should be in big-endian format). Note that you will not get the actual value but value*100, because of the implied decimal point (the V) in the COBOL field declaration.
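A minimal sketch of that conversion in both directions (the question also asks how to write values back), assuming the 4-byte big-endian layout described here. `Comp4Scale2` is a hypothetical name, and `BinaryPrimitives` requires .NET Core 2.1 or later. For little-endian data, which is what the file in the question turned out to contain, `ReadInt32LittleEndian` and `WriteInt32LittleEndian` would be used instead.

```csharp
using System;
using System.Buffers.Binary;

// Sketch: PIC S9(7)V9(2) COMP <-> decimal, assuming a 4-byte big-endian
// two's-complement integer that holds value * 100.
public static class Comp4Scale2
{
    const decimal Scale = 100m;          // V9(2): two implied decimal places
    const decimal Max = 9_999_999.99m;   // largest magnitude S9(7)V9(2) can hold

    public static decimal Decode(ReadOnlySpan<byte> field) =>
        BinaryPrimitives.ReadInt32BigEndian(field) / Scale;

    public static byte[] Encode(decimal value)
    {
        if (Math.Abs(value) > Max)
            throw new OverflowException("Value does not fit in PIC S9(7)V9(2).");

        // Drop digits beyond the two implied places, mirroring COBOL's default
        // truncation when ROUNDED is not specified.
        int scaled = (int)decimal.Truncate(value * Scale);

        var bytes = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(bytes, scaled);
        return bytes;
    }
}
```

With this layout, `Decode(new byte[] { 0x00, 0x00, 0x17, 0x02 })` returns 58.90 and `Encode(58.90m)` gives those four bytes back. The truncation in `Encode` is an assumption about how the original program stores values; if the COBOL code uses `ROUNDED`, `Math.Round` would be the closer match.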

Note that fixed-point data allows exact calculations for values with decimals, while converting it to floating point in C# may result in rounding, since binary floating point can't always represent decimal values exactly.
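C#'s `decimal` type is base-10, so values like these stay exact as long as the raw integer is divided by `100m` rather than `100.0`. A small illustration of the difference; `FixedVsFloat` is a hypothetical name.

```csharp
// Illustrates the rounding point: 0.10 + 0.20 is exact in decimal but not in double.
public static class FixedVsFloat
{
    public static bool DoubleSumIsExact()
    {
        double a = 0.10, b = 0.20;
        return a + b == 0.30;   // false: 0.10 and 0.20 have no exact binary representation
    }

    public static bool DecimalSumIsExact()
    {
        decimal a = 0.10m, b = 0.20m;
        return a + b == 0.30m;  // true: decimal stores base-10 digits exactly
    }
}
```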

piet.t

Thank you! I'm finally getting some meaningful data back. I am encountering a strange issue though. As I print out each int32, if one value is non-zero, the next value prints as a non-zero number even when it's zero. For example: var13Actual is 58.90 and var14Actual is 0.00, but when I read the values var13 is 5890 (great) and var14 is 368 (not so great). Any idea why this is? – JinC Jun 13 '17 at 14:50
I would read the input line and write a hex dump of the input to see what characters C# thinks are there. If you see values like `0x40f5f84bf9f0`, then you're reading EBCDIC (`0x4b` is period, `0x40` is space, digits are `0xf_`); `0x2035382e3930` is ASCII. Endianness may also be an issue, especially if you're not seeing values in the order that I typed them (big endian). – zarchasmpgmr Jun 13 '17 at 16:24
@zarchasmpgmr, I've made a second edit with some more information. This looks like it fits your 0xf_ for digits. I'll try a conversion when I get some more time. – JinC Jun 13 '17 at 18:06
@JinC In your example the garbage looks like the field before it, shifted by half a byte. This might just be a coincidence, or it might indicate some error in advancing through the array, like going forward 4 bits instead of 4 bytes. – piet.t Jun 14 '17 at 06:15
This is really strange to me. I read all of the textual data using ReadBytes() and I read my decimals with ReadInt32(), so I'm not doing any manual advancing. The potential for error seems to be the ReadBytes(), but I've tried varying the input by 1 to see if there was some byte I was forgetting. I'll keep working at it and update if I figure it out, but I'll mark your answer now, because you have been very helpful and put me on the right path. Thank you for all of your help. – JinC Jun 14 '17 at 16:35
If you've read these comments before my edits, go back and read the edits. The issue has been resolved. – JinC Jun 15 '17 at 17:11
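The hex-dump check zarchasmpgmr suggests can be sketched as below; the helper names are hypothetical and the encoding guess is only a heuristic. As the dump of Var15 shows, binary `COMP` bytes such as 0xF2 can fall in the EBCDIC digit range (0xF0 to 0xF9) by accident, which may be why the binary data looked like it fit the `0xf_` pattern, so the guess should only be applied to the text fields.

```csharp
using System;
using System.Linq;

// Hypothetical helpers for inspecting raw record bytes.
public static class HexDump
{
    // BitConverter.ToString gives "20-35-38-..."; swap the dashes for spaces.
    public static string Format(byte[] data) =>
        BitConverter.ToString(data).Replace('-', ' ');

    // Rough heuristic: digits are 0x30-0x39 in ASCII but 0xF0-0xF9 in EBCDIC.
    public static string GuessDigitEncoding(byte[] text)
    {
        int ascii = text.Count(b => b >= 0x30 && b <= 0x39);
        int ebcdic = text.Count(b => b >= 0xF0 && b <= 0xF9);
        if (ascii == ebcdic) return "unclear";
        return ascii > ebcdic ? "ASCII" : "EBCDIC";
    }
}
```

For the ASCII text " 58.90", `Format` returns `20 35 38 2E 39 30`, matching the ASCII example in the comment.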
