I am trying to import a mainframe EDI file into SQL Server using .NET, and I am having problems unpacking some COMP-3 fields.

The file came from one of our clients, and I have the copybook layout for the following fields:

05  EH-GROSS-INVOICE-AMT            PIC S9(07)V9999  COMP-3.         
05  EH-CASH-DISCOUNT-AMT            PIC S9(07)V9999  COMP-3.         
05  EH-CASH-DISCOUNT-PCT            PIC S9(03)V9999  COMP-3.
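
As a sanity check on the layout, the storage size of each field can be worked out from its PIC clause: one nibble per digit plus one sign nibble, rounded up to whole bytes. The sketch below is only an illustration; `PackedLength` and `ByteCount` are hypothetical names, not part of Ebcdic2Ascii.

```csharp
using System;

public static class PackedLength
{
    // Bytes occupied by a COMP-3 field: one nibble per digit plus one
    // sign nibble, rounded up to a whole byte.
    public static int ByteCount(int integerDigits, int fractionDigits)
    {
        int totalDigits = integerDigits + fractionDigits;
        return totalDigits / 2 + 1;
    }
}
```

Under that rule `ByteCount(7, 4)` gives 6 bytes for the two S9(07)V9999 fields and `ByteCount(3, 4)` gives 4 bytes for S9(03)V9999, which matches the widths of the hex dumps further down.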

I will focus on just these 3 fields, as all the other fields are PIC(X) and are already Unicode values. I loaded everything with the help of the Ebcdic2Ascii tool created by Max Vagner. To see the raw bytes, I modified its "Unpack" function to return them as a hex string:

private string Unpack(byte[] packedBytes, int decimalPlaces, out bool isParsedSuccessfully)
{
    isParsedSuccessfully = true;
    return BitConverter.ToString(packedBytes);
}

which gave me the following sample data:

EH-GROSS-INVOICE-AMT     EH-CASH-DISCOUNT-AMT     EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
00-1A-1A-03-26-0C        00-00-00-00-00-0C        00-00-00-0C
00-0A-1A-1A-00-0C        00-00-1A-1A-2D-0C        00-1A-00-0C
00-09-10-20-00-0C        00-00-10-1A-1A-0C        00-1A-00-0C

Here is sample code that I wrote to unpack these values, based on my understanding of COMP-3:

using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var result1 = UnpackMod("00-1A-1A-03-26-0C", 4);
            var result2 = UnpackMod("00-00-00-00-00-0C", 4);
            var result3 = UnpackMod("00-00-00-0C", 4);

            Console.WriteLine($"{result1}\n{result2}\n{result3}\n");

            var result4 = UnpackMod("00-0A-1A-1A-00-0C", 4);
            var result5 = UnpackMod("00-00-1A-1A-2D-0C", 4);
            var result6 = UnpackMod("00-1A-00-0C", 4);

            Console.WriteLine($"{result4}\n{result5}\n{result6}\n");

            var result7 = UnpackMod("00-09-10-20-00-0C", 4);
            var result8 = UnpackMod("00-00-10-1A-1A-0C", 4);
            var result9 = UnpackMod("00-1A-00-0C", 4);

            Console.WriteLine($"{result7}\n{result8}\n{result9}");

            Console.ReadLine();
        }

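        /// <summary>
        /// Unpacks a COMP-3 field supplied as a dash-separated hex string.
        /// </summary>
        /// <param name="inputString">Hex dump of the packed bytes, e.g. "00-09-10-20-00-0C".</param>
        /// <param name="decimalPlaces">Number of implied decimal places (digits after the V in the PIC clause).</param>
        /// <returns>The numeric string if parsing succeeded; otherwise the literal "NULL".</returns>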
        private static string UnpackMod(string inputString, int decimalPlaces)
        {
            var outputString = inputString;

            // Remove "-".
            outputString = outputString.Replace("-", "");

            // Check last character for sign.
            string lastChar = outputString.Substring(outputString.Length - 1, 1);
            bool isNegative = (lastChar == "D" || lastChar == "B");

            // Remove sign character.
            if (lastChar == "C" || lastChar == "A" || lastChar == "E" || lastChar == "F" || lastChar == "D" || lastChar == "B")
            {
                outputString = outputString.Substring(0, outputString.Length - 1);
            }

            // Place decimal point.
            outputString = outputString.Insert(outputString.Length - decimalPlaces, ".");

            // Check if parsed value is numeric. This will also eliminate all leading 0.
            var isParsedSuccessfully = decimal.TryParse(outputString, out decimal decimalValue);

            // Return the numeric string if parsing succeeded; otherwise return "NULL".
            string result = "NULL";
            if (isParsedSuccessfully)
            {
                // Convert value to negative.
                if (isNegative)
                {
                    decimalValue = decimalValue * -1;
                }

                result = decimalValue.ToString();
            }

            return result;
        }
    }
}

Running the sample code gave the following results:

EH-GROSS-INVOICE-AMT     EH-CASH-DISCOUNT-AMT     EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
NULL                     0.0000                   0.0000
NULL                     NULL                     NULL
9102.0000                NULL                     NULL        

As you can see, only the following 3 values were unpacked correctly:

00-09-10-20-00-0C -> 9102.0000
00-00-00-00-00-0C -> 0.0000
00-00-00-0C       -> 0.0000

Based on this source, http://www.3480-3590-data-conversion.com/article-packed-fields.html, I have the following understanding of COMP-3:

COBOL Comp-3 is a binary field type that puts ("packs") two digits into each byte, using a notation called Binary Coded Decimal, or BCD.

The Binary Coded Decimal (BCD) data type is just as its name suggests -- it is a value stored in decimal (base ten) notation, and each digit is binary coded. Since a digit has only ten possible values (0-9), each digit fits in four bits (one nibble).

The low nibble of the least significant byte is used to store the sign for the number. This nibble stores only the sign, not a digit. "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
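
To make that description concrete, here is a minimal sketch of an unpacker that works on the raw bytes instead of a hex string. `PackedDecimal.Unpack` is a hypothetical helper, not the Ebcdic2Ascii implementation. It accepts the same sign nibbles as the `UnpackMod` code above (A, C, E, F as positive and B, D as negative) and returns `null` rather than guessing when any digit nibble falls outside 0-9. It assumes the value fits in a .NET `decimal` (roughly 28 digits).

```csharp
using System;

public static class PackedDecimal
{
    // Unpacks a COMP-3 field. Returns null if any digit nibble is not 0-9
    // or the final nibble is not a recognised sign code.
    public static decimal? Unpack(byte[] packed, int decimalPlaces)
    {
        if (packed == null || packed.Length == 0) return null;

        decimal value = 0m;
        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;    // upper nibble: always a digit
            int low = packed[i] & 0x0F;   // lower nibble: digit, or sign in the last byte

            if (high > 9) return null;
            value = value * 10 + high;

            if (i < packed.Length - 1)
            {
                if (low > 9) return null;
                value = value * 10 + low;
                continue;
            }

            bool negative;
            switch (low)
            {
                case 0xA: case 0xC: case 0xE: case 0xF: negative = false; break;
                case 0xB: case 0xD: negative = true; break;
                default: return null;     // 0-9 is not a valid sign nibble
            }

            // Apply the implied decimal point (the V in the PIC clause).
            for (int p = 0; p < decimalPlaces; p++) value /= 10m;
            return negative ? -value : value;
        }
        return null; // not reached; keeps the compiler satisfied
    }
}
```

With the sample rows above, `00-09-10-20-00-0C` unpacks to 9102 with 4 decimal places, while any row containing `1A` or `2D` in a digit position yields `null`, because A and D are not valid digits there.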

Since BCD digits can only be 0-9, and the only non-digit should be the sign nibble at the end ("C", "D" or "F"), I don't know how to unpack the following values:

00-1A-1A-03-26-0C
00-0A-1A-1A-00-0C        
00-00-1A-1A-2D-0C
00-1A-00-0C
00-00-10-1A-1A-0C
00-1A-00-0C

These values have other non-digit characters besides the sign nibble. I have a feeling that the data has already been converted, because otherwise the text fields would not be readable unless an encoding was applied. I am still not sure about this and would love any insights. Thanks.
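
One way to see exactly which bytes break the packed format is to scan each field and list the offsets that hold a non-digit nibble in a digit position, or a digit in the sign position. The sketch below is only a diagnostic aid; `PackedDiagnostics.InvalidOffsets` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class PackedDiagnostics
{
    // Returns the zero-based offsets of bytes that cannot occur in a
    // well-formed packed decimal field.
    public static List<int> InvalidOffsets(byte[] packed)
    {
        var offsets = new List<int>();
        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;
            int low = packed[i] & 0x0F;
            bool isLast = i == packed.Length - 1;

            bool highOk = high <= 9;                       // always a digit
            bool lowOk = isLast ? low >= 0xA : low <= 9;   // sign in last byte, digit elsewhere

            if (!highOk || !lowOk) offsets.Add(i);
        }
        return offsets;
    }
}
```

For `00-1A-1A-03-26-0C` this reports offsets 1 and 2, and for `00-00-1A-1A-2D-0C` it reports 2, 3 and 4. Note that 0x1A is the ASCII SUB (substitute) control character, which some conversion tools write for bytes they cannot map. Whether that is what happened to this file is an assumption, but the repeated 0x1A is consistent with it.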

Sirch Dcmp
  • The fields you are trying to convert look corrupt. Have they been through an EBCDIC-to-ASCII conversion? They look like they might have been. – Bruce Martin Jan 04 '21 at 09:53
  • Hi Bruce, thanks for the reply. I also suspected that the data is corrupt, but I was not sure because I was getting some correct values. The data was provided by our client, and a 3rd party firm exported the files. Is there any explanation for why some values are correct and some are not? Could this be an export issue? Thanks. – Sirch Dcmp Jan 04 '21 at 10:10
  • Also, I could not immediately assume that the data is corrupt, because there were correct values and every value has a sign character at the end. If there had been a conversion from EBCDIC to ASCII, I would expect some values to have lost their sign character. I am really not sure about this, but am I correct in this assumption? Thanks. – Sirch Dcmp Jan 04 '21 at 10:29
  • Any packed decimal fields must be converted to display format, i.e. plain text, BEFORE any EBCDIC-ASCII translation takes place. That conversion only sees bits or bytes, not text or numbers, so it will convert the data accordingly regardless of how a human regards the data. – NicC Jan 04 '21 at 12:21
  • Thanks for the reply NicC. The problem with that though is that the 3rd Party Provider will not modify the format provided. – Sirch Dcmp Jan 04 '21 at 13:30
  • What would be helpful is to do a hex dump of the original data on z/OS and the same for the data you are looking at on the Windows platform. You can then confirm whether there has been an implicit conversion. For instance, if you did an `scp`, a conversion is done for you. FTPS in binary mode is better, or the existing solution might be doing some conversion of its own. On z/OS you could browse the file in ISPF and then type `hex` on the command line to see the hex values of the data and take a screenshot. – Hogstrom Jan 04 '21 at 15:36
  • Thanks for the reply Hogstrom. That is exactly what I would like to do, as I currently have NO way to verify this, although I am quite sure that a conversion has happened. I will coordinate with the client to see if this is possible. Thanks for your help! – Sirch Dcmp Jan 04 '21 at 16:05
  • One other comment on the conversion method. There are 4 hex codes for signs. They are documented here: https://www.ibm.com/support/knowledgecenter/SS6SG3_6.3.0/pg/ref/rpari25.html x'C' is positive, x'F' is "unsigned", x'D' and x'B' are negative. I don't think I've ever seen an x'B' in the wild, but your code should be updated to accommodate that use case. – Hogstrom Jan 04 '21 at 17:22
  • Along with the previous comment on signs ... this is from the z/Principles of Operation - Decimal Codes chapter: Alternate sign codes are also recognized as valid in the sign position: 1010, 1110, and 1111 are alternate codes for plus, and 1011 is an alternate code for minus. Alternate sign codes are accepted for any decimal source operand, but are not generated in the completed result of a decimal-arithmetic instruction – Hogstrom Jan 04 '21 at 17:40

1 Answer

First, PIC X is not Unicode in COBOL.

Quoting myself from here...

It is common for mainframe data to include both text and binary data in a single record, for example a name, a currency amount, and a quantity:

Hopper Grace ar% .

...which would be...

x'C8969797859940404040C799818385404040404081996C004B'

...in hex. This is code page 37, commonly referred to as EBCDIC.

[...]Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...

x'486F707065722020202047726163652020202020617225002E'

...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte), the currency amount itself has been changed as has the quantity (from decimal 75 to decimal 11,776 due to both code page conversion and mangling of a big endian number as a little endian number).

Likely your data was code page converted on transfer from the mainframe. If you know the original code page and the code page it was converted to, then you might be able to unscramble the packed data.

I say might because, if you're lucky, the hex values you have will have been mapped one-to-one with hex values in the original code page. Note that it is common for both EBCDIC x'15' and x'0D' to be mapped to ASCII x'0D'.
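
As a rough illustration of that idea, the sketch below converts bytes between two code pages and checks whether the conversion can be reversed without loss. `CodePageRoundTrip`, `Transcode` and `SurvivesRoundTrip` are hypothetical names, and the code assumes .NET Core 3.0 or later, where `CodePagesEncodingProvider` must be registered before code pages such as 37 and 1250 are available. The actual code pages used for your file are unknown, so 37 and 1250 are taken from the example above only.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CodePageRoundTrip
{
    // Re-encodes bytes from one code page to another, the way a text-mode
    // transfer would.
    public static byte[] Transcode(byte[] data, int fromCodePage, int toCodePage)
    {
        // Makes legacy code pages (e.g. 37, 1250) available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding from = Encoding.GetEncoding(fromCodePage);
        Encoding to = Encoding.GetEncoding(toCodePage);
        return Encoding.Convert(from, to, data);
    }

    // True if converting forward and then back restores the original bytes,
    // i.e. the mapping was one-to-one for this particular data.
    public static bool SurvivesRoundTrip(byte[] original, int originalCodePage, int convertedCodePage)
    {
        byte[] converted = Transcode(original, originalCodePage, convertedCodePage);
        byte[] restored = Transcode(converted, convertedCodePage, originalCodePage);
        return restored.SequenceEqual(original);
    }
}
```

Applied to the last five bytes of the sample record, `Transcode(new byte[] { 0x81, 0x99, 0x6C, 0x00, 0x4B }, 37, 1250)` should reproduce the x'617225002E' shown above, and transcoding that result from 1250 back to 37 restores the packed bytes. If the bytes you received do not survive the reverse conversion, or several original values collapsed into one, the packed fields cannot be recovered from the converted file.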

cschneid
  • Thanks for your insights cschneid. I am also assuming that the data sent to us has already been converted to code page 1250, as most of the fields are readable except for the packed fields. Normally, if the file were in binary EBCDIC format, it would not be readable without applying an encoding. I may not be lucky though, as the hex values in the current file do not seem to map one-to-one. I may need to coordinate with the client and request a raw binary file. Thank you so much for your help on this! – Sirch Dcmp Jan 04 '21 at 16:44