I have a mainframe file with data like the following:

000000720000{

I need to parse the data and load it into a Hive table so that the value looks like this:

72000

The field above is the income column, and the trailing "{" denotes a positive amount. The column is declared in the Hive table as income decimal(11,2).

In the layout.cob copybook the field is defined as INCOME PIC S9(11)V99.

Could someone help?

Hogstrom
  • Correction: `{` is positive 0, `A` is positive 1, etc. `72000A` would be 7200.01 and `72000J` is -7200.01. What do you want help with, and what have you tried? JRecord can read COBOL data files with a COBOL copybook. See https://sourceforge.net/projects/jrecord/ – Bruce Martin Mar 23 '22 at 06:11
  • Are you sure you want 720000 and not 72000? Your data definition has two decimal places: the { and one of the zeroes. – piet.t Mar 23 '22 at 06:11
  • @piet.t apologies. I re-corrected my question – Nisha Gupta Mar 24 '22 at 00:19
  • @BruceMartin - yes, maybe I have to reframe the sentence. Yes, { denotes 0. The help I want is this: 72000 should be my output once I read the actual data (input 000000720000{). I tried replacing "{" with 0 and ran a spark submit; it worked fine and the data aligned to the respective columns, but in the long run, when the data is huge, the replacement approach seems doubtful. I need help parsing it properly into the Hive DB. – Nisha Gupta Mar 24 '22 at 02:38

1 Answer


The number you want is 7200000, which with the two implied decimal places is 72000.00.

The conversion you are looking for is:

Positive numbers

{ = 0
A = 1
B = 2
C = 3
D = 4
E = 5
F = 6
G = 7
H = 8
I = 9

Negative numbers (this makes the whole value negative)

} = 0
J = 1
K = 2
L = 3
M = 4
N = 5
O = 6
P = 7
Q = 8
R = 9
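As a minimal sketch of the mapping above (not a definitive implementation), the last character can be decoded in Python as follows. The function name `decode_signed_display` and the `scale` parameter are hypothetical names chosen for illustration; `scale=2` is assumed from the V99 in the copybook.

```python
from decimal import Decimal

# Overpunch characters for the last position; the index is the digit.
POSITIVE = "{ABCDEFGHI"
NEGATIVE = "}JKLMNOPQR"


def decode_signed_display(text: str, scale: int = 2) -> Decimal:
    """Decode a sign-overpunched numeric string such as '000000720000{'.

    `scale` is the number of implied decimal places (2 for V99).
    This helper is illustrative and not part of any library.
    """
    if not text:
        raise ValueError("empty field")
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        digit, sign = POSITIVE.index(last), 1
    elif last in NEGATIVE:
        digit, sign = NEGATIVE.index(last), -1
    elif last in "0123456789":
        # Assumption: a plain trailing digit means an unsigned, positive value.
        digit, sign = int(last), 1
    else:
        raise ValueError(f"unexpected sign character {last!r} in {text!r}")
    digits = body + str(digit)
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"non-numeric characters in {text!r}")
    # Shift the decimal point left by `scale` places.
    return Decimal(sign * int(digits)).scaleb(-scale)


print(decode_signed_display("000000720000{"))
print(decode_signed_display("72000J"))
```

Because the function works on one string at a time, it could be wrapped in a Spark UDF or applied while reading the file, which avoids a blanket text replacement of "{" across the whole record.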

Let's explain why.

Based on your question, the issue arises when packed decimal data is unpacked (UNPK) into character data. Basically, a PIC S9(11)V99 field stored as packed decimal takes up 7 bytes of storage (13 digit nibbles plus one sign nibble) and looks like the picture below.

You'll see three lines. The top is the character representation (missing in the first picture because the hex values do not map to displayable characters), and the two lines below are the hexadecimal value of each byte, with the high-order nibble on top and the low-order nibble below.

[image: hex display of the 7-byte packed decimal field]

Note that in the rightmost byte the sign is stored as C, which is positive; for a negative value you would see a D.
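If the file still holds the field in packed form, the layout described above can be read directly. The following is a rough sketch; the name `decode_packed_decimal` is hypothetical, `scale=2` is assumed from the V99, and the sample bytes are made up to match the value in the question. Besides C and D, it also accepts the alternate sign codes (A, E and F as positive, B as negative).

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}  # C is the preferred positive code
NEGATIVE_SIGNS = {0xB, 0xD}            # D is the preferred negative code


def decode_packed_decimal(raw: bytes, scale: int = 2) -> Decimal:
    """Decode a packed decimal field: two digits per byte, sign in the last nibble.

    Illustrative helper only; `scale` is the number of implied decimals.
    """
    if not raw:
        raise ValueError("empty field")
    # Split every byte into its high and low nibble.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign_nibble = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("digit nibble out of range in packed field")
    if sign_nibble in POSITIVE_SIGNS:
        sign = 1
    elif sign_nibble in NEGATIVE_SIGNS:
        sign = -1
    else:
        raise ValueError(f"invalid sign nibble 0x{sign_nibble:X}")
    unscaled = int("".join(str(d) for d in digits))
    return Decimal(sign * unscaled).scaleb(-scale)


# 13 digit nibbles followed by the sign nibble C, 7 bytes in total.
sample = bytes.fromhex("0000007200000C")
print(decode_packed_decimal(sample))
```

This only applies when the file is read as raw bytes. Once a transfer has translated the record as text, packed fields are usually corrupted and cannot be decoded this way.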

When it is converted to character data it will look like this

[image: hex display of the field after unpacking to character data]

Notice the C0 in the last byte, which is how unpacking preserves the sign. Be aware that this display is on z/OS, which is EBCDIC. If the file has been transferred and converted to another code page, you will see the correct character but the hex values will be different.
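The code page point can be checked with Python's built-in codecs. This sketch assumes the original file was in EBCDIC code page 037 (`cp037`); the actual code page of your file may differ, and `sign_char_codes` is a hypothetical helper name.

```python
EBCDIC = "cp037"  # assumption: the source file uses EBCDIC code page 037


def sign_char_codes(ebcdic_byte: int) -> tuple:
    """Return the character for an EBCDIC byte and its ASCII code point."""
    char = bytes([ebcdic_byte]).decode(EBCDIC)
    return char, ord(char)


# The unpacked field as raw EBCDIC bytes: digits F0..F9, last byte C0.
record = bytes.fromhex("F0F0F0F0F0F0F7F2F0F0F0F0C0")
text = record.decode(EBCDIC)
print(text)

for byte in (0xC0, 0xC1, 0xC9, 0xD0, 0xD1, 0xD9):
    char, ascii_code = sign_char_codes(byte)
    print(f"EBCDIC 0x{byte:02X} -> {char!r} -> ASCII 0x{ascii_code:02X}")
```

The characters survive the conversion while the byte values change, so a parser working on the converted text should match on the characters rather than on the hex values.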

Here are all the combinations you will likely see for positive numbers

[image: the characters { and A to I with their EBCDIC hex values C0 to C9]

and here for negative numbers

[image: the characters } and J to R with their EBCDIC hex values D0 to D9]

To make your life easy: if the last character is one of the first set, replace it with the corresponding digit. If it is one of the second set, replace it with the corresponding digit and treat the whole number as negative.

Hogstrom