I'm trying to read some fixed-width data from an IBM mainframe into Pandas. The fields are stored in a mix of EBCDIC text, binary integers (e.g., 255 stored as 0xFF), and packed binary-coded decimal (e.g., 255 stored as 0x0255). I know the field lengths and types ahead of time.
Can read_fwf deal with this kind of data? Are there better alternatives?
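For concreteness, here's how those three representations of 255 decode in Python (a quick sketch -- I'm assuming code page cp037 for the EBCDIC side, which is what my sample below uses):

b'\xF2\xF5\xF5'.decode('cp037')  # EBCDIC text -> '255'
int.from_bytes(b'\xFF', 'big')   # binary integer -> 255
int(b'\x02\x55'.hex())           # packed BCD, no sign nibble -> 255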
Example -- I'm trying to read in an arbitrary number of records structured like this:
import tempfile
databin = 0xF0F3F1F5F1F3F9F9F2F50AC2BB85F0F461F2F061F2F0F1F8F2F0F1F860F0F360F2F360F1F54BF4F54BF5F44BF5F9F2F9F1F800004908
#column 1 -- ten bytes, EBCDIC. Should be 0315139925.
#column 2 -- four bytes, binary number. Should be 180534149.
#column 3 -- ten characters, EBCDIC. Should be 04/20/2018.
#column 4 -- twenty six characters, EBCDIC. Should be 2018-03-23-15.45.54.592918.
#column 5 -- four bytes, packed binary coded decimal, no sign nibble. Should be 4908. I know the scale ahead of time.
rawbin = databin.to_bytes((databin.bit_length() + 7) // 8, 'big')  # 54 bytes; the int round-trip works here because the record doesn't begin with a zero byte
with tempfile.TemporaryFile() as fp:
    fp.write(rawbin)
    fp.seek(0)  # rewind so the record can be read back
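For reference, decoding that record by hand with just the standard library gives the expected values (column offsets per the layout above; cp037 again assumed, though the digits and punctuation used here are the same in most EBCDIC code pages):

rawbin[0:10].decode('cp037')          # column 1 -> '0315139925'
int.from_bytes(rawbin[10:14], 'big')  # column 2 -> 180534149
rawbin[14:24].decode('cp037')         # column 3 -> '04/20/2018'
rawbin[24:50].decode('cp037')         # column 4 -> '2018-03-23-15.45.54.592918'
int(rawbin[50:54].hex())              # column 5 -> 4908, before applying the scale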