6

I am converting a database that was made in the 80's and it seems that the date for some tables is stored as characters. There are 2 columns of data. The first column has data formatted like this:

°& C2 024

The second column is a plain text description:

Description of some stuff

The data gets imported into a 'table' view. C2 defines the table header, 024 defines the row and °& defines what day the table data is pulled in to. Example:

 Date: 2/21/2007
       C1                   C2                       C3                   C4
 000
 003
 006
 009
 012
 015
 018
 021
 024                        Description of some stu...
 027

I have tried converting dates to hex and comparing the hex to character map with no luck.

I have tried using unix date by following instructions from: http://smallbusiness.chron.com/convert-date-hexadecimal-56217.html with no success.

Sample values and known dates:

°&  -  2/21/2007
¬&  -  2/7/2007
+(  -  2/6/2007
ª&  -  2/5/2007
Peter Black
  • 1,142
  • 1
  • 11
  • 29
  • 3
    How did you get to the "Known Dates" for those encoded versions? – JNevill May 06 '15 at 14:52
  • 3
    I don't think the date is stored as characters. It is a binary format and for some reason, your program is interpreting and displaying it as characters. Can you edit your post to show the binary representation? – Steve Wellens May 06 '15 at 15:01
  • °& C2 024 - is literally copied and pasted from the DB. The binary representation of those characters is: 11000010 10110000 The known dates are derived from going to the view and looking at the date. Since there is nothing else in this table, (not even ID's) I concluded that the characters had to be dates. Otherwise there would be no way to make the view match up to the correct date. – Peter Black May 06 '15 at 15:30
  • 1
    I messed with ascii->binary conversion then comparing the values to the "Known Dates". The problem is that third example `+(` doesn't make sense (even outside of binary conversion, it's just oddly misplaced). If you ignore the third value, then the regression from the binary->decimal representation to the "known date" is not linear, so I can't imagine that binary is the answer, and possibly not any other ascii->BaseN conversion. – JNevill May 06 '15 at 15:34
  • Out of curiousity, what does this data represent? Like... what are these descriptions of? – JNevill May 06 '15 at 16:06
  • When you say the data is copied and pasted from the DB, exactly how are you getting it? What column type is the data coming from? I'm a little concerned that you may be seeing some kind of character representation of some binary where we can't know for sure the underlying numbers. (e.g. the ° symbol is encoded differently in different character sets...) – Matt Gibson May 06 '15 at 16:44
  • The column's data should be encoded as Ascii. I assume this because the column contains: ( °& C2 024 ) and if it wasn't, the C2 024 would come back as something else. The column header is 'appropriately' named: "notkey" lol. I am viewing the table in Pervasive. – Peter Black May 06 '15 at 16:55
  • That's why I'm concerned about the character encoding: ° isn't in the standard ASCII character set, so getting it from a number will depend on the character encoding that's happening between the database and your screen. – Matt Gibson May 06 '15 at 17:02
  • 1
    And the ASCII ampersand is 0x26 or `00100110` which isn't even a substring of the alleged equivalent `11000010 10110000` - not even if we drop the leading zeros. None of this data is trustworthy. –  May 06 '15 at 17:07
  • Good point. ASCII does not include that symbol. I tried to look in Pervasive and see what encoding is being used, but it only says 'Default' and all the other options for encoding are greyed out. That leads me to believe that I am using the encoding provided by the DB. But wouldn't lumping all those values in one column and being able to match the view with the data prove that I am viewing the characters correctly? – Peter Black May 06 '15 at 17:12
  • The binary was derived from an ascii to hex conversion using: http://www.binaryhexconverter.com/ascii-text-to-binary-converter. So, that binary is probably irrelevant. – Peter Black May 06 '15 at 17:14
  • After looking, the degree symbol IS on the extended ASCII table. http://www.pcxt-micro.com/ascii.html – Peter Black May 06 '15 at 17:20
  • Copy and pasting would also keep the 'key code'. So the binary conversion should be the same. – Peter Black May 06 '15 at 17:21
  • 3
    You've really found some horrible resources there. The ASCII table appears to be a Latin-1 table. ASCII is a 7-bit code. It stops at 127. The "binary conversion" page sends the input text to in an XMLHttpRequest to a server for conversion (they couldn't figure out how to do that locally in javascript?!) and it *doesn't escape the parameters properly* so it actually loses the ampersand (and everything after it). The binary you got back is actually the UTF-8 for the degree sign (0xc2 0xb0). And no, you shouldn't trust copy-and-paste operations in general not to change character encoding. –  May 06 '15 at 18:14
  • 2
    In your tags, you mention Pervasive and Pervasive.SQL. Are the data files actual Btrieve / Pervasive files? Have you tried using a Pervasive tool like the Function Executor to open the file and see the data in terms of what Btrieve / Pervasive sees? For example, the Btrieve date format is 4 bytes. 2 bytes for the year, 1 byte for the month, and 1 byte for the day. – mirtheil May 06 '15 at 19:30
  • Without more points to look at it'd be very hard to tell what's going on. For example, if you look at the first character, that could be AA, 2B, AC, B0. It's tempting to notice that the second nibble is consecutive in the first three cases, and so are the days. However since the 21st doesn't fit that pattern that's probably only part of the story. It's possible that it's actually an id into a calendar table that only contains days on which a report was run, or something like that. – Alex Weitzer May 11 '15 at 21:22
  • Can you use a hex dump program in your file's first lines and show us the result ? Also, which file type is it ? – Nelson Teixeira May 20 '15 at 18:26
  • Dumping out that column as `HEX()` would really help here. It's obviously binary, but the encoding is unknown in your output. Most databases have some kind of "as hex" method. – tadman May 28 '15 at 19:10

2 Answers2

1

Your question cannot be answered as asked. It's crucial to know the numbers in that date column, because that's assuredly what they are. They are not strings in the encoding you happen to be using 30 years later.

You mention the °& sequence. If you are using Linux with ISO 8859-1 encoding, that degree symbol is B0. If you are using a Windows console with CP850 encoding, it's F8. We also don't know the endianism of the data, which might explain the mystery of +(.

So the first order of business is to learn the actual bytes in the column. Print them in hexadecimal, and convert them to decimal interpreted as big- and little-endian. Then maybe you'll see a pattern that lets you establish the epoch. From there it's all fresh powder and blue sky.

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
0

This page says the ASCII codes are used to store the number of days since (or before) January 1, 1900. This is similar to the serial date technique used by Lotus 1-2-3, MS Excel and other software. http://cs.pervasive.com/forums/p/11233/38076.aspx