pandas `read_sql_query` - read `double` datatype in MySQL database to `Decimal`

Question

I'm trying to use read_sql_query() to read a query from MySQL database, one of the field in the database, its type is double(24, 8), I want to use dtype= parameter to have full control of the datatypes and read it to decimal, but seems like pandas can't recognize decimal type so I had to read it to Float64

In the database, the values for this field look like this:

Value
100.96000000
77.17000000
1.00000000
0.12340000

Then I'm trying to read it from Python code:

from decimal import *

dtypes = {
        'id': 'Int64',
        'date': 'datetime64',
        'value': 'Float64'
    }

df = pd.read_sql_query(sql_query, mysql_engine, dtype=dtypes)

but after reading the data from the code above, it looks like this:

Value
100.96
77.17
1.0
0.1234

How can I read this column to decimal and keep all the digits? Thanks.

A number can be _represented_ in many ways. Show us the code that is generating `0.12340000`, the code that is reading that, and the cod that is displaying `0.1234`. — Rick James, Sep 30 '21 at 16:28
Hi I don't have the code that's generating `0.12340000` (that's the bit I'm asking), that's what the data look like in the database. The code is generating `0.1234` is the code on the top of my question. — wawawa, Oct 01 '21 at 08:53
Please show us the code that _reads the data and displays_ `100.96`. — Rick James, Oct 04 '21 at 16:26

score 0 · Answer 1 · answered Oct 01 '21 at 15:25

What "the data looks like in the database" is tricky. This is because the act of printing it out feeds the bits through a formatting algorithm. In this case it removes trailing zeros. To see what is "in the database", one needs to get a hex dump of the file and then decipher it; this is non-trivial.

I believe that DECIMAL numbers hold all the digits specified, packed 2 digits per byte. No, I don't know how they are packed (0..99 versus 2 hex digits; what to do if the number of digits is odd; where is the sign?)

I believe that FLOAT and DOUBLE exactly conform to IEEE-764 encoding format. No, I don't know how the bytes are stored (big-endian vs little-endian). I suspect Python's Float64 is IEEE DOUBLE.

For DECIMAL(10,6), I would expect to see "1.234" to be stored as +, 0001, and 234000, but never displayed with leading zeros and optionally displayed with trailing zeros -- depending on the output formatting package.

For DOUBLE, I would expect to find hex 3ff3be76c8b43958 after adjusting for endianism, and I would not be surprised to see the output be 1.23399999999999999e+0. (Yes, I actually got that, given a suitable formatting in PHP, which I am using.) I would hope to see 1.234 since that is presumably the intent of the number.

Do not use DOUBLE(m,n). The (m,n) leads to extra rounding and it is deprecated syntax. Float and Double are not intended for exact number of decimal places; use DECIMAL for such.

For FLOAT: 1.234 becomes hex 3f9df3b6 and displays something like 1.2339999675751 assuming the output method works in DOUBLE and is asked to show lots of decimal places.

Bottom line: The output method you are using is causing the problem.

Hi thanks, I don't have control on the database side, so I can't change the way the data is storing in the DB, the field is `Decimal` type in the database, my original question is how I can specify the type to keep it as `Decimal` after reading the data using `read_sql_query()` if this makes sense. — wawawa, Oct 04 '21 at 11:31
@Cecilia - `DECIMAL(m,n)` keeps the trailing zeros. The _display_ mechanism is removing them !! — Rick James, Oct 04 '21 at 16:24

pandas `read_sql_query` - read `double` datatype in MySQL database to `Decimal`

1 Answers1