pandas read_sql not reading all rows

Question

I am running the exact same query both through pandas' read_sql and through an external app (DbVisualizer).

DbVisualizer returns 206 rows, while pandas returns 178.

I have tried reading the data from pandas by chucks based on the information provided at How to create a large pandas dataframe from an sql query without running out of memory?, it didn't make a change.

What could be the cause for this and ways to remedy it?

The query:

select *
from rainy_days
where year=’2010’ and day=‘weekend’

The columns contain: date, year, weekday, amount of rain at that day, temperature, geo_location (row per location), wind measurements, amount of rain the day before, etc..

The exact python code (minus connection details) is:

import pandas
from sqlalchemy import create_engine

engine = create_engine(
   'postgresql://user:pass@server.com/weatherhist?port=5439',
)

query = """
        select *
        from rainy_days
        where year=’2010’ and day=‘weekend’
        """
df = pandas.read_sql(query, con=engine)

You are using strange quotes (for the `year=’2010’`), I don't know if that could be a cause, but can you replace them with normal single quotes? (`'`) — joris, Mar 08 '16 at 10:58
same issue. I have a table with total 7 rows, pandas.read_sql_table get 7 but pandas.read_sql get 5 rows. — Paul Yin, Mar 26 '21 at 06:20

kztd · Answer 1 · 2022-06-09T12:48:07.033

-1

It's not a fix, but what worked for me was to rebuild the indices:

drop the indices
export the whole thing to a csv:
delete all the rows:

DELETE FROM table
import the csv back in
rebuild the indices

pandas:

df = read_csv(..)
df.to_sql(..)

If that works, then at least you know you have a problem somewhere with the indices keeping up to date.

edited Jun 09 '22 at 12:48

answered Feb 28 '17 at 03:14

kztd

3,121
1
20
18

the strange quotes `` are used in SQL to distinguish field names from reserved words, e.g. SELECT \`right\` FROM ... – kztd Feb 28 '17 at 03:19

score -1 · Answer 2 · answered Mar 26 '21 at 07:04

-1

https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/14

If you use pure engine.execute you should care about format manually

answered Mar 26 '21 at 07:04

Paul Yin

1,753
2
13
19

score -1 · Answer 3 · answered May 06 '22 at 09:27

The problem is that pandas returns a packed dataframe (DF). For some reason this is always on by default and the results varies widely as to what is shown. The solution is to use the unpacking operator (*) before/when trying to print the df, like this:

print(*df)

(This is also know as the splat operator for Ruby enthusiasts.)

To read more about this, please check out these references & tutorials:

pandas read_sql not reading all rows

3 Answers3

Linked