8

I am running the exact same query both through pandas' read_sql and through an external app (DbVisualizer).

DbVisualizer returns 206 rows, while pandas returns 178.

I have tried reading the data from pandas by chucks based on the information provided at How to create a large pandas dataframe from an sql query without running out of memory?, it didn't make a change.

What could be the cause for this and ways to remedy it?

The query:

select *
from rainy_days
where year=’2010’ and day=‘weekend’

The columns contain: date, year, weekday, amount of rain at that day, temperature, geo_location (row per location), wind measurements, amount of rain the day before, etc..

The exact python code (minus connection details) is:

import pandas
from sqlalchemy import create_engine

engine = create_engine(
   'postgresql://user:pass@server.com/weatherhist?port=5439',
)

query = """
        select *
        from rainy_days
        where year=’2010’ and day=‘weekend’
        """
df = pandas.read_sql(query, con=engine)
Community
  • 1
  • 1
oshi2016
  • 875
  • 2
  • 10
  • 20
  • 2
    You are using strange quotes (for the `year=’2010’`), I don't know if that could be a cause, but can you replace them with normal single quotes? (`'`) – joris Mar 08 '16 at 10:58
  • 1
    is there a solution to this? I'm running the same issue. – medev21 Jan 11 '17 at 01:01
  • 1
    same issue. I have a table with total 7 rows, pandas.read_sql_table get 7 but pandas.read_sql get 5 rows. – Paul Yin Mar 26 '21 at 06:20

3 Answers3

-1

It's not a fix, but what worked for me was to rebuild the indices:

  1. drop the indices

  2. export the whole thing to a csv:

  3. delete all the rows:

    DELETE FROM table

  4. import the csv back in

  5. rebuild the indices

pandas:

df = read_csv(..)
df.to_sql(..)

If that works, then at least you know you have a problem somewhere with the indices keeping up to date.

kztd
  • 3,121
  • 1
  • 20
  • 18
  • the strange quotes `` are used in SQL to distinguish field names from reserved words, e.g. SELECT \`right\` FROM ... – kztd Feb 28 '17 at 03:19
-1

https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/14

If you use pure engine.execute you should care about format manually

Paul Yin
  • 1,753
  • 2
  • 13
  • 19
-1

The problem is that pandas returns a packed dataframe (DF). For some reason this is always on by default and the results varies widely as to what is shown. The solution is to use the unpacking operator (*) before/when trying to print the df, like this:

print(*df)

(This is also know as the splat operator for Ruby enthusiasts.)

To read more about this, please check out these references & tutorials:

not2qubit
  • 14,531
  • 8
  • 95
  • 135