7

I'm using Django ORM for data query and I get almost 2 million rows in this table.I tried

app_count = App.objects.count()

and

from django.db import connection
cursor = connection.cursor()
cursor.execute('''SELECT count(*) FROM app''')

The mysql slow_query log gave me

Time: 2017-04-27T09:18:38.809498Z

User@Host: www[www] @ [172.19.0.3] Id: 5

Query_time: 4.107433 Lock_time: 0.004405 Rows_sent: 1 Rows_examined: 0

use app_platform; SET timestamp=1493284718; SELECT count(*) FROM app;

This query took avg more than 4 seconds, but when I used mysql client and mysql shell to do this query

mysql> select count(*) from app;

+----------+
| count(*) |
+----------+
|  1870019 |
+----------+

1 row in set (0.41 sec)

Just take me 0.4 second, 10X difference, Why and how to improve it.

EDIT

here is my model

class AppMain(models.Model):
    """
    """
    store = models.ForeignKey("AppStore", related_name="main_store")
    name = models.CharField(max_length=256)
    version = models.CharField(max_length=256, blank=True)
    developer = models.CharField(db_index=True, max_length=256, blank=True)
    md5 = models.CharField(max_length=256, blank=True)
    type = models.CharField(max_length=256, blank=True)
    size = models.IntegerField(blank=True)
    download = models.CharField(max_length=1024, blank=True)
    download_md5 = models.CharField(max_length=256, blank=True)
    download_times = models.BigIntegerField(blank=True)
    snapshot = models.CharField(max_length=2048, blank=True)
    description = models.CharField(max_length=5000, blank=True)
    app_update_time = models.DateTimeField(blank=True)
    create_time = models.DateTimeField(db_index=True, auto_now_add=True)
    update_time = models.DateTimeField(auto_now=True)

    class Meta:
        unique_together = ("store", "name", "version")

EDIT 2

And I'm using Docker and docker-compose for my project

version: '2'
services:
  mysqldb:
    restart: always
    image: mysql:latest
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: just_for_test
      MYSQL_USER: www
      MYSQL_PASSWORD: www
      MYSQL_DATABASE: app_platform
    volumes:
      - mysqldata:/var/lib/mysql
      - ./config/:/etc/mysql/conf.d
      - ./log/mysql/:/var/log/mysql/
  web:
    restart: always
    build: ./app_platform/app_platform
    env_file: .env
    environment:
      PYTHONPATH: '/usr/src/app/app_platform'
    command: bash -c "gunicorn --chdir /usr/src/app/app_platform app_platform.wsgi:application  -k gevent  -w 6 -b :8000 --timeout 8000 --reload"
    volumes:
      - ./app_platform:/usr/src/app
      - ./sqldata:/usr/src/sqldata
      - /usr/src/app/static
    ports:
      - "8000"
    dns:
        - 114.114.114.114
        - 8.8.8.8
    links:
      - mysqldb
  nginx:
    restart: always
    build: ./nginx/
    ports:
      - "80:80"
    volumes:
      - ./app_platform:/usr/src/app
      - ./nginx/sites-enabled/:/etc/nginx/sites-enabled
    links:
      - web:web
volumes:
    mysqldata:

And my django settings look like this:

import os
from django.utils.translation import ugettext_lazy as _

LANGUAGES = (
    ('en', _('English')),
    ('zh-CN', _('Chinese')),
)


LANGUAGE_CODE = 'zh-CN'

BASE_DIR = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

LOCALE_PATHS = (
    os.path.join(BASE_DIR, "locale"),
)

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'just_for_test'

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'app_scrapy',
    'app_user',
    'app_api',
    'app_check',
    'common',
    'debug_toolbar',
]


MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

AUTH_USER_MODEL = 'app_user.MyUser'

AUTHENTICATION_BACKENDS = (
    'app_user.models.CustomAuth', 'django.contrib.auth.backends.ModelBackend')


ROOT_URLCONF = 'app_platform.urls'


TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': ["/usr/src/app/app_platform/templates"],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.template.context_processors.i18n',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'app_platform.wsgi.application'

LOGIN_REDIRECT_URL = '/'
LOGIN_URL = '/login/'
# Database
# https://docs.djangoproject.com/en/1.9/ref/settings/#databases
# Password validation
# https://docs.djangoproject.com/en/1.9/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder'
)

# Internationalization
# https://docs.djangoproject.com/en/1.9/topics/i18n/

TIME_ZONE = 'Asia/Shanghai'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.9/howto/static-files/

STATIC_ROOT = "/static/"

STATIC_URL = '/static/'

STATICFILES_DIRS = (
    'public/static/',
)


DEBUG = True

ALLOWED_HOSTS = []

REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': (
        'rest_framework.authentication.BasicAuthentication',
        'rest_framework.authentication.SessionAuthentication',
    ),
    'DEFAULT_PERMISSION_CLASSES': (
        'rest_framework.permissions.AllowAny',
    ),
    'DEFAULT_PAGINATION_CLASS':
        'rest_framework.pagination.LimitOffsetPagination',
        'PAGE_SIZE': 5,
}

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app_platform',
        'USER': 'www',
        'PASSWORD': 'www',
        'HOST': 'mysqldb',   # Or an IP Address that your DB is hosted on
        'PORT': '3306',
    }
}

DEBUG_TOOLBAR_CONFIG = {
    "SHOW_TOOLBAR_CALLBACK": lambda request: True,
}

My app table info

CREATE TABLE `app` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL,
  `version` varchar(256) NOT NULL,
  `developer` varchar(256) NOT NULL,
  `md5` varchar(256) NOT NULL,
  `type` varchar(256) NOT NULL,
  `size` int(11) NOT NULL,
  `download` varchar(1024) NOT NULL,
  `download_md5` varchar(256) NOT NULL,
  `download_times` bigint(20) NOT NULL,
  `snapshot` varchar(2048) NOT NULL,
  `description` varchar(5000) NOT NULL,
  `app_update_time` datetime(6) NOT NULL,
  `create_time` datetime(6) NOT NULL,
  `update_time` datetime(6) NOT NULL,
  `store_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `app_store_id_6822fab1_uniq` (`store_id`,`name`,`version`),
  KEY `app_7473547c` (`store_id`),
  KEY `app_developer_b74bcd8e_uniq` (`developer`),
  KEY `app_create_time_a071d977_uniq` (`create_time`),
  CONSTRAINT `app_store_id_aef091c6_fk_app_scrapy_appstore_id` FOREIGN KEY (`store_id`) REFERENCES `app_scrapy_appstore` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1870020 DEFAULT CHARSET=utf8;

EDIT 3

Here is EXPLAIN SELECT COUNT(*) FROM app;

mysql> EXPLAIN SELECT COUNT(*) FROM `app`;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

EDIT 4

Here is my mysql.cnf

innodb_read_io_threads=12
innodb_write_io_threads=12
innodb_io_capacity=300
innodb_read_io_threads=12
innodb_write_io_threads=12  #To stress the double write buffer
innodb_buffer_pool_size=3G
innodb_log_file_size = 32M #Small log files, more page flush
innodb_log_buffer_size=8M
innodb_flush_method=O_DIRECT

My docker settings is 2 CPUS and 4GB Memory

EDIT 5

When I ran ORM query inside django shell, just took me 0.5-1 second. So the problem is about docker settings? or maybe gunicorn settings?

Windsooon
  • 6,864
  • 4
  • 31
  • 50
  • I'm not sure, but I just cannot reproduce your problem: The slow query times are just the same. So I think the difference may not be caused by Django, can you confirm that you only have one MySQL instance and you're operating the same DB servier via shell and console? – YKL May 02 '17 at 04:13
  • I'm using Docker and docker-compose, and I'm sure there only one MySQL instance blind 3306 port. – Windsooon May 02 '17 at 08:44
  • 1
    Maybe the second query is faster because of the cache ? – Antoine Fontaine May 04 '17 at 09:29
  • is `unique_together` is all what you have in your `Meta`? Did you test it without your debug toolbar? because actally `objects.count()` should habve the same performance as `COUNT(*)`. – Dhia May 05 '17 at 02:43
  • @DhiaTN Yes, unique_together is all I have, I did not use django debug toolbar to test it. Should I? – Windsooon May 05 '17 at 07:07
  • @Antoine Fontaine If this is because the cache, why I still have to spend more than 4 seconds in ORM after I query using mysql shell. – Windsooon May 05 '17 at 07:10
  • @Windsooon no you shouldn't!! Did you try to run `App.objects.count()` from django shell and without docker config? Try to test it locally to be sure that there is no network issue. – Dhia May 05 '17 at 07:22
  • When I ran query inside django shell, just took me 0.5-1 second. So the problem is about docker settings? or maybe gunicorn settings? – Windsooon May 05 '17 at 08:16
  • You can use command `debugsqlshell` from django-extensions or install a `django-debug-toolbar` to debug issues like this. Django generates absolutely same SQL code for this query, so there's probably something very wrong with you Django setup, it's not a Django ORM problem. The delay is caused by something else. – Eugene Morozov May 08 '17 at 14:31

1 Answers1

6

10X -- I like it. That exactly matches my Rule of Thumb: "If the data is not cached, the query will take 10 times as long as if it is cached." (Rick's RoTs)

But, let's move on to the real question: "4.1s is too slow, what can I do about it."

  • Change your app so you don't need the number of rows. Have you noticed that Search engines no longer say "out of 12345678 hits"?

  • Keep an estimate, rather than recomputing.

  • Let's see EXPLAIN SELECT COUNT(*) FROM app; it might give some more clues. (One place you say app, another you say app_scrapy_appmain; are they the same??)

  • As long as you never DELETE any rows, this would give you the same answer: SELECT MAX(id) FROM app, and run "instantly". (Once a DELETE, ROLLBACK, etc) happens, id(s) will be lost, so the COUNT will be less than MAX.)

More

innodb_buffer_pool_size=3G is probably too much on only 4GB of RAM. If MySQL swaps, performance becomes really bad. Suggest only 2G, unless you can see that it is not swapping.

Note: Scanning 1.8M rows is destined to take at least 0.4s on that hardware, or probably any hardware. It takes time do to the task. Also, doing the 'long' query interferes in two ways with other tasks: It consumes CPU and/or I/O while performing the query, plus it may be bumping other blocks out of cache, causing them to slow down. So, I really think the 'right' thing to do is to take heed of my hints on avoiding COUNT(*). Here's another one:

  • Build and maintain a "Summary Table" of daily subtotals of this (and other) table. Include in it the daily COUNT(*) and whatever else you might like to have. This will even shrink the 0.4s timing by using SUM(subtotal) from this table. More on Summary Tables.
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • At the first point, you mean divide the tables into two tables? I'm not sure I understand the second point but I add the explain ( app_scrapy_appmain is app, sorry for the mistake).So you think the problem is because the cache? I followed `https://stackoverflow.com/questions/4561292/how-to-clear-query-cache-in-mysql` and `https://stackoverflow.com/questions/5231678/clear-mysql-query-cache-without-restarting-server` to clear the cache. Then do the query using mysql shell, still give me 0.5s answer. – Windsooon May 05 '17 at 02:18
  • Not the `Query cache`, that would take 0.001 seconds. The `innodb_buffer_pool`. It is a block-by-block _cache_, so the time will probably vary anywhere between 0.4s and 4.1s depending on other activity on that _and_ other tables. How much RAM do you have? How much data do you have? What is the value of `innodb_buffer_pool_size`? – Rick James May 05 '17 at 04:51
  • I add the mysql.cnf above, and docker settings. I have about 2 million rows in this table, do I have to show some of them? – Windsooon May 05 '17 at 07:04
  • If this is because the cache, why I still have to spend more than 4 seconds in ORM after I query using mysql shell. – Windsooon May 05 '17 at 07:09
  • Run the ORM COUNT twice in a row; is it 4 sec both times? – Rick James May 05 '17 at 18:05
  • It's wired, When I ran ORM query inside django shell, just took me 0.5-1 second. So the problem is about docker settings? or maybe gunicorn settings? – Windsooon May 08 '17 at 05:24
  • @Windsooon - I think the issue is with caching. 4s means that the cache is 'cold' or has been flushed by other actions. <1s means that the cache contains the relevant blocks, hence is faster. – Rick James May 08 '17 at 14:07
  • @Windsooon - added more to my answer. – Rick James May 08 '17 at 14:17
  • if like you said, the second time I query using FBV should take less than 4s because of the cache. but when I request twice from FBV, I still get two mysql_slow_query, how come? – Windsooon May 09 '17 at 17:33
  • My fault, second time actually faster. The problem is come from the cache, Thanks! – Windsooon May 10 '17 at 01:23
  • Good. I don't like to be stumped. – Rick James May 10 '17 at 01:26