
I am using PostgreSQL as my database, along with Django.

Given below is my database model:

from django.db import models


class Users(models.Model):
    email = models.CharField(max_length=50, default="")
    password = models.CharField(max_length=40, default="")
    source = models.CharField(default='unknown', max_length=150)
    domain = models.CharField(max_length=50, default="")
    before_at = models.CharField(max_length=255, default="")
    username = models.CharField(max_length=150, default="")
    hash = models.CharField(max_length=255, default="")
    ipaddress = models.CharField(max_length=50, default="")
    phonenumber = models.CharField(max_length=100, default="")

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['email', 'password', 'source'], name='uniqueness_constraints')
        ]

    def __str__(self):
        return self.email

The thing is, most rows will have the email, password, and domain fields populated while the rest remain empty; similarly, some rows will have username, password, and source while the others are empty. I want everyone to be able to search the users table on different fields, such as email, password, domain, username, hash, and ipaddress. There are going to be billions of rows in this one table.

So what is the best practice here? Is it OK if rows have a lot of empty fields, or does that affect performance? For instance, if there are one billion records in total, of which half a billion have an empty/null ipaddress field and half a billion have data in it, will that affect the performance of a search on the ipaddress field?
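For concreteness, the kind of lookup I mean looks like this (a minimal sketch using the Django ORM; the app path and values are made up):

# Minimal sketch of the searches described above; values are illustrative.
from myapp.models import Users  # hypothetical app path

by_ip = Users.objects.filter(ipaddress='203.0.113.7')
by_email = Users.objects.filter(email='someone@example.com')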

Is there any better approach to this, maybe splitting the table or something else?

Thanks

Daud Ahmed
  • Read on this [thread](https://dba.stackexchange.com/questions/188667/best-database-and-table-design-for-billions-of-rows-of-data). Should give you a good understanding. – AzyCrw4282 Mar 12 '20 at 14:37
  • Does this answer your question? [PostgreSQL: performance impact of extra columns](https://stackoverflow.com/questions/10020573/postgresql-performance-impact-of-extra-columns) – philipxy Mar 13 '20 at 01:54

1 Answer


The columns later in the table definition take longer to access. For 9 columns, this is unlikely to be meaningful, but for 200 columns it could be.

Skipping over an empty column to get to the one you want takes some tiny amount of time, but skipping over an occupied column does as well.

Presumably the columns most likely to be NULL are the ones least likely to be searched, so it would make sense to declare them at the end of the table definition.
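Applied to the model in the question, that would look something like this (a sketch; which fields are rarely populated or rarely searched is an assumption):

# Sketch: frequently-populated, frequently-searched fields first;
# mostly-empty fields declared (and therefore stored) last.
from django.db import models

class Users(models.Model):
    email = models.CharField(max_length=50, default="")
    password = models.CharField(max_length=40, default="")
    domain = models.CharField(max_length=50, default="")
    username = models.CharField(max_length=150, default="")
    source = models.CharField(default='unknown', max_length=150)
    hash = models.CharField(max_length=255, default="")
    ipaddress = models.CharField(max_length=50, default="")
    before_at = models.CharField(max_length=255, default="")
    phonenumber = models.CharField(max_length=100, default="")

Django creates the table's columns in field declaration order, so reordering the fields before the initial migration controls the physical column order.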

This is premature optimization. If you are really worried, come up with a data generator which produces realistic data, and run actual tests of realistic queries against that data.
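For example, a rough sketch of such a test (the app path, row counts, and value patterns are all illustrative assumptions):

# Rough sketch of a realistic-data benchmark; counts and values are illustrative.
import random
import string
import time

from myapp.models import Users  # hypothetical app path

def random_string(n):
    return ''.join(random.choices(string.ascii_lowercase, k=n))

def generate_batch(batch_size=10_000):
    """One batch of realistic-ish rows: half with an ipaddress, half without."""
    rows = []
    for i in range(batch_size):
        rows.append(Users(
            email=f"{random_string(8)}@example.com",
            password=random_string(12),
            domain="example.com",
            ipaddress=f"10.{i % 256}.{i % 256}.{i % 200}" if i % 2 == 0 else "",
        ))
    Users.objects.bulk_create(rows)

# Load data in batches, then time a realistic query.
for _ in range(100):  # 100 x 10k = 1M rows; scale toward the real volume
    generate_batch()

start = time.monotonic()
n = Users.objects.filter(ipaddress='10.0.0.0').count()
print(f"found {n} rows in {time.monotonic() - start:.3f}s")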

jjanes