
I have built a chat application that currently stores the complete chat history of all users.

I am using Django as the backend and PostgreSQL as the database. I am nearing 100k daily active users, which works out to around 1 million messages per day.

So I am wondering how to scale the PostgreSQL data horizontally. I have heard that sharding is not simple in SQL databases and that they have limits to how far they scale. For example, I have heard that Google's Bigtable can scale to hundreds of petabytes, while PostgreSQL is hard to scale to that level. Is that true? If not, how should I scale at the moment? Also, how should I handle message history, which will eventually get too big to manage?

Another question: should I switch to another database such as MongoDB or Cassandra to handle scaling? I am worried that I will eventually have to scale to billions of messages per month, and if I can switch now, that would be better. I don't want to overthink or overanalyse this; I just want some perspective on how to go about it.

hardik24
  • I think that question could be better answered at [dba stack](https://dba.stackexchange.com/) or at [softwareengineering stack](https://softwareengineering.stackexchange.com/), check [here](https://meta.stackoverflow.com/questions/254570/choosing-between-stack-overflow-and-software-engineering). – Felipe Augusto May 03 '19 at 23:23
  • Fun question. How big are your backups? That will give you an idea of how much trouble you're in. Apparently PostgreSQL automatically does some table compression according to this [Answer](https://stackoverflow.com/a/1371950/9705687). Also, I wonder if you could do your own compression by (1) giving reply and message suggestions and then (2) compressing those into magic codes. – bfris Sep 21 '19 at 04:16
  • If you want something more horizontally scalable, I would look at YugabyteDB; it's built on top of PostgreSQL, so you won't have to change your code too much to transition over to it. – Mabel Oza Jun 20 '23 at 04:41

1 Answer


... I have heard that Google's Bigtable can scale to hundreds of petabytes

It is true that Bigtable is a petabyte-scale database. Note that some folks have certainly scaled commodity PostgreSQL to petabytes, e.g. Yahoo! way back in 2008:

https://www.quora.com/Who-has-the-largest-PostgreSQL-database
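For a sense of proportion, here is a rough back-of-envelope sketch of how fast the history would grow. The 1 million messages per day comes from the question; the 500 bytes per stored message (row plus index overhead) is purely an assumption for illustration, and the function names are hypothetical.

```python
# Rough storage estimate for chat history growth.
MESSAGES_PER_DAY = 1_000_000  # figure given in the question
AVG_ROW_BYTES = 500           # ASSUMPTION: average stored size per message,
                              # including row and index overhead
PETABYTE = 10**15             # decimal petabyte, in bytes


def bytes_after(days, messages_per_day=MESSAGES_PER_DAY, row_bytes=AVG_ROW_BYTES):
    """Total bytes accumulated after `days` at a constant message rate."""
    return days * messages_per_day * row_bytes


def days_until(target_bytes, messages_per_day=MESSAGES_PER_DAY, row_bytes=AVG_ROW_BYTES):
    """Days needed to accumulate `target_bytes` at a constant message rate."""
    return target_bytes / (messages_per_day * row_bytes)


if __name__ == "__main__":
    print(f"{bytes_after(365) / 10**9:.1f} GB per year")
    print(f"{days_until(PETABYTE) / 365:.0f} years to reach 1 PB")
```

Under these assumptions, a year of history is on the order of a couple hundred gigabytes, which is a long way from petabyte territory. The real number depends on your message lengths and indexes, so measuring the actual size of your messages table (or your backups, as suggested in the comments) will give a better figure than this sketch.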

As noted in the comments, this seems like more of a stack design question and you may also want to take a look at http://highscalability.com/.

Ramesh Dharan