Questions tagged [sharding]

Sharding is a technique of partitioning database tables by row ("horizontally"); typically this technique requires a key to be selected that determines how the rows are to be partitioned.

Sharding is a concept in database design: the technique of physically partitioning a table or collection by row (also known as horizontal partitioning). To perform the partitioning, a key or set of keys must be defined; it tells the database engine which partition each record belongs to.
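
As a rough illustration of how a key can determine the partition for each record, here is a minimal sketch of hash-based routing in Python. The function names (`shard_for`, `partition_rows`), the `user_id` field, and the choice of SHA-256 are assumptions made for this example; real database engines use their own strategies (range-based, hash-based, or lookup-based).

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash.

    hashlib is used instead of the built-in hash() because the built-in
    string hash is randomized per process and would not route consistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_rows(rows, key_field, num_shards):
    """Split rows (dicts) into num_shards lists according to key_field."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        index = shard_for(str(row[key_field]), num_shards)
        shards[index].append(row)
    return shards


if __name__ == "__main__":
    # Hypothetical data: 10 user rows spread over 3 shards.
    users = [{"user_id": i, "name": f"user{i}"} for i in range(10)]
    for index, shard in enumerate(partition_rows(users, "user_id", 3)):
        print(index, [row["user_id"] for row in shard])
```

Note that with plain modulo routing, changing the number of shards remaps most keys, which is one reason some systems use range maps, lookup tables, or consistent hashing instead.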

1666 questions
331
votes
7 answers

Database sharding vs partitioning

I have been reading about scalable architectures recently. In that context, two words that keep showing up with regard to databases are sharding and partitioning. I looked up descriptions but still ended up confused. Could the experts at…
Amit Sharma
  • 5,844
  • 5
  • 25
  • 34
199
votes
28 answers

ElasticSearch: Unassigned Shards, how to fix?

I have an ES cluster with 4 nodes (number_of_replicas: 1): search01 - master: false, data: false; search02 - master: true, data: true; search03 - master: false, data: true; search04 - master: false, data: true. I had to restart search03, and when it came…
Spanky
  • 5,608
  • 10
  • 39
  • 45
99
votes
8 answers

MySQL sharding approaches?

What is the best approach for sharding MySQL tables? The approaches I can think of are: application-level sharding? Sharding at the MySQL proxy layer? A central lookup server for sharding? Do you know of any interesting projects or tools in this area?
sheki
  • 8,991
  • 13
  • 50
  • 69
86
votes
3 answers

MongoDB querying performance for over 5 million records

We've recently hit >2 million records in one of our main collections, and we have now started to suffer from major performance issues on that collection. The documents in the collection have about 8 fields which you can filter by using the UI and the…
Yarin Miran
  • 3,241
  • 6
  • 30
  • 27
53
votes
4 answers

Database partitioning - Horizontal vs Vertical - Difference between Normalization and Row Splitting?

I am trying to grasp the different concepts of database partitioning, and this is what I understood of it: Horizontal Partitioning/Sharding: splitting a table into different tables that will each contain a subset of the rows that were in the initial table…
50
votes
8 answers

MySQL Partitioning / Sharding / Splitting - which way to go?

We have an InnoDB database that is about 70 GB and we expect it to grow to several hundred GB in the next 2 to 3 years. About 60 % of the data belong to a single table. Currently the database is working quite well as we have a server with 64 GB of…
sme
  • 5,673
  • 7
  • 32
  • 30
42
votes
8 answers

Extreme Sharding: One SQLite Database Per User

I'm working on a web app that is somewhere between an email service and a social network. I feel it has the potential to grow really big in the future, so I'm concerned about scalability. Instead of using one centralized MySQL/InnoDB database and…
Seun Osewa
  • 4,965
  • 3
  • 29
  • 32
36
votes
1 answer

When do you start additional Elasticsearch nodes?

I'm in the middle of attempting to replace a Solr setup with Elasticsearch. This is a new setup, which has not yet seen production, so I have lots of room to fiddle with things and get them working well. I have very, very large amounts of data. I'm…
gdm
  • 905
  • 1
  • 15
  • 21
35
votes
3 answers

Are there any REAL advantages to NoSQL over RDBMS for structured data on one machine?

So I've been trying hard to figure out if NoSQL is really bringing that much value outside of auto-sharding and handling UNSTRUCTURED data. Assuming I can fit my STRUCTURED data on a single machine OR have an effective 'auto-sharding' feature for…
jessedrelick
  • 1,277
  • 1
  • 11
  • 7
32
votes
2 answers

MongoDB to Use Sharding with $lookup Aggregation Operator

$lookup is new in MongoDB 3.2. It performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. To use $lookup, the from collection cannot be sharded. On the other…
Map X
  • 444
  • 1
  • 4
  • 14
30
votes
3 answers

Making sharding simple with Django

I have a Django project based on multiple PostgreSQL servers. I want users to be sharded across those database servers using the same sharding logic used by Instagram: User ID => logical shard ID => physical shard ID => database server => schema =>…
MiniQuark
  • 46,633
  • 36
  • 147
  • 183
29
votes
3 answers

multiple consumers per kinesis shard

I read that you can have multiple consumer apps per Kinesis stream: http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html However, I heard you can only have one consumer per shard. Is this true? I don't find any documentation…
bhomass
  • 3,414
  • 8
  • 45
  • 75
27
votes
8 answers

How do I speed up deletes from a large database table?

Here's the problem I am trying to solve: I have recently completed a data layer re-design that allows me to load-balance my database across multiple shards. In order to keep shards balanced, I need to be able to migrate data from one shard to…
Eric Z Beard
  • 37,669
  • 27
  • 100
  • 145
24
votes
2 answers

Does Cassandra support sharding?

Does Apache Cassandra support sharding? Apologies if this question seems trivial, but I cannot seem to find the answer. I have read that Cassandra was partially modeled after GAE's Big Table, which shards on a massive scale. But most of the…
Chris Dutrow
  • 48,402
  • 65
  • 188
  • 258
22
votes
1 answer

How do you implement sorting and paging on distributed data?

Here's the problem I'm trying to solve: I need to be able to display a paged, sorted table of data that is stored across several database shards. Paging and sorting are well known problems that most of us can solve in any number of ways when the…
Eric Z Beard
  • 37,669
  • 27
  • 100
  • 145