Questions tagged [database-partitioning]

Database partitioning is the act of splitting a database into separate parts, usually for manageability, performance or availability reasons.

Database partitioning is done in one of two ways:

  1. vertically: splitting a table's columns across several narrower tables, so each table has fewer columns and the number of tables grows

  2. horizontally: splitting rows up into multiple tables based on key values. An example would be moving all the rows for each geographic region (such as a country) into their own tables. This is often called sharding, particularly when the resulting tables are spread across separate database servers.
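The two approaches can be illustrated with a small in-memory sketch. It is not tied to any database engine; the sample rows, the column names, and the helper functions `partition_horizontally` and `partition_vertically` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"id": 1, "name": "Ana", "country": "BR", "bio": "long text A"},
    {"id": 2, "name": "Ben", "country": "US", "bio": "long text B"},
    {"id": 3, "name": "Caio", "country": "BR", "bio": "long text C"},
]


def partition_horizontally(rows, key):
    """Group whole rows into separate 'tables' by the value of `key`."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def partition_vertically(rows, pk, column_groups):
    """Split columns across narrower 'tables'.

    Every resulting table keeps the primary key `pk`, so the original
    rows can be rebuilt by joining on it.
    """
    tables = []
    for columns in column_groups:
        tables.append(
            [{pk: row[pk], **{c: row[c] for c in columns}} for row in rows]
        )
    return tables


# Horizontal: one "table" per country, each holding complete rows.
by_country = partition_horizontally(rows, "country")

# Vertical: frequently read columns in one table, bulky ones in another.
core, details = partition_vertically(rows, "id", [["name", "country"], ["bio"]])
```

In the horizontal case every partition has the same columns but different rows; in the vertical case every partition has the same rows (identified by `id`) but different columns.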

1096 questions
331
votes
7 answers

Database sharding vs partitioning

I have been reading about scalable architectures recently. In that context, two words that keep on showing up with regard to databases are sharding and partitioning. I looked up descriptions but still ended up confused. Could the experts at…
Amit Sharma
  • 5,844
  • 5
  • 25
  • 34
53
votes
4 answers

Database partitioning - Horizontal vs Vertical - Difference between Normalization and Row Splitting?

I am trying to grasp the different concepts of Database Partitioning and this is what I understood of it: Horizontal Partitioning/Sharding: Splitting a table into different tables that will each contain a subset of the rows that were in the initial table…
51
votes
2 answers

How do I execute raw SQL in a django migration

I am aware of the cursor object in Django. Is there any other preferred way to execute raw SQL in migrations? I want to introduce postgresql partitioning for one of my models tables. The partition logic is a bunch of functions and triggers that have…
David Schumann
  • 13,380
  • 9
  • 75
  • 96
46
votes
2 answers

How to migrate an existing Postgres Table to partitioned table as transparently as possible?

I have an existing table in a Postgres DB. For the sake of demonstration, this is what it looks like: create table myTable( forDate date not null, key2 int not null, value int not null, primary key (forDate, key2) ); insert into…
yankee
  • 38,872
  • 15
  • 103
  • 162
30
votes
4 answers

how to drop partition without dropping data in MySQL?

I have a table like: create table registrations( id int not null auto_increment primary key, name varchar(50), mobile_number varchar(13)) engine=innodb partition by range(id) ( partition p0 values less than (10000), partition p0 values less than…
28
votes
5 answers

What is table partitioning?

In which cases should we use table partitioning?
P Sharma
  • 2,638
  • 11
  • 31
  • 35
27
votes
5 answers

how to convert unix epoch time to date string in hive

I have a log file which contains timestamp column. The timestamp is in unix epoch time format. I want to create a partition based on a timestamp with partitions year, month and day. So far I have done this but it is throwing an error. PARSE ERROR…
priyank
  • 4,634
  • 11
  • 45
  • 52
27
votes
3 answers

Cassandra: choosing a Partition Key

I'm undecided whether it's better, performance-wise, to use a very commonly shared column value (like Country) as partition key for a compound primary key or a rather unique column value (like Last_Name). Looking at Cassandra 1.2's documentation…
24
votes
4 answers

how to partition a table by datetime column?

I want to partition a MySQL table by a datetime column, with one partition per day. The create table script looks like this: CREATE TABLE raw_log_2011_4 ( id bigint(20) NOT NULL AUTO_INCREMENT, logid char(16) NOT NULL, tid char(16) NOT NULL, reporterip…
tinychen
  • 1,949
  • 2
  • 11
  • 8
18
votes
4 answers

Hive: dynamic partition adding to external table

I am running hive 071, processing existing data which has the following directory layout: -TableName - d= (e.g. 2011-08-01) - d=2011-08-02 - d=2011-08-03 ... etc. Under each date I have the date files. Now to load the data I'm using CREATE…
Tomer
  • 859
  • 3
  • 11
  • 19
17
votes
1 answer

What's a good balance to decide when to partition a table in BigQuery?

We are using a public dataset to benchmark BigQuery. We took the same table and partitioned it by day, but it's not clear we are getting many benefits. What's a good balance? SELECT sum(score) FROM…
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
17
votes
2 answers

What is the algorithm used by the ORA_HASH function?

I've come across some code in the application I'm working on that makes a database call merely to call the ORA_HASH function (documentation) on a UUID string. The reason it's doing this is that it needs the value to make a service call to another…
Kaypro II
  • 3,210
  • 8
  • 30
  • 41
17
votes
3 answers

How to set the number of partitions/nodes when importing data into Spark

Problem: I want to import data into Spark EMR from S3 using: data = sqlContext.read.json("s3n://.....") Is there a way I can set the number of nodes that Spark uses to load and process the data? This is an example of how I process the…
pemfir
  • 365
  • 1
  • 3
  • 10
17
votes
2 answers

What does PARTITION BY 1 mean?

For a pair of cursors where the total number of rows in the resultset is required immediately after the first FETCH, ( after some trial-and-error ) I came up with the query below SELECT col_a, col_b, col_c, COUNT(*) OVER( PARTITION BY 1 ) AS…
Everyone
  • 2,366
  • 2
  • 26
  • 39
17
votes
5 answers

MAX() and MAX() OVER PARTITION BY produces error 3504 in Teradata Query

I am trying to produce a results table with the last completed course date for each course code, as well as the last completed course code overall for each employee. Below is my query: SELECT employee_number, MAX(course_completion_date) …
dneaster3
  • 309
  • 1
  • 3
  • 10