74

I read that

SELECT is a horizontal partition of the relation into two set of tuples.

and

PROJECT is a vertical partition of the relation into two relations.

However, I don't understand what that means. Can you explain it in layman's terms?

Pritam Banerjee
  • 17,953
  • 10
  • 93
  • 108
John Snowden
  • 757
  • 1
  • 7
  • 5
  • In what kind of database? since MySQL doesn't contain a `PROJECT` keyword. – vallentin Aug 18 '13 at 19:19
  • 4
    These don't have proper definitions in layman's terms. The Wikipedia article on Relational Algebra (http://en.wikipedia.org/wiki/Relational_algebra) defines both in language appropriate to their proper understanding. – High Performance Mark Aug 18 '13 at 19:20

5 Answers5

113

Not a complete answer to the question but it answers what is asked in the question title. So the general meaning of horizontal and vertical database partitioning is:

Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.

See more details here.

Kuldeep Jain
  • 8,409
  • 8
  • 48
  • 73
  • 2
    This answer is more about partitioning on the storage level of the data, though his question seemed to have been more about relation algebra. Not that there's much difference conceptually, just to make clear that it's not a direct answer to the original question (though it's also the way I interpreted it from just reading the title). – Janick Bernet Aug 18 '13 at 19:34
  • Yes, it only answers the question title. updated that in my answer, thanks. – Kuldeep Jain Aug 18 '13 at 19:43
  • I must say @Stefan Steiger's answer is much more detailed explanation of partitioning than my answer: https://stackoverflow.com/a/38350155/948268 – Kuldeep Jain Nov 26 '20 at 02:05
  • I was not able to answer this question in interview, wrote thousands of queries and I really did not know this and they might have judged my knowledge by this ques for sure lol – Aadam Sep 24 '21 at 07:18
24

A projection creates a subset of attributes in a relation hence a "vertical partition"

A selection creates a subset of the tuples in a relation hence a "horizontal partition"

Given a table (r) as

a : b : c : d : e
-----------------
1 : 2 : 3 : 4 : 5
1 : 2 : 3 : 4 : 5
2 : 2 : 3 : 4 : 5
2 : 2 : 3 : 4 : 5

An expression such as

PROJECT a, b (SELECT a=1 (r))

-- SELECT a, b FROM r WHERE a=1

Would "do"

a : b | c : d : e
-----------------
1 : 2 | 3 : 4 : 5
1 : 2 | 3 : 4 : 5
=================    <  -- horizontal partition (by SELECTION)
2 : 2 | 3 : 4 : 5
2 : 2 | 3 : 4 : 5

      ^  -- vertical partition (by PROJECTION)

Resulting in

a : b
------
1 : 2 
1 : 2 
T I
  • 9,785
  • 4
  • 29
  • 51
24

Necromancing.
I think the existing answers are too abstract.

So here my attempts at a more practical explanation:

Partitioning form a developer's point of view is all about performance.
More exactly, it's about what happens when you have large amounts of data in your tables, and you still want to query the data fast.

Here some excerpts from slides by Bill Karwin about what exactly horizontal partitioning is all about:

Horrible

The above is bad, because:

Problem 1

Problem 2

In addition:

  • You can't set a foreign key

  • Table structures are not kept in sync

e.g. when running

ALTER TABLE BUGS_2009 ADD COLUMN hours NUMERIC;
  • Prior tables don't contain new column
  • Dissimilar tables can't be combined with UNION

The solution: HORIZONTAL PARTITONING

Horizontal partitioning is a feature which must be supported by the database ENGINE, not the application program. A feature which divides a database table into multiple tables (using some criterion - e.g. usually distinct values in a column, such as the column "year" in example table "bugs"). Each table then contains the same number of columns, but fewer rows.

solution horizontal

The difference: Query Performance and simplicity

Kirk Quote




Now, on the difference between horizontal and vertical partitioning:

"Tribbles" can also accumulate in columns. Example: Column Tribble

The solution to that problem is VERTICAL PARTITIONING

*Proper normalization is ONE form of vertical partitioning*

To quote technet

Vertical partitioning divides a table into multiple tables that contain fewer columns.

The two types of vertical partitioning are normalization and row splitting:

Normalization is the standard database process of removing redundant columns from a table and putting them in secondary tables that are linked to the primary table by primary key and foreign key relationships.

Row splitting divides the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the other tables as identified by a UNIQUE KEY column that is identical in all of the partitioned tables. For example, joining the row with ID 712 from each split table re-creates the original row. Like horizontal partitioning, vertical partitioning lets queries scan less data. This increases query performance. For example, a table that contains seven columns of which only the first four are generally referenced may benefit from splitting the last three columns into a separate table. Vertical partitioning should be considered carefully, because analyzing data from multiple partitions requires queries that join the tables.

Vertical partitioning also could affect performance if partitions are very large.

Also, large (e.g. large binary object) or seldom used columns can be added to an auxilary table in a 1:1 relationship.
Moving these columns out-of-the-table improves query time considerably.
That sums it up nicely:

Proper normalization

Now on SELECT vs. PROJECT:

This SO post describes the difference as such:

Select Operation : This operation is used to select rows from a table (relation) that specifies a given logic, which is called as a predicate. The predicate is a user defined condition to select rows of user's choice.

Project Operation : If the user is interested in selecting the values of a few attributes, rather than selection all attributes of the Table (Relation), then one should go for PROJECT Operation.

SELECT is an actual SQL operation (statement), while PROJECT is a term used in relational algebra.

Judging from you posting this on SO and not on MathOverflow, I would suggest you don't read relational algebra books if you just want to learn SQL for developing applications.

If you are in dire need of a recommendation for a good book about (advanced) SQL, here is one

SQL Antipatterns: Avoiding the Pitfalls of Database Programming
Bill Karwin
ISBN-13: 978-1934356555
ISBN-10: 1934356557

That's the one book about SQL worth reading.
Most other books about SQL that I've seen out there can be summed up by this cynical statement about photoshop books:

There are more books about photoshop than people actually using photoshop.

Stefan Steiger
  • 78,642
  • 66
  • 377
  • 442
3

The distinction of horizontal vs vertical comes from the traditional tabular view of a database. A database can be split vertically — storing different tables & columns in a separate database or horizontally — storing rows of a same table in multiple database nodes.

enter image description here

Horizontal partitioning is often referred as Database Sharding.

# Example of vertical partitioning
fetch_user_data(user_id) -> db[“USER”].fetch(user_id)
fetch_photo(photo_id) ->    db[“PHOTO”].fetch(photo_id)
# Example of horizontal partitioning
fetch_user_data(user_id) -> user_db[user_id % 2].fetch(user_id)

Find more details here: https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6

Krutik
  • 1,107
  • 13
  • 18
1

Consider a single table in a database, it has some rows and columns.

There are two ways your could pick data: You could pick some rows, or you could pick some columns (well ok, three ways, you could pick some rows, and within that pick some columns.)

You can think of select as picking some rows - that's horizontal (and not picking the rest, hence partitioning)

You can think of project as picking some columns - that's vertical (and not picking the rest)

djna
  • 54,992
  • 14
  • 74
  • 117
  • I'd up-vote the answer but the terms in question *horizontal* and *vertical* are never mentioned and I don't think it's obvious to everyone that picking rows is horizontal and columns is vertical partitioning. – Janick Bernet Aug 18 '13 at 19:49
  • Most rows tend to be horizontal, no? :-) Anyhow, just to be clear added the extra. – djna Aug 18 '13 at 21:06