Questions tagged [bigtable]

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Bigtable

A Distributed Storage System for Structured Data

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

Some features

fast and extremely large-scale DBMS
a sparse, distributed multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases.
designed to scale into the petabyte range
it works across hundreds or thousands of machines
it is easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration
each table has multiple dimensions (one of which is a field for time, allowing versioning)
tables are optimized for GFS (Google File System) by being split into multiple tablets - segments of the table as split along a row chosen such that the tablet will be ~200 megabytes in size.

Architecture

BigTable is not a relational database. It does not support joins nor does it support rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a time stamp. There can be multiple versions of a cell with different time stamps. The time stamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."

In order to manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine saves about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing. If one table is receiving many queries, it can shed other tablets or move the busy table to another machine that is not so busy. Also, if a machine goes down, a tablet may be spread across many other servers so that the performance impact on any given machine is minimal.

Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.

The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. The clients get a point to a META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.

Implementation

BigTable is built on Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data.

Another service that BigTable makes heavy use of is Chubby, a highly-available, reliable distributed lock service. Chubby allows clients to take a lock, possibly associating it with some metadata, which it can renew by sending keep alive messages back to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.

There are three primary server types of interest in the Bigtable system:

Master servers: assign tablets to tablet servers, keeps track of where tablets are located and redistributes tasks as needed.
Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100MB - 200MB). If a tablet server fails, then a 100 tablet servers each pickup 1 new tablet and the system recovers.
Lock servers: instances of the Chubby distributed lock service. Lots of actions within BigTable require acquisition of locks including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.

API

Typical operations to BigTable are creation and deletion of tables and column families, writing data and deleting columns from a row. BigTable provides this functions to application developers in an API. Transactions are supported at the row level, but not across several row keys.

References

Whitepaper Bigtable: A Distributed Storage System for Structured Data
Whitepaper The Google File System
Whitepaper The Chubby lock service for loosely-coupled distributed systems

Related Tags

google-bigquery commercial version of BigTable
hbase open source implementation of BigTable

528 questions

388

votes

8 answers

What database does Google use?

Is it Oracle or MySQL or something they have built themselves?

database google-search google-cloud-datastore bigtable

asked Dec 12 '08 at 14:48

solrevdev

8,863
11
41
49

149

votes

5 answers

What is an SSTable?

In BigTable/GFS and Cassandra terminology, what is the definition of a SSTable?

computer-science nosql cassandra bigtable gfs

asked Apr 04 '10 at 21:46

knorv

49,059
74
210
294

107

votes

3 answers

What's the difference between BigQuery and Bigtable?

Is there any reason why someone would use Bigtable instead of BigQuery? Both seem to support Read and Write operations with the latter offering also advanced 'Query' operations. I need to develop an affiliate network (thus I need to track clicks and…

google-cloud-platform google-bigquery cloud bigtable google-cloud-spanner

asked Oct 07 '16 at 14:35

The user with no hat

10,166
20
57
80

votes

7 answers

Life without JOINs... understanding, and common practices

Lots of "BAW"s (big ass-websites) are using data storage and retrieval techniques that rely on huge tables with indexes, and using queries that won't/can't use JOINs in their queries (BigTable, HQL, etc) to deal with scalability and sharding…

orm nosql hadoop join bigtable

asked Oct 07 '09 at 15:05

snicker

6,080
6
43
49

votes

4 answers

Join operation with NOSQL

I have gone through some articles regarding Bigtable and NOSQL. It is very interesting that they avoid JOIN operations. As a basic example, let's take Employee and Department table and assume the data is spread across multiple tables /…

sql join nosql bigtable

asked Jan 03 '10 at 15:04

Sri

votes

6 answers

storing massive ordered time series data in bigtable derivatives

I am trying to figure out exactly what these new fangled data stores such as bigtable, hbase and cassandra really are. I work with massive amounts of stock market data, billions of rows of price/quote data that can add up to 100s of gigabytes every…

cassandra finance hbase bigtable time-series

asked Oct 26 '09 at 06:46

Shahbaz

10,395
21
54
83

votes

1 answer

what is a commit log?

in google's bigtable context, what does a commit log mean? and what is the use of a commit log?

database bigtable

asked Apr 06 '10 at 05:46

Yang

9,794
15
44
52

votes

2 answers

Google's Bigtable vs. A Relational Database

Duplicates Why should I use document based database instead of relational database? Pros/Cons of document based database vs relational database I don't know much about Google's Bigtable but am wondering what the difference between Google's…

database relational bigtable

asked Apr 23 '09 at 18:16

Daniel Kivatinos

24,088
23
61
81

votes

1 answer

Price aside, why ever choose Google Cloud Bigtable over Google Cloud Datastore?

If I have a use case for both huge data storage and searchability, why would I ever choose Google Cloud Bigtable over Google Cloud Datastore? I've seen a few questions on SO and other sides "comparing" Bigtable and Datastore, but it seems to boil…

google-cloud-platform nosql google-cloud-datastore bigtable google-cloud-bigtable

asked Nov 26 '18 at 21:21

zeBugMan

votes

5 answers

Is BigTable slow or am I dumb?

I basically have the classic many to many model. A user, an award, and a "many-to-many" table mapping between users and awards. Each user has on the order of 400 awards and each award is given to about 1/2 the users. I want to iterate over all of…

django google-app-engine django-models bigtable

asked Jun 05 '09 at 08:15

Paul Tarjan

48,968
59
172
213

votes

6 answers

Google Bigtable vs BigQuery for storing large number of events

Background We'd like to store our immutable events in a (preferably) managed service. Average size of one event is less than 1 Kb and we have between 1-5 events per second. The main reason for storing these events is to be able to replay them…

google-app-engine google-bigquery bigtable google-cloud-bigtable

asked Dec 23 '15 at 14:14

Johan

37,479
32
149
237

votes

2 answers

bigtable vs cassandra vs simpledb vs dynamo vs couchdb vs hypertable vs riak vs hbase, what do they have in common?

Sorry if this question is somewhat subjective. I am new to 'could store', 'distributed store' or some concepts like this. I really wonder what do they have in common and want to get an overview on all of them. What do I need to prepare if I want to…

couchdb cassandra amazon-simpledb bigtable hypertable

asked Mar 13 '10 at 02:13

Mickey Shine

12,187
25
96
148

votes

2 answers

NoSQL: What does it mean for MongoDB or BigTable to not always be "Available"

Reading Nathan Hurst's Visual Guide to NoSQL Systems, he includes the CAP triangle: Consistency Availibility Partition Tolerance With SQL Server being an AC system, and MongoDB being a CP system. These definitions from come a UC Berkley professor…

mongodb consistency bigtable nosql

asked Sep 07 '11 at 19:18

Ian Boyd

246,734
253
869
1,219

votes

3 answers

How do you design data models for Bigtable/Datastore (GAE)?

Since the Google App Engine Datastore is based on Bigtable and we know that's not a relational database, how do you design a database schema/data model for applications that use this type of database system?

python database google-app-engine bigtable

asked Sep 17 '08 at 04:02

fuentesjr

50,920
27
77
81

votes

10 answers

Need a distributed key-value lookup system

I need a way to do key-value lookups across (potentially) hundreds of GB of data. Ideally something based on a distributed hashtable, that works nicely with Java. It should be fault-tolerant, and open source. The store should be persistent, but…

java database bigtable key-value-store distributed-database

asked Oct 13 '08 at 15:33

sanity

35,347
40
135
226

2 3

…

35 36 Next