0

I want to ask you what programming language I should use to develop a horizontally scalable database. I don't care too much about performance.

Currently, I only know PHP and Python, but I wonder if Python is good for scalability. Or is this even possible in Python?

The reason I don't use an existing system is that I need deep insight into the system, and no existing database can store indexes the way I want. (It's a mix of non-relational, sparse free multidimensional, and graph design.)

EDIT: I already have most of the core code written in Python and have investigated ways to improve adding data for that type of database design, which limits the use of other databases even more.

EDIT 2: Forgot to note, the database tables are several hundred gigabytes.

4 Answers

1

The development of a scalable database is language independent. I cannot say much about PHP, but I can tell you good things about Python: it's easy to read, easy to learn, etc. In my opinion it makes the code much cleaner than other languages.
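To illustrate the language-independent part, here is a minimal sketch of hash partitioning, one common way to spread rows across nodes. It is not the asker's design: `shard_for` and `ShardedStore` are hypothetical names, and the "nodes" are plain in-memory dicts standing in for real servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical stand-in for a cluster: one dict per 'node'."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"row-{i}", i)

print(store.get("row-42"))
print([len(s) for s in store.shards])  # how the rows spread over the shards
```

With plain modulo partitioning like this, changing the number of shards remaps most keys; consistent hashing is the usual way to reduce that, whatever language you pick.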

loki
  • I'm probably going to develop it in Python anyway, then see how it performs. –  Mar 29 '12 at 15:57
0

Between PHP & Python, definitely Python. Where I work, the entire system is written in Python and it scales quite well.

P.S.: Do take a look at MongoDB though.

Mihai Oprea
0

You're looking for MongoDB.

MongoDB has some excellent Python drivers. It is a joy to work with.

brice
  • I know MongoDB and just thought about that. It's probably possible, but it will use more computing power than actually required, since I cannot manipulate the database engine that easily (if it is open source). –  Mar 29 '12 at 14:40
  • Also, the database I'm planning to create uses a different and much quicker way to add data. –  Mar 29 '12 at 14:43
  • Mongo is super highly optimized. I can pretty much guarantee that you won't beat it for performance. Even on my machine (Mac mini), I've measured write performance in excess of 20k writes/sec. – Tyler Eaves Mar 29 '12 at 14:53
  • I read that on 32-bit systems MongoDB limits your data size to 2 GB. All of my computer nodes are 32-bit. Even if it's just the index size, it will be far too little. –  Mar 29 '12 at 15:23
  • Agreed, on 32-bit systems the index size is quite small. Perhaps look at [CouchDB](http://couchdb.apache.org/) instead? – brice Mar 29 '12 at 15:26
  • If I translate the design to databases like CouchDB, I will need about 500 tables (each representing a dimension). Is that possible? –  Mar 29 '12 at 15:51
  • Just did a little benchmark: MongoDB has 35k writes/s, but my current engine has 50k writes/s, and there's still a lot to improve. I'll probably stick with it. But is Python good for that? –  Mar 29 '12 at 16:21
  • Python will run some serious data processing. It's implemented by some very smart people. It is more than capable of keeping up with 50k writes/s. Additionally, it can be optimised by moving to [PyPy](http://pypy.org/) to near-C performance. It will beat PHP in almost any speed test you throw at it, and has great C interop when you need to go closer to the metal. Unless you intend on learning C, Python is your best bet (and even then, Python will be a much better language to work in until you actually reach speed limitations). – brice Mar 29 '12 at 16:26
  • To give you an idea, [Stackless Python](http://en.wikipedia.org/wiki/Stackless_Python) is used to run [EVE Online](http://www.eveonline.com/). I'm pretty sure you'll hit all sorts of interesting limitations before Python stops being useful... – brice Mar 29 '12 at 16:28
  • @Brice: Partially agree, but this actually sounds like a niche where Google Go might be valuable. It's a pretty easy language to learn (much easier than C, certainly), and will get you most of the speed of C. – Tyler Eaves Mar 29 '12 at 16:38
  • I agree, especially since we're now at 1.0 :-) In fact, there are tons of great alternatives for fast, capable languages: Go; D; Erlang; Haskell; Racket; SBCL; even Java can be made blazingly fast. For someone who only knows PHP and Python though, Python can take you a long way... – brice Mar 29 '12 at 16:46
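For anyone who wants to put their own number next to the writes/s figures quoted in these comments, a rough timing harness could look like the sketch below. `measure_writes_per_second` is a hypothetical helper, and the target here is just an in-memory dict, so the result says nothing about an engine that persists to disk or talks over a network.

```python
import time


def measure_writes_per_second(write, n=100_000):
    """Call write(key, value) n times and return the observed writes per second."""
    start = time.perf_counter()
    for i in range(n):
        write(i, i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return n / elapsed if elapsed > 0 else float("inf")


# Assumption: a plain dict stands in for the storage engine under test.
table = {}
rate = measure_writes_per_second(table.__setitem__, n=100_000)
print(f"{rate:,.0f} writes/s into an in-memory dict")
```

Swap `table.__setitem__` for whatever callable performs one write in the engine being tested, and run it several times, since single runs vary a lot.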
0

Since this is clearly a request for "opinion", I thought I'd offer my $.02

We looked at MongoDB 12 months ago, and started to really like it... but for one issue. MongoDB limits the largest database to the amount of physical RAM installed on the MongoDB server. For our tests, this meant we were limited to 4 GB databases. This didn't fit our needs, so we walked away (too bad really, because Mongo looked great).

We moved back to home turf, and went with PostgreSQL for our project. It is an exceptional system, with lots to like.

But we've kept an eye on the NoSQL crowd ever since, and it looks like Riak is doing some really interesting work.

(fyi -- it's also possible the MongoDB project has resolved the DB size issue -- we haven't kept up with that project).

user590028
  • That's not true. Only the indexes must fit in RAM (if you're using them). For performance you want the DB in RAM if possible, but it's certainly not a hard limit. – Tyler Eaves Mar 29 '12 at 14:54
  • I stand corrected (sorry, it was 12 months ago). Our project needed to create quite a large number of indexes, and THEY exceeded 4 GB. Thanks for the reminder. – user590028 Mar 29 '12 at 14:56