
I've spent hours searching for examples of how to use the bsddb module and the only ones that I've found are these (from here):

data = mydb.get(key)
if data:
    doSomething(data)
#####################
rec = cursor.first()
while rec:
    print rec
    rec = cursor.next()
#####################
rec = mydb.set()
while rec:
    key, val = rec
    doSomething(key, val)
    rec = mydb.next()

Does anyone know where I could find more (practical) examples of how to use this package?

Or would anyone mind sharing code that they've written themselves that used it?

Edit:

I chose Berkeley DB for its scalability. I'm working on a latent semantic analysis of about 2.2 million web pages. My simple test of 14 web pages generated around 500,000 records, so extrapolating at the same rate, there will be about 78.6 billion records in my table.
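
For reference, the extrapolation above can be checked with a few lines of Python. This is only a sketch of the arithmetic; it assumes the records-per-page rate from the 14-page test holds across the whole corpus, which may not be true in practice.

```python
# Figures taken from the question: 14 test pages produced ~500,000 records.
test_pages = 14
test_records = 500_000
total_pages = 2_200_000

# Assumption: the full corpus yields records at the same average rate.
records_per_page = test_records / test_pages
estimated_records = records_per_page * total_pages

print(f"{records_per_page:,.0f} records per page")
print(f"{estimated_records / 1e9:.1f} billion records in total")
```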

If anyone knows of another efficient, scalable database that I can access from Python, please let me know! (lt_kije has pointed out that bsddb is deprecated in Python 2.6 and will be gone in 3.x.)

BenMorel
tgray

5 Answers


These days, most folks use the anydbm meta-module to interface with db-like databases. But the API is essentially dict-like; see PyMOTW for some examples. Note that bsddb is deprecated in 2.6.1 and will be gone in 3.x. Switching to anydbm will make the upgrade easier; switching to sqlite (which is now in stdlib) will give you a much more flexible store.
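
To illustrate the dict-like API, here is a minimal sketch. It uses Python 3's `dbm` module, which is what `anydbm` was renamed to; under Python 2 the equivalent call would be `anydbm.open`. The file name `fruit` and the helper `store_and_read` are made up for the example, and which backend actually gets used depends on what is installed on your system.

```python
import dbm
import os
import tempfile


def store_and_read(path):
    """Write two records, then reopen the store and read everything back."""
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        db["apple"] = "red"        # str keys and values are stored as bytes
        db["banana"] = "yellow"

    # "r" reopens the same store read-only.
    with dbm.open(path, "r") as db:
        return {key.decode(): db[key].decode() for key in db.keys()}


# Hypothetical location: a throwaway directory so the example leaves no files behind.
tmpdir = tempfile.mkdtemp()
records = store_and_read(os.path.join(tmpdir, "fruit"))
print(records)
```

Keys and values come back as bytes, hence the `decode()` calls. Anything richer than a string has to be serialised first (for example with `json` or `pickle`).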

lt_kije
  • But how scalable is SQLite? One of the reasons I chose to use the Berkeley DB was because "Berkeley DB scales up extremely well. It can manage multi-terabyte tables with single records as large as four gigabytes." – tgray Apr 01 '09 at 20:48
  • I think sqlite can handle databases up to 2TB, though I haven't pushed it nearly that far myself. Your quote seems to come from Oracle's db documentation. I don't believe that that has much to do with the implementations supported by Python. What exactly are you trying to do? – lt_kije Apr 01 '09 at 21:14
  • Ah -- your new comment helps. ;) At that scale, I think you're best off using an RDBMS (PostgreSQL, MySQL, etc). SQLite will be a good starting place, since it provides a DBAPI interface that will be compatible with the major RDBMS connectors in Python. – lt_kije Apr 01 '09 at 21:17
  • Thanks for the tip! I'll go check them out. – tgray Apr 02 '09 at 12:10
  • bsddb is deprecated only because it was too difficult for the Python team to maintain; it is still going to be developed as an external module. SQLite is a SQL database and as such has more overhead than bsddb. – Ed L Jul 31 '09 at 16:34
  • In my experience working with 33GB of key-blobvalue data, I found Berkeley DB was several times faster than SQLite both at building the store and at iterating over it. – Graham Jan 12 '12 at 16:02
  • As Ed L says, the answer needs editing regarding the deprecation of bsddb in Python 3. It only means that it will no longer be part of the Python distribution. – sw. Feb 18 '12 at 19:08
  • bsddb is not deprecated exactly, it's just moved out of core, and into an external lib. That external lib is still under active development: http://www.jcea.es/programacion/pybsddb.htm – Brian Minton Mar 18 '14 at 16:08
  • the links are dead – Mateusz Piotrowski Mar 25 '18 at 00:27

I'm assuming this thread is still active, so here we go. This is rough code and there's no error checking, but it may be useful as a starting point.

I wanted to use PHP's built-in DBA functions and then read the database using a Python (2.x) script. Here's the PHP script that creates the database:

<?php 
$id=dba_open('visitor.db', 'c', 'db4');
dba_optimize($id);
dba_close($id);
?>

Now, here's the PHP code to insert an entry. I use JSON to hold the "real" data:

<?php 
/* 
    record a visit in a BSD DB
*/
$id=dba_open('visitor.db', 'w', 'db4');
if (!$id) {
    /* dba_open failed */
    exit;
}
/* Keep the key as a string: used as a PHP array key, a float is truncated to an int */
$key  = (string) $_SERVER['REQUEST_TIME_FLOAT'];
$rip  = $_SERVER['REMOTE_ADDR'];
$now  = date('d-m-Y h:i:s a', time());
$data = json_encode( array('remote_ip' => $rip, 'timestamp' => $now) );
dba_insert($key, $data, $id);
dba_optimize($id);
dba_close($id);
?>

Now, here's the code that you and I are actually interested in, and it uses Python's bsddb3 module.

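#!/usr/bin/env python
# Python 2 script: dump every record in visitor.db and decode its JSON value.
from bsddb3 import db
import json

visitorDB = db.DB()
visitorDB.open('visitor.db', None, db.DB_BTREE, db.DB_DIRTY_READ)
cursor = visitorDB.cursor()
rec = cursor.first()  # rec is a (key, value) tuple

while rec:
    print rec
    visitordata = rec[1]  # the JSON string written by the PHP script
    print '\t' + visitordata
    jvdata = json.loads(visitordata)
    print jvdata
    rec = cursor.next()  # advance; the loop stops when this returns None
    print '\n\n'
print '----'

cursor.close()
visitorDB.close()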
peterg22

Look at: Lib3/bsddb/test after downloading the source from http://pypi.python.org/pypi/bsddb3/

The current distribution contains the following tests that are very helpful to start working with bsddb3:

test_all.py
test_associate.py
test_basics.py
test_compare.py
test_compat.py
test_cursor_pget_bug.py
test_dbenv.py
test_dbobj.py
test_db.py
test_dbshelve.py
test_dbtables.py
test_distributed_transactions.py
test_early_close.py
test_fileid.py
test_get_none.py
test_join.py
test_lock.py
test_misc.py
test_pickle.py
test_queue.py
test_recno.py
test_replication.py
test_sequence.py
test_thread.py
sw.

Searching for "import bsddb", I get:

...but personally I'd strongly recommend you use sqlite instead of bsddb; people use the former a lot more, and for good reason.
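
As a rough idea of what the sqlite route could look like for key/value data, here is a minimal sketch using the standard-library `sqlite3` module. The table name `kv`, the helper functions and the sample key are all invented for the example, and it says nothing about how well this scales to billions of rows.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a simple key/value table. ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn, key, value):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )


def get(conn, key, default=None):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


conn = open_store()
put(conn, "doc1:term42", "0.37")
put(conn, "doc1:term42", "0.41")   # same key again: the old value is replaced
print(get(conn, "doc1:term42"))
print(get(conn, "missing", "n/a"))
```

Because `sqlite3` follows the DB-API, the same `execute`/`fetchone` pattern carries over to the PostgreSQL and MySQL connectors with only minor changes (mainly the placeholder style).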

James Antill
  • Thanks for telling me how you found them too. I'd forgotten that trick. – tgray Apr 01 '09 at 20:40
  • Unfortunately I don't think sqlite will scale well enough for my application (updated question). If you know that sqlite will work (with some certainty), please let me know! – tgray Apr 01 '09 at 21:11
  • I'm not sure sqlite will scale that well, but I'm also not sure bsddb will scale well either. If you are creating the data and then accessing it a lot, cdb might be your best bet. – James Antill Apr 06 '09 at 18:41
  • I'm using Windows, so I don't think cdb is an option. At least, the docs say it is for UNIX. – tgray Aug 03 '09 at 15:29
  • the links are dead – Mateusz Piotrowski Mar 25 '18 at 00:28

The Gramps genealogy program uses bsddb for its database

Sam