2

The Problem

I'm using NetBSD 6.1, Perl v5.18.1, and DB_File v1.818. If I iterate over a DB_File-tied hash using each and delete each item from the hash, not all items are deleted. Here is a script demonstrating the problem:

use strict;
use warnings;
use DB_File;

my $dbfile = "/tmp/foo.db";
! -f $dbfile or unlink($dbfile) or die("unable to delete $dbfile");

my %db;
tie(%db, "DB_File", "/tmp/foo.db", O_RDWR|O_CREAT, 0644);

# add some random records
my @chars = ("0".."9", "a".."f");
for (1..10) {
    my ($key, $val);
    $key .= $chars[rand(@chars)] for 1..10;
    $val .= $chars[rand(@chars)] for 1..32;
    $db{$key} = $val;
}

# this doesn't delete everything from the database!
keys(%db); # reset the iterator
while (my ($key, $val) = each(%db)) {
    delete $db{$key};
}

foreach (keys(%db)) {
    print("\$db{$_} = $db{$_}\n");
}

untie(%db);

When I run it, 4 (or sometimes 5) of the 10 records aren't deleted:

$db{4a8e5792e0} = 7a4d078a3f0f3cba750cb395fcc3343d
$db{d28e8cb226} = 17af1122f0b94113416693b1c4165954
$db{a3ae4e2e24} = 3c15270cf16601722bd8106b1727dbc2
$db{886c469eb4} = f1496f83f7866d09c9e28aae8e1b62e6
$db{2c53ebd993} = facfe8228240878aac825de4d97ca22b

If I run the script on a Linux (Ubuntu 14.04) system, then it always works (all records deleted).

If I switch to a foreach loop over the keys, then it works on both NetBSD and Linux:

# this always works
foreach (keys(%db)) {
    delete $db{$_};
}

What the Documentation Says

I haven't been able to find anything that clearly says that deleting while iterating via each doesn't always work.

Here's what I have been able to find:

  • The documentation for foreach says:

    foreach probably won't do what you expect if VAR is a tied or other special variable.

    I'm not sure what is meant by this, but oddly the foreach case is where it does work.

  • The documentation for each says:

    Any insertion into the hash may change the order, as will any deletion, with the exception that the most recent key returned by each or keys may be deleted without changing the order.

    To me this implies that it's safe to delete the current entry while iterating.

  • The documentation for DB_File doesn't mention deletion while iterating.

Questions

Is this problem:

  • caused by a bug in NetBSD's Berkeley DB implementation?
  • caused by a bug in DB_File?
  • a known limitation of each?
  • a known limitation of tied hashes?
  • a known limitation of DB_File tied hashes?

Why does foreach on the keys work when each doesn't?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Richard Hansen
  • 51,690
  • 20
  • 90
  • 97
  • 1
    "To me this implies that it's safe to `delete` the current entry while iterating." For a hash, yes. But you don't have a hash here. Perl's docs cannot make claims as to what DB_File does or doesn't support. – ikegami May 01 '14 at 21:48
  • @ikegami: OK, so that rules out "a known limitation of `each`". Right? – Richard Hansen May 01 '14 at 22:02

1 Answers1

1

My hunch it that the behavior is a limitation of tied hashes. There seems to be no guarantee that DB_File does not rehash when deleting the most recently fetched key.

You also asked about the difference between

while (my ($key, $val) = each(%db)) {
    delete $db{$key};
}

and

foreach (keys(%db)) {
    delete $db{$_};
}

.

In the first case you are meddling with the tied hash while iterating over the tied hash. So there is an opportunity that the iterator will not reach all keys.

In the second case you are first iterating over the tied hash in order to get the complete list of keys. When looping over the complete list you are guaranteed to delete all entries.

BarneySchmale
  • 658
  • 1
  • 4
  • 10
  • Oh, I was under the mistaken impression that `keys()` emitted a Python-like "generator" for efficient iteration over very large (possibly tied) hashes. That's good to know, thank you! – Richard Hansen May 03 '14 at 20:01
  • Assuming it is possible to write a C program using NetBSD's Berkeley DB implementation that can delete while iterating without invalidating the iterator, then it should be possible to write DB_File to behave similarly. So it sounds to me like this is a combination of things: (1) tying leaves the behavior up to the implementation (the implementation is allowed to behave nicely or horribly) and (2) either the implementation of DB_File isn't using the DB interface optimally or the NetBSD DB implementation makes it impossible to safely delete while iterating (it'd be nice to know which). – Richard Hansen May 03 '14 at 20:11