I have an app which accesses a PostgreSQL database and needs to read some large binary data out of it depending on some needed processing. This might be hundreds of MB or even some GB of data. Please no discussion about using file systems instead or such, it's the way it is now.
That data is simply files of various types, e.g. it might be a Zip container or some other kind of archive. Some of the needed processing is list the contents of the Zip, maybe even extract some members for further processing, maybe hash the stored data... In the end the data is read multiple times, but written only once to store it.
All of the Perl libs I use are able to work with file handles, some with IO::Handle
, others with IO::String
or IO::Scalar
, some others only with low level file handles. So what I've done is create a subclass of IO::Handle
and IO::Seekable
which acts like a wrapper for the corresponding methods around DBD::Pg
. In the CTOR I create a connection to the database, open some provided LOID for reading and store the handle provided by Postgres in the instance. My own handle object is then forwarded to whoever is able to work with such a file handle and can directly read and seek within the blob provided by Postgres.
The problem is libs which use low level file handles or low level file handle operations on IO::Handle
. Digest::MD5
seems to be one, Archive::Zip
another one. Digest::MD5
croak
s and tells me that no handle has been provided, Archive::Zip
on the other hand tries to create a new, own handle from mine, calls IO::Handle::fdopen
and fails in my case.
sub fdopen {
@_ == 3 or croak 'usage: $io->fdopen(FD, MODE)';
my ($io, $fd, $mode) = @_;
local(*GLOB);
if (ref($fd) && "".$fd =~ /GLOB\(/o) {
# It's a glob reference; Alias it as we cannot get name of anon GLOBs
my $n = qualify(*GLOB);
*GLOB = *{*$fd};
$fd = $n;
} elsif ($fd =~ m#^\d+$#) {
# It's an FD number; prefix with "=".
$fd = "=$fd";
}
open($io, _open_mode_string($mode) . '&' . $fd)
? $io : undef;
}
I guess the problem is the low level copy of the handle, which removes my own instance, so there's no instance anymore having my database connection and all that stuff.
So, is it even possible in my case to provide some IO::Handle
which successfully can be used wherever a low level file handle is expected?
I mean, I don't have a real file handle, I have an object only where method calls are wrapped to their corresponding Postgres methods, for which a database handle is needed and such. All of that data needs to be stored somewhere, the wrapping needs to be done etc.
I tried to do what others are doing, like IO::String
, which additionally uses tie
for example. But in the end that use case is different, because Perl is able to create a real low level file handle to some internal memory on its own. Something which is not supported at all in my case. I need to keep my instance around, because only that knows of the handle to the database etc.
Using my handle like an IO::Handle
by calling method read
and such works like expected, but I would like to take it a bit further and be more compatible to whoever doesn't expect to work on IO::Handle
objects. Much like IO::String
or File::Temp
can be used as low level file handles.
package ReadingHandle;
use strict;
use warnings;
use 5.10.1;
use base 'IO::Handle', 'IO::Seekable';
use Carp ();
sub new
{
my $invocant = shift || Carp::croak('No invocant given.');
my $db = shift || Carp::croak('No database connection given.');
my $loid = shift // Carp::croak('No LOID given.');
my $dbHandle = $db->_getHandle();
my $self = $invocant->SUPER::new();
*$self->{'dbHandle'} = $dbHandle;
*$self->{'loid'} = $loid;
my $loidFd = $dbHandle->pg_lo_open($loid, $dbHandle->{pg_INV_READ});
*$self->{'loidFd'} = $loidFd;
if (!defined($loidFd))
{
Carp::croak("The provided LOID couldn't be opened.");
}
return $self;
}
sub DESTROY
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
$self->close();
}
sub _getDbHandle
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
return *$self->{'dbHandle'};
}
sub _getLoid
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
return *$self->{'loid'};
}
sub _getLoidFd
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
return *$self->{'loidFd'};
}
sub binmode
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
return 1;
}
sub close
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
my $dbHandle = $self->_getDbHandle();
my $loidFd = $self->_getLoidFd();
return $dbHandle->pg_lo_close($loidFd);
}
sub opened
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
my $loidFd = $self->_getLoidFd();
return defined($loidFd) ? 1 : 0;
}
sub read
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
my $buffer =\shift // Carp::croak('No buffer given.');
my $length = shift // Carp::croak('No amount of bytes to read given.');
my $offset = shift || 0;
if ($offset > 0)
{
Carp::croak('Using an offset is not supported.');
}
my $dbHandle = $self->_getDbHandle();
my $loidFd = $self->_getLoidFd();
return $dbHandle->pg_lo_read($loidFd, $buffer, $length);
}
sub seek
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
my $offset = shift // Carp::croak('No offset given.');
my $whence = shift // Carp::croak('No whence given.');
if ($offset < 0)
{
Carp::croak('Using a negative offset is not supported.');
}
if ($whence != 0)
{
Carp::croak('Using a whence other than 0 is not supported.');
}
my $dbHandle = $self->_getDbHandle();
my $loidFd = $self->_getLoidFd();
my $retVal = $dbHandle->pg_lo_lseek($loidFd, $offset, $whence);
$retVal = defined($retVal) ? 1 : 0;
return $retVal;
}
sub tell
{
my $self = shift || Carp::croak('The method needs to be called with an instance.');
my $dbHandle = $self->_getDbHandle();
my $loidFd = $self->_getLoidFd();
my $retVal = $dbHandle->pg_lo_lseek($loidFd);
$retVal = defined($retVal) ? $retVal : -1;
return $retVal;
}
1;