6

I'm trying to implement a multi-dimensional table with headers.

Here's an example for 2D:

                       < dimension1 >
    /\               'column0'  'column1'
dimension0   'row0'   data00     data10
    \/       'row1'   data01     data11

The headers for rows and columns are text, and the data is anything. I want to be able to do something like this (syntax can be different, I'm beginner in Perl):

my $table = new table(2); # 2 is the number of dimensions

# the following line creates a new row/column if it didn't exist previously
$table['row0']['column0'] = data00;
$table['row0']['column1'] = data01;
$table['row1']['column0'] = data10;
$table['row1']['column1'] = data11;

# the following line returns the headers of the specified dimension
$table->headers(0);
 => ('row0', 'row1')

First question: Is there something like this already done in CPAN? (before you ask I did search for a significant amount of time and I didn't find anything like it)


Second question: Here's my try, I know it's ugly and probably wrong. Any Perl expert out there care to review my code?

package table;

sub new {
  my $class = shift;
  my $dimensions = shift;
  my $self = bless({}, $class);
  $self->{dimensions} = $dimensions;
  $self->{data} = [];
  $self->{headers} = [];
  return $self;
}

sub get_dimensions {
  my $self = shift;
  return $self->{dimensions};
}

# This function creates a header or return its index if it already existed.
# Headers are encoded as an array of hashes so that this is O(1) amortized.

sub header {
  my $self = shift;
  my $dimension = shift;
  my $header = shift;
  my $headers = $self->{headers}[$dimension];
  if(!defined($headers)) {
    $headers = $self->{headers}[$dimension] = {};
  }
  if(!defined($headers->{$header})) {
    $headers->{$header} = scalar keys %$headers;
  }
  return $headers->{$header};
}

# This function returns the list of headers. Because the headers are
# stored as a hash (`header=>index`), I need to retrieve the keys
# and sort them by value.

sub get_headers {
  my $self = shift;
  my $dimension = shift;
  my $headers = $self->{headers}[$dimension];
  return [sort { $headers->{$a} cmp $headers->{$b} } keys %$headers];
}

# This last function stores/retrieves data from the table.

sub data {
  my $self = shift;
  my $data = $self->{data};
  my $dimensions = $self->{dimensions};
  for(my $i = 0; $i < $dimensions-1; ++$i) {
    my $index = $self->header($i, shift);
    if(!defined($data->[$index])) {
      $data->[$index] = [];
    }
    $data = $data->[$index];
  }
  my $index = $self->header($dimensions-1, shift);
  my $value = shift;
  if(defined($value)) {
    $data->[$index] = $value;
  }
  return $data->[$index];
}
Giovanni Funchal
  • 8,934
  • 13
  • 61
  • 110
  • If your data is made of numbers, you may find [Math::MatrixReal](http://search.cpan.org/~leto/Math-MatrixReal-2.08/) really useful. – Marco De Lellis Sep 27 '11 at 09:30
  • @MarcoDeLellis this is only 2D, not any dimension, and there's no headers, so unfortunately that's really not what I'm looking for. – Giovanni Funchal Sep 27 '11 at 09:36
  • 1
    I like that idea and will play around with it tonight. – Egga Hartung Sep 27 '11 at 11:43
  • @JanHartung great, let me know if you find a better way of doing it. I've implemented the example within an application, and after some profiling I found that way too much time is spent in the `header` function. – Giovanni Funchal Sep 27 '11 at 12:13
  • 1
    @GiovanniFunchal: Ok, your code seems to do pretty fine. Surely there could be done some tweaking, but for that I would like to have more information. You profiled the code, so I guess performance is an issue. When I profiled with my testcase (roughly the scenario of your example usage), there was about 60% time spent in the `data` function. The best thing was if you could provide some sample data and a description of the actual operations you want to do with it. Also interesting would be your generosity with memory, so perhaps we could do some speed-up by using hashes instead of arrays. – Egga Hartung Sep 28 '11 at 08:37
  • @JanHartung Nice, I'm using this to generate html pages in a server so I actually don't have any memory limits but cpu time is important so that the pages won't timeout. I can't share my testcase unfortunately but I can say that my profiling results were close to yours. – Giovanni Funchal Sep 28 '11 at 08:40
  • @JanHartung I think the performance overhead comes from Perl not inlining the accessors (data and header functions). Anyway thanks a lot for your help. – Giovanni Funchal Sep 28 '11 at 08:41

3 Answers3

2

You want a structure for an "N" dimensional table. I doubt there's a CPAN module that can do this because it's just not that common a situation.

The problem is that the data structure grows quite rapidly and so does the complexity.

You can store an N dimensional table in a single list by using a bit of mathematics to transform the N dimensional array into a single dimension. Let's say that X represents the X dimension and X' represents the length of that dimension. For a two dimensional table, you could get the value by doing:

X * Y` + Y.

For a 3 dimensional table X, Y, Z, the answer would be:

X * (Y' * Z') + Y * Z' + Z

For a 4 dimensional table W, X, Y, Z, the answer would be:

W * (X' * Y' * Z') + X * (Y' + Z') + Y * Z' + Z'

(I hope the math is right).

Therefore, I can imagine a structure like this for an N dimensional table. It would involve two different classes: One represents the dimensional information and the other represents the actual data (including all of the dimensions).

  • Dimension (Class)
    • Heading (alphanumeric string)
    • Size of Dimension (integer)
  • N-Table (Class)
    • Array of Dimension (Dimension Class Objects)
    • Array of Data (Alphanumeric strings)

You can get the number of dimensions by looking at:

my $numOfDimensions = scalar @{$ntable->{DIMENSIONS}};

And, you can get the heading of dimension $x by looking at:

my xDimensionHeading = $ntable->{DIMENSION}->[$x]->{HEADING};

And, the size of that dimension by looking at:

my xDimensionSize = $ntable->{DIMENSION}->[$x]->{SIZE};

Of course, you'd do this with true object oriented calls, and not bare references, but this gives you an idea how the structure would work.

Now, you need a way of transforming a list of integers that would represent a cell's location into a cell's location along a single dimensional array, and you'll have a way of getting and retrieving your data.

Would this be what you're looking for?


EDIT

Close to it, but I actually resize the table dimensions a lot (I can't determine their size in advance) and if I understood your solution doesn't accomodate for this.

This adds a lot of complication...

We need to throw out the Size in the Dimension class. And, we can't use a single dimensional array to store our data.

I hope you don't change the table dimensionality.

We could do something like this:

  • N-Table (Class)
    • List of Dimension Headings {DIMENSION}->[]
    • List to Data {DATA}->[] (This could be a link to other lists)

The {DATA} list is a link of lists depending on the depth of the table. For example:

 my data_3D = $table_3D->{DATA}->[$x]->[$y]->[$z];
 my data_2D = $table_2D->{DATA}->[$x]->[$y];

The number of dimensions is scalar @{$table->{DIMENSION}}.

The question is how do I access the data in a way that's dimensional neutral. I could require 2, 3, 4, or more dimensions, and I have to have someway of structuring my address to pull it out.

We could have some sort of looping mechanism. We get a list of coordinates in @coordinates, and then look at each coordinate. The last will point to data. The rest will simply be another reference to another array.

 my $data = pop @coordinates;    #First Coordinate
 $data = $table->[$data];        #Could be data if 1D table, could be a reference
 foreach my $coordinate (@coordinates) {
    die qq(Not enough coordinates) if ref $data ne 'ARRAY';
    $data = $data->[$coordinate];   #Could be data, could be a reference
 }

 # Cell value is in $data

It also may be possible to build a list of coordinates, and then evaluating it. Again completely untested:

 $coordinates = "[" . join ("]->[" => @coordinates . "]";

If there were three coordinates, this would be

 $coordinates = "[$x]->[$y]->[$z]";

I'm not sure how a 1 dimensional array would work...

From there, you could build a statement and use eval on it and get the data.

You'll have to have several methods.

  • Set the dimensions
  • Set a cell
  • Retrieve a cell
  • Verify table is completeness (I have no idea how this would work.

This is more a brain dump, but it I think this might work. You don't have any set table dimensions and it might work for any N-dimensional table.

Community
  • 1
  • 1
David W.
  • 105,218
  • 39
  • 216
  • 337
  • Close to it, but I actually resize the table dimensions a lot (I can't determine their size in advance) and if I understood your solution doesn't accomodate for this. – Giovanni Funchal Sep 28 '11 at 08:44
  • 1
    Table resizing adds a lot of complications. If you increase or decrease your table size, you'll have to restructure all of your data to include the change. I'll see what I can come up with... – David W. Sep 28 '11 at 17:41
  • Thanks for your time. I have a question about the join+eval solution, what if some dimensions combinations are undefined (i.e. sparse)? – Giovanni Funchal Sep 30 '11 at 11:33
  • Incomplete tables? Another complication. The join will produce the required coordinates, but the eval will fail which you can catch. The problem is that each coordinate is its own data structure. Thus, if you will have to have to build it one coordinate at a time, you'll have to check for each coordinate to see if it exists or not. You maybe better off with the loop anyway. – David W. Oct 02 '11 at 03:13
  • I ended up moving to Python instead. – Giovanni Funchal Mar 27 '12 at 18:40
1

You may be able to use Text::TabularDisplay to do this. Here's a quick trial I did with your example.

use strict;
use warnings;
use Text::TabularDisplay;

my $t = Text::TabularDisplay->new(('', 'column0', 'column1'));
$t->add('row0', 'data00', 'data10');
$t->add('row1', 'data01', 'data11');
print $t->render;

shows:

+------+---------+---------+
|      | column0 | column1 |
+------+---------+---------+
| row0 | data00  | data10  |
| row1 | data01  | data11  |
+------+---------+---------+

I'm not sure if this is exactly what you were looking for. You do have to fudge with the header by leaving the first column blank.

Joel
  • 3,435
  • 2
  • 23
  • 33
0

Text::Table might be useful here. I am showing a simple example below: You can play with the various options the module provides to create something close what you describe.

#!/usr/bin/perl

use warnings;
use strict;

use Text::Table;

my $inner_table = Text::Table->new(qw(column0 column1));

$inner_table->load(
    [ qw(row0 data00 data01) ],
    [ qw(row1 data10 data11) ],
);

my $outer_table = Text::Table->new(' ', 'dimension1');

$outer_table->load(
    ['dimension0', $inner_table->stringify ],
);

print $outer_table;

Output

C:\Temp> t
           dimension1
dimension0 column0 column1
           row0    data00
           row1    data10
Sinan Ünür
  • 116,958
  • 15
  • 196
  • 339