My program, written in Python 3, has many places where it starts with a (very large) table-like numeric data structure and adds columns to it following a certain algorithm. (The algorithm is different in every place.)
I am trying to convert this into pure functional approach since I run into problems with the imperative approach (hard to reuse, hard to memoize interim steps, hard to achieve "lazy" computation, bug-prone due to reliance on state, etc.).
The Table
class is implemented as a dictionary of dictionaries: the outer dictionary contains rows, indexed by row_id
; the inner contains values within a row, indexed by column_title
. The table's methods are very simple:
# return the value at the specified row_id, column_title
get_value(self, row_id, column_title)
# return the inner dictionary representing row given by row_id
get_row(self, row_id)
# add a column new_column_title, defined by func
# func signature must be: take a row and return a value
add_column(self, new_column_title, func)
Until now, I simply added columns to the original table, and each function took the whole table as an argument. As I'm moving to pure functions, I'll have to make all arguments immutable. So, the initial table becomes immutable. Any additional columns will be created as standalone columns and passed only to those functions that need them. A typical function would take the initial table, and a few columns that are already created, and return a new column.
The problem I run into is how to implement the standalone column (Column
)?
I could make each of them a dictionary, but it seems very expensive. Indeed, if I ever need to perform an operation on, say, 10 fields in each logical row, I'll need to do 10 dictionary lookups. And on top of that, each column will contain both the key and the value, doubling its size.
I could make Column
a simple list, and store in it a reference to the mapping from row_id to the array index. The benefit is that this mapping could be shared between all columns that correspond to the same initial table, and also once looked up once, it works for all columns. But does this create any other problems?
If I do this, can I go further, and actually store the mapping inside the initial table itself? And can I place references from the Column
objects back to the initial table from which they were created? It seems very different from how I imagined a functional approach to work, but I cannot see what problems it would cause, since everything is immutable.
In general does functional approach frown on keeping a reference in the return value to one of the arguments? It doesn't seem like it would break anything (like optimization or lazy evaluation), since the argument was already known anyway. But maybe I'm missing something.