2

I have data structured in a table form

+------+------+------+------+
|      | Col1 | Col2 | Col3 |
+------+------+------+------+
| Row1 |    1 |    2 |    3 |
| Row2 |    5 |    5 |    6 |
| Row3 |    9 |    2 |    7 |
+------+------+------+------+

I'm looking for a data structure that allows the following:

  • Fast Iteration of column and rows (get the values for the column or the rows. (It should not be more expensive to iterate in one direction than in the other)
  • Fast adding and removing of entire rows and columns (again both operations should be equally fast and should be at most O(n) )
  • Order based on insertion order and reorderable. Order would be calculated with some comparators and is usually depended on the data in the row or column, but not on any names or such
  • Store data other than numbers (We have mixed data, but I plan to use a container class for the actual data anyways)

Additionally rows and columns will have metadata (name, color and stuff like that). All these operations happen often in our system. Currently we store the data row based and the columns have no reference to the data related to them. This makes deleting a column or iterating over its data very tedious.

The first thing that sprang to my mind is the Guava Table but that is not ordered and I'm not sure if it is easy to delete an entire row or column, although clearing the row or column map might do it.

Arrays as backing storage will not work because of adding and removing required. (Although I could predict how large the table will be and create new tables for the removing, but I don't like that solution, even if it could be hidden from the user)

I would be grateful for any ideas on how to implement such a data structure.

To clarify, I don't need a finished library that does this, but I'm looking for a data structure that would allow me do create this. I already know that I will be storing row and column meta-data in separate lists for example

dimo414
  • 47,227
  • 18
  • 148
  • 244
Chris
  • 390
  • 2
  • 6
  • Please clarify what is "ordering" in your case? – Alex Salauyou May 12 '15 at 14:01
  • Another question: do you need O(1) access by index or O(k) is okay (k - number of columns/rows)? – Alex Salauyou May 12 '15 at 14:09
  • For access by index O(k) is okay, we don't do that that often (well, technically we do because we can't iterate over columns, but that is what I would like to change). Ordering would be on insertion order usually, although in the end we normally order by row or column sum (row with the most results to the top or column to the left) – Chris May 12 '15 at 14:19
  • if so, I suggest a structure based on 4-linked nodes (upper, lower, left, right), which can be easily iterated in any direction and addition/removal to which also won't exceed O(k). I started to write a code as an answer, but understood that some cases need testing and debugging, so discarded. I suppose I'll return back later. If you find some ready solution, please comment here. – Alex Salauyou May 12 '15 at 14:33
  • 4-linked list already crossed my mind, but I put it on the backburner because of the complexity in implementing it. You don't need to spend time writing code for it however. – Chris May 12 '15 at 14:39
  • Okay :) the only difficulty I faced was a case when a size of a row (column) to be added is different from table dimension – Alex Salauyou May 12 '15 at 14:41

2 Answers2

3

Guava Tables can certainly be ordered. ImmutableTable provides "reliable user-specified iteration order" and (from the Builder docs) "by default, the order in which cells are added to the builder determines the iteration ordering of all views in the returned table".

Or TreeBasedTable can be used if you need mutable a Table and want a RowSortedTable.

You could also implement your own LinkedHashBasedTable class pretty easily. Either with ForwardingTable and a LinkedHashSet to store your desired iteration order, or simply call Tables.newCustomTable() with a LinkedHashMap as the first argument (and the result of the Supplier, if needed).


Most Table implementations provide O(1) or O(n) (where n is the row/colum count) implementations for all the standard methods, and the methods that return views (like row() and colum()) are directly backed by the original Table, so they're similarly efficient.

If you're really concerned that all the available Table implementations (including .newCustomTable()) are too slow, you should benchmark this. They're more than efficient enough for all normal uses, and without proof that Tables are your bottleneck creating your own data structure is a clear instance of premature optimization.

Community
  • 1
  • 1
dimo414
  • 47,227
  • 18
  • 148
  • 244
  • Read javadoc, but still have no idea how to implement fast column deletion with this... – Alex Salauyou May 12 '15 at 14:52
  • @SashaSalauyou the cleanest way is probably with `Table.column(column).clear()` - this is O(n) (where n is the row count) since it has to iterate over the row map. – dimo414 May 12 '15 at 15:15
  • I suppose this is for clearing (setting nulls or dummy values to entire column), not removing. After many insertions/deletions such table will be sparse. (I'm not going to criticize, just discovering possibilities) – Alex Salauyou May 12 '15 at 15:15
  • 1
    @SashaSalauyou "After many insertions/deletions such table will be very sparse." I'm not entirely sure what you're getting at, but the `Table` implementations other than `ArrayTable` are designed specifically for sparse data. – dimo414 May 12 '15 at 15:19
  • I'm talking about array-based table. I mean that after removing column then adding a new one 100 times to 10-column table, it will physically occupy space as 110-column, since structural data are not physically removed, just cleared values. – Alex Salauyou May 12 '15 at 15:35
  • 1
    @SashaSalauyou `ArrayTable` does not support row/column insertion and removal at all, it is always the same size. If you need to insert and remove rows or columns, you should not use `ArrayTable`. – dimo414 May 12 '15 at 16:52
  • Perhaps you are misunderstanding what [`.clear()`](http://docs.oracle.com/javase/8/docs/api/java/util/Map.html#clear%28%29) will do in my comment above. This will *remove* the values associated with that column, not set them to null. – dimo414 May 12 '15 at 16:56
  • I see. Maybe when I have free time I'll implement a table based on 4-linked nodes to compare performance of inserting/deleting entire columns and rows and accessing them by index, not keys. Thanks for explaining Guava tables! – Alex Salauyou May 12 '15 at 19:42
  • @SashaSalauyou It'd certainly be interesting to see, though I'm honestly skeptical it will be faster than the existing implementations. – dimo414 May 12 '15 at 20:29
0

Thank you all for your help and ideas. I've come to the conclusion that I will keep the current structure of the table consisting of a list of rows who each have a list of cells. While this makes it difficult to iterate over columns or deleting columns, our tables are usually quite small ( < 15 rows, <100 columns) and the main bottleneck of our current implementation is that the data is stored in strings instead of being typed (Double.parse can be quite expensive if you do it too often). While it would be very interesting to develop a data-structure that fulfils all of the requirements in my original question, the reality is that I can't spend that much time on working on this.

Chris
  • 390
  • 2
  • 6
  • Did you try using a `.newCustomTable()` with `LinkedHashMap`s? That should be much faster than a list based approach. – dimo414 May 13 '15 at 13:40