I am aware that a little programming allows converting fixed-dimension frequency tables, as returned e.g. by table()
, back into observation data. So the aim is to convert a frequency table such as this one...
(flower.freqs <- with(iris,table(Petal=cut(Petal.Width,2),Species)))
Species
Petal setosa versicolor virginica
(0.0976,1.3] 50 28 0
(1.3,2.5] 0 22 50
...back into a data.frame()
with a row number that corresponds to the sum of the numbers of the input matrix, while the cell values are obtained from input dimensions:
Petal Species
1 (0.0976,1.3] setosa
2 (0.0976,1.3] setosa
3 (0.0976,1.3] setosa
# ... (150 rows) ...
With some tinkering I build a rough prototype that should also digest higher-dimensional inputs:
tableinv <- untable <- function(x) {
stopifnot(is.table(x))
obs <- as.data.frame(x)[rep(1:prod(dim(x)),c(x)),-length(dim(x))-1]
rownames(obs) <- NULL; obs
}
> head(tableinv(flower.freqs)); dim(tableinv(flower.freqs))
Petal Species
1 (0.0976,1.3] setosa
2 (0.0976,1.3] setosa
3 (0.0976,1.3] setosa
4 (0.0976,1.3] setosa
5 (0.0976,1.3] setosa
6 (0.0976,1.3] setosa
[1] 150 2
> head(tableinv(Titanic)); nrow(tableinv(Titanic))==sum(Titanic)
Class Sex Age Survived
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No
[1] TRUE
I am obviously proud that this bricolage reconstructs multi-attribute data.frame()
s from higher-dimensional frequency tables such as Titanic
- but is there an established (built-in, battle-tested) general inverse to table(), ideally one that does not depend on a specific library, that knows how to handle unlabeled dimensions, that is optimized so that it will not choke on bulky inputs, and that reasonably deals with table inputs that would correspond to factor as well as non-factor observation inputs?