I have a data source which can be represented as a list of maybe 30 dataframes (I'm saying dataframes, but if a better structure exists feel free to answer using that - nothing has been written yet). Each dataframe doesn't have any direct relationship to the other, but each has 3 columns and n rows. Column 1 is labels, column 2 and 3 are values (always numeric).
For tangibility, imagine each dataframe is a list of foods eaten at a party, and each column represents the number of each item Alice and Bob ate.
For example
A = [5 x 3] # (apples, pears, cookies, grapes, watermelon)
# ---------------------------------------------------
# item Alice Bob
# apples 3 7
# pears 1 2
# cookies 10 4
# grapes 238 483
# watermelon 0 1
# ---------------------------------------------------
B = [1 x 3] # (grapes)
C = [3 x 3] # (beef, rice, apples)
...
Z = [4 x 3] # (rice, grapes, watermelon, beef)
I want to represent these matrices as a data structure, such that I can ask
- General questions above all items - e.g. which of Alice and Bob ate most items overall, what was their average, standard deviation, etc
- Label specific questions - e.g. which of Alice and Bob at most grapes?
Whenever I have this kind of problem I always end up writing really ugly code which has lists of lists, requires the []
operator or as.matrix()
/as.list()
/as.dataframe()
functions, and generally seems like a really crappy way of doing things.
What would be a goood/the best approach for these kind of data?