Let's say I have a data frame df with the following three columns:
```
> df
   A    B    C
1232 27.3 0.42
1232 27.3 0.36
1232 13.1 0.15
7564 13.1 0.09
7564 13.1 0.63
```
The required output is:
```
       [1232] [7564]
[13.1]   0.15   0.36
[27.3]   0.39   0
```
I need to build a matrix whose rows are the unique values of B and whose columns are the unique values of A. Each cell holds the mean of C over the rows of the original data frame that have that particular combination of A and B, or 0 if no row has that combination (as for B = 27.3, A = 7564 above).
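For comparison, base R's tapply() computes exactly this kind of grouped mean and returns it as a matrix in one call. This is a minimal sketch, assuming the sample data above stands in for the real df: the INDEX list puts B on the rows and A on the columns, and default = 0 (an argument available since R 3.4.0) fills the combinations that never occur. If C contains NA, mean() returns NA for the affected groups unless na.rm = TRUE is passed through.

```r
# Sample data from the question (stand-in for the real df)
df <- data.frame(
  A = c(1232, 1232, 1232, 7564, 7564),
  B = c(27.3, 27.3, 13.1, 13.1, 13.1),
  C = c(0.42, 0.36, 0.15, 0.09, 0.63)
)

# Group C by every (B, A) pair and average each group.
# Rows are the sorted unique B values, columns the sorted unique A values;
# pairs with no rows in df get the value of `default` (0 here).
mat <- tapply(df$C, list(B = df$B, A = df$A), mean, default = 0)
print(mat)
```

Because the data frame is split into groups once, the cost grows with the number of rows in df rather than with rows × columns of the result.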
My code is:
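```r
# Sorted unique values become the row (B) and column (A) labels;
# sort() also drops any NA values
a_vals <- sort(unique(df$A))
b_vals <- sort(unique(df$B))
# Pre-allocate a B-by-A matrix of zeros; combinations with no rows stay 0
mat <- matrix(0, nrow = length(b_vals), ncol = length(a_vals),
              dimnames = list(as.character(b_vals), as.character(a_vals)))
for (row in rownames(mat)) {
  for (col in colnames(mat)) {
    # Rows of df for this (A, B) pair; row and col are character strings,
    # so A and B are coerced to character for the comparison
    x <- subset(df, A == col & B == row)
    # Average C over the subset only, not over the whole data frame
    if (nrow(x) > 0) mat[row, col] <- mean(x$C)
  }
}
```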
This is very slow: the matrix has thousands of rows and columns, and the loop scans the whole data frame with subset() once for every cell. How can I make this run faster?
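One way to drop the per-cell scan while keeping the same pre-allocated zero matrix is to compute all group means in a single aggregate() call and then write them into the matrix in one assignment. This is a sketch using only base R, assuming the sample data above; it has not been benchmarked on the real data. The assignment uses a two-column character matrix as the index, so each row names one cell by its row and column dimnames.

```r
# Sample data from the question (stand-in for the real df)
df <- data.frame(
  A = c(1232, 1232, 1232, 7564, 7564),
  B = c(27.3, 27.3, 13.1, 13.1, 13.1),
  C = c(0.42, 0.36, 0.15, 0.09, 0.63)
)

# One grouped pass: mean of C for every (A, B) pair that actually occurs
means <- aggregate(C ~ A + B, data = df, FUN = mean)

# Pre-allocate the B-by-A result; pairs that never occur stay 0
a_vals <- sort(unique(df$A))
b_vals <- sort(unique(df$B))
mat <- matrix(0, nrow = length(b_vals), ncol = length(a_vals),
              dimnames = list(as.character(b_vals), as.character(a_vals)))

# Fill all occupied cells at once: each row of the character index matrix
# is matched against the row and column names of mat
mat[cbind(as.character(means$B), as.character(means$A))] <- means$C
print(mat)
```

The labels on both sides come from as.character() applied to the same numeric values, which is what lets the character index line up with the dimnames.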