I'm new to R, and I have to analyse a big database. Because I'll have to apply many different functions to the same rows, I thought of using loops. So I made a 3-column matrix with, in the first column, the name I want to give to the output, in the second the table I'll use, and in the third the column I'll compare it with. It looks something like this:
my_data <- read_excel(".../my_data.xlsx", sheet=sheet1, col_types=c("date", "text", ..., "text"))
thingstoanalyse1 <- my_data[ which(my_data$columnA=='thistext'), ]
result_tta1 <- thingstoanalyse1[c("columnA", "columnB", "columnC", ..., "columnZ")]
and a lot of other tables extracted from my_data. Then I created this matrix:
datamatrix <- matrix(data=c("name1", "name2", ..., "nameX", "result_tta1",
"result_tta2", ..., "result_ttaX","thingstoanalyse1$columnA",
"thingstoanalyse2$columnB", ...
"thingstoanalyseX$columnC"),
nrow=alot, ncol=3)
It's simplified for the example; there are a lot of cross-combinations (for example a row pairing result_tta5 with thingstoanalyse2; the determinant is not always the same).
My goal was then to use a loop, for instance something like
lapply(datamatrix, function(k) k_tbl = lapply(l) function(x) table(m, x))
where k would be the first column of my matrix (to name the output), l the second column on the same row as k (to say which table to look in) and m the third column on the same row as k and l (to say which column to compare against). (I reduced the body of the function to a simple "table" to make the idea easier to see.)
If I could work out how to write the loop so that it picks the right values, I could use a single formula for all the tables, then copy-paste it and change only the inner formula to get my plots, then my statistical test results, etc.
But sadly I haven't found a way to do that by myself. I could put k, l and m in a data frame instead of a matrix if that helps, or list them some other way, but I'm lost and couldn't find anything helpful, or anything I could understand, on the web.
Example:
my_data <- data.frame(A = sample(letters[1:4], 20, replace = TRUE),
B = sample(letters[1:4], 20, replace = TRUE),
C = sample(letters[1:4], 20, replace = TRUE),
D = sample(letters[1:4], 20, replace = TRUE),
R1 = sample(letters[6:8], 20, replace = TRUE),
R2 = sample(letters[6:8], 20, replace = TRUE))
thingstoanalyse1 <- my_data[which(my_data$A=='a'), ]
thingstoanalyse2 <- my_data[which(my_data$C=='b'), ]
thingstoanalyse3 <- my_data[which(my_data$A=='b'), ]
result_tta1 <- thingstoanalyse1[c("R1", "R2")]
result_tta2 <- thingstoanalyse2[c("R1", "R2")]
result_tta3 <- thingstoanalyse3[c("R1", "R2")]
datamatrix <- matrix(data=c("result1", "result2", "result3", "result4",
"result5", "result_tta1", "result_tta2",
"result_tta3", "result_tta3", "result_tta3",
"thingstoanalyse1$A", "thingstoanalyse2$C",
"thingstoanalyse3$A", "thingstoanalyse3$B",
"thingstoanalyse3$C"),
nrow=5, ncol=3)
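One thing to keep in mind about this matrix: every cell is a character string, so "thingstoanalyse1$A" is plain text and not the column itself. Below is a minimal sketch of one way to turn such names back into objects, assuming the table name and the column name are stored separately. The names tbl_name, col_name and looked_up are hypothetical, and the toy values are made up for the illustration.

```r
# A small stand-in for one of the subsets (made-up values).
thingstoanalyse1 <- data.frame(A  = c("a", "a", "a"),
                               R1 = c("f", "g", "f"),
                               stringsAsFactors = FALSE)

# What the matrix would hold: two plain strings (hypothetical names).
tbl_name <- "thingstoanalyse1"
col_name <- "A"

# get() looks up an object by its name; [[ ]] picks a column by its name.
looked_up <- get(tbl_name)[[col_name]]

# Same values as writing thingstoanalyse1$A directly.
identical(looked_up, thingstoanalyse1$A)
```

Storing "thingstoanalyse1" and "A" in two separate cells is a design choice of this sketch; it avoids having to parse the "$" out of a single string.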
My goal is to end up with calls like these:
lapply(result_tta1, function(x)
cbind(prop.table(table(thingstoanalyse1$A, x), 1),
margin.table(table(thingstoanalyse1$A, x), 1),
margin.table(table(thingstoanalyse1$A, x)))) ->
result1_tbl
lapply(result_tta2, function(x)
cbind(prop.table(table(thingstoanalyse2$C, x), 1),
margin.table(table(thingstoanalyse2$C, x),1),
margin.table(table(thingstoanalyse2$C, x)))) ->
result2_tbl
lapply(result_tta3, function(x)
cbind(prop.table(table(thingstoanalyse3$A, x), 1),
margin.table(table(thingstoanalyse3$A, x), 1),
margin.table(table(thingstoanalyse3$A, x)))) ->
result3_tbl
lapply(result_tta3, function(x)
cbind(prop.table(table(thingstoanalyse3$B, x), 1),
margin.table(table(thingstoanalyse3$B, x), 1),
margin.table(table(thingstoanalyse3$B, x)))) ->
result4_tbl
lapply(result_tta3, function(x)
cbind(prop.table(table(thingstoanalyse3$C, x), 1),
margin.table(table(thingstoanalyse3$C, x), 1),
margin.table(table(thingstoanalyse3$C, x)))) ->
result5_tbl
But writing all of these by hand is very time consuming, and I'll have to do other things with the same tables in the same order, for instance:
pdf("result4.pdf")
lapply(result_tta3, function(x)
barplot(table(x, thingstoanalyse3$B),
main="name", xlab=("lab"),
col=c("green", "yellow", "red"),
legend=rownames(table(x, thingstoanalyse3$B)),
beside=TRUE))
dev.off()
and a lot of other things. So I would like to find a function which does something like
# pseudocode, this does not run as written
# let's say the first column in the matrix is k, the second l and the third m
lapply(datamatrix, function(k, l, m)
k <- lapply(l, function(x)
cbind(prop.table(table(m, x), 1),
margin.table(table(m, x), 1),
margin.table(table(m, x)))))
or something else, as long as it loops and each time replaces "resultx", "result_ttax" and "thingstoanalysex" with the values from the corresponding columns of the matrix, always taken from the same row.
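For illustration, here is a rough sketch of how such a row-by-row loop could look, not a definitive solution. It assumes the subsets are kept in a named list and that the matrix is replaced by a data frame with one row per analysis. The names subsets, spec, result_cols, make_tables and all_tables are hypothetical, and the toy data is deterministic (built with rep() rather than sample()) so that the run is repeatable.

```r
# Toy data with the same columns as the example (values are made up).
my_data <- data.frame(
  A  = rep(letters[1:4], times = 5),
  B  = rep(letters[1:4], each = 5),
  C  = rep(c("b", "a", "d", "c"), times = 5),
  R1 = rep(letters[6:8], length.out = 20),
  R2 = rep(rev(letters[6:8]), length.out = 20),
  stringsAsFactors = FALSE
)

# Keep the subsets in a named list instead of separate variables.
subsets <- list(
  thingstoanalyse1 = my_data[my_data$A == "a", ],
  thingstoanalyse2 = my_data[my_data$C == "b", ],
  thingstoanalyse3 = my_data[my_data$A == "b", ]
)

# One row per analysis: output name, subset name, column to compare with.
spec <- data.frame(
  name   = c("result1", "result2", "result3", "result4", "result5"),
  subset = c("thingstoanalyse1", "thingstoanalyse2", "thingstoanalyse3",
             "thingstoanalyse3", "thingstoanalyse3"),
  column = c("A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# The result columns (R1, R2) are the same for every subset here.
result_cols <- c("R1", "R2")

# Build the tables for one row of spec.
make_tables <- function(subset_name, column_name) {
  d <- subsets[[subset_name]]
  lapply(d[result_cols], function(x) {
    tab <- table(d[[column_name]], x)
    cbind(prop.table(tab, 1), margin.table(tab, 1), margin.table(tab))
  })
}

# Map() walks spec$subset and spec$column in parallel, row by row.
all_tables <- Map(make_tables, spec$subset, spec$column)
names(all_tables) <- spec$name

all_tables$result4$R1  # plays the role of result4_tbl$R1
```

With this layout, swapping the body of make_tables for a barplot() call or a statistical test would reuse the same spec rows, which is the copy-paste-and-change-the-formula workflow described above.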